More on matrix determinants

The determinant of a general \(n\times n\) matrix is quite complicated. We give a mechanical, recursive definition first and then think about its meaning later.

Definition (Determinant)

Let \(A = (a_{ij}) \in M_n\). The determinant \(|A|\) of \(A\) is defined recursively in the following manner.

  • If \(n = 1\), then \(|A| = a_{11}\).
  • If \(n > 1\), then let \(A_{ij}\) denote the \((n-1)\times(n-1)\) matrix obtained by removing the \(i\)-th row and the \(j\)-th column from \(A\), and \[|A| = \sum_{j=1}^{n}(-1)^{i+j}a_{ij}|A_{ij}|\] where \(i\) is an arbitrary row index from 1 to \(n\) (the result does not depend on the choice of \(i\)). (Instead of expanding along a row, you may expand along an arbitrary column \(j\), \(|A| = \sum_{i=1}^{n}(-1)^{i+j}a_{ij}|A_{ij}|\), to obtain the same result.)
According to this definition, computing \(|A|\) for an \(n\times n\) matrix requires the determinants \(|A_{ij}|\) of (many) \((n-1)\times(n-1)\) matrices, which in turn require determinants of \((n-2)\times(n-2)\) matrices, and so on, until we reach the determinants of \(1\times 1\) matrices, which are trivial to compute. (In fact, since we already know how to compute the determinant of a \(2\times 2\) matrix, we may stop at \(n=2\).)
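The recursion above can be transcribed almost literally into a few lines of Python. This is a sketch for illustration, not an efficient algorithm (it takes \(O(n!)\) time); it expands along the first row.

```python
def det(A):
    """Determinant of a square matrix (list of lists) by cofactor
    expansion along the first row, following the recursive definition."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # Minor A_{1j}: remove the first row and the j-th column.
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        # (-1)**j corresponds to (-1)^{1+j} with 1-based indices.
        total += (-1) ** j * A[0][j] * det(minor)
    return total

print(det([[1, 2], [3, 4]]))  # → -2 (= ad - bc)
```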

Example. First, let's see that the previous definition of the determinant of a \(2\times 2\) matrix is consistent with the above definition.
Let \(A = \begin{pmatrix}a & b\\c & d\end{pmatrix}\). Then,
\[|A| = (-1)^{1+1}a|(d)| + (-1)^{1+2}b|(c)| = ad - bc.\] Thus, the two definitions are indeed consistent. □

Example
Let
\[ A = \begin{pmatrix} 1 & 0 & 2\\ -1 & 1 & 1\\ 0 & 1 & 0 \end{pmatrix} = (a_{ij}). \]
To compute its determinant, let's expand along the first row (i.e., pick \(i = 1\)). So
\[ \begin{eqnarray*} |A| &=& (-1)^{1+1}a_{1,1} \begin{vmatrix} a_{2,2} & a_{2,3}\\ a_{3,2} & a_{3,3} \end{vmatrix} + (-1)^{1+2}a_{1,2} \begin{vmatrix} a_{2,1} & a_{2,3}\\ a_{3,1} & a_{3,3} \end{vmatrix} + (-1)^{1+3}a_{1,3} \begin{vmatrix} a_{2,1} & a_{2,2}\\ a_{3,1} & a_{3,2} \end{vmatrix}\\ &=& (-1)^{1+1}\cdot 1\cdot \begin{vmatrix} 1 & 1\\ 1 & 0 \end{vmatrix} + (-1)^{1+2}\cdot 0\cdot \begin{vmatrix} -1 & 1\\ 0 & 0 \end{vmatrix} + (-1)^{1+3}\cdot 2\cdot \begin{vmatrix} -1 & 1\\ 0 & 1 \end{vmatrix}\\ &=& -1 + 0 + (-2) = -3. \end{eqnarray*} \]
Since we may pick any row \(i\), we may equally use the second row to compute
\[ \begin{eqnarray*} |A| &=& (-1)^{2+1}(-1) \begin{vmatrix} 0 & 2\\ 1 & 0 \end{vmatrix} + (-1)^{2+2}(1) \begin{vmatrix} 1 & 2\\ 0 & 0 \end{vmatrix} + (-1)^{2+3}(1) \begin{vmatrix} 1 & 0\\ 0 & 1 \end{vmatrix}\\ &=& -2 + 0 - 1 = -3 \end{eqnarray*} \]
to obtain the same result. □

Example
Let \(X = (x_{ij}) \in M_3\). Prove the following.
\[ |X| = x_{11}x_{22}x_{33} - x_{11}x_{23}x_{32} + x_{12}x_{23}x_{31} - x_{12}x_{21}x_{33} +x_{13}x_{21}x_{32} - x_{13}x_{22}x_{31}. \]
Do you see any pattern? There are \(3!=6\) terms. Each term is a product of three elements. Each of 1, 2, 3 appears only once in the first index of each factor; the same for the second index. Half (3) of the terms are multiplied by -1. What are these patterns? They are permutations! As fascinating as it is, we don't delve into the details here. See a textbook on linear algebra. □
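The 6-term pattern above is an instance of the Leibniz (permutation) formula for determinants. As a quick sanity check, here is a sketch in Python (the helper names `perm_sign` and `det_leibniz` are ours) that sums over all \(n!\) permutations, with the sign computed from the number of inversions:

```python
import math
from itertools import permutations

def perm_sign(p):
    """(-1)^(number of inversions): +1 for even permutations, -1 for odd."""
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p))
              if p[i] > p[j])
    return -1 if inv % 2 else 1

def det_leibniz(X):
    """Determinant via the permutation (Leibniz) formula."""
    n = len(X)
    return sum(perm_sign(p) * math.prod(X[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

print(det_leibniz([[1, 0, 2], [-1, 1, 1], [0, 1, 0]]))  # → -3
```

For a \(3\times 3\) matrix this produces exactly the six signed products written above, and it agrees with the cofactor-expansion definition.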

Before studying the properties of determinants, we introduce the important notion of linearity.

Definition (Linear function)

Let \(V\) and \(W\) be vector spaces, both over the field \(K\). The function \(f: V \to W\) is said to be linear if the following hold.
  1. For all \(\mathbf{u}, \mathbf{v} \in V\), \(f(\mathbf{u} + \mathbf{v}) = f(\mathbf{u}) + f(\mathbf{v})\), 
  2. For all \(\mathbf{u}\in V\) and \(\lambda \in K\), \(f(\lambda\mathbf{u}) = \lambda f(\mathbf{u})\).
Remark. These two conditions can be summarized as
\[f(\lambda\mathbf{u} + \mu\mathbf{v}) = \lambda f(\mathbf{u}) + \mu f(\mathbf{v})\] for all \(\mathbf{u}, \mathbf{v}\in V\) and \(\lambda, \mu\in K\). □

Matrix determinants are linear as functions of each row (or column) vector; because they are linear in each row separately, they are called multilinear.

Lemma (Determinants are linear)

Let \[A = \begin{pmatrix} \mathbf{a}_1\\ \mathbf{a}_2\\ \vdots\\ \mathbf{a}_n \end{pmatrix}\]
be an \(n\times n\) matrix such that each \(\mathbf{a}_i, i = 1, \cdots, n\) is an \(n\)-dimensional row vector.
  1. If \(\mathbf{a}_i = \mathbf{b}_i + \mathbf{c}_i\) for a particular \(i\), we have \[\begin{vmatrix} \mathbf{a}_1\\ \vdots\\ \mathbf{b}_i + \mathbf{c}_i\\ \vdots\\ \mathbf{a}_n \end{vmatrix} = \begin{vmatrix} \mathbf{a}_1\\ \vdots\\ \mathbf{b}_i \\ \vdots\\ \mathbf{a}_n \end{vmatrix} + \begin{vmatrix} \mathbf{a}_1\\ \vdots\\ \mathbf{c}_i\\ \vdots\\ \mathbf{a}_n \end{vmatrix}. \]
  2. If \(\lambda \in \mathbb{R}\), then \[\begin{vmatrix} \mathbf{a}_1\\ \vdots\\ \lambda\mathbf{a}_i\\ \vdots\\ \mathbf{a}_n \end{vmatrix} = \lambda \begin{vmatrix} \mathbf{a}_1\\ \vdots\\ \mathbf{a}_i\\ \vdots\\ \mathbf{a}_n \end{vmatrix}.\]
Proof
  1. By the definition of determinant, noting that \(\mathbf{b}_i + \mathbf{c}_i = (b_{i1} + c_{i1}, \cdots, b_{ij} + c_{ij}, \cdots, b_{in} + c_{in})\), \[\begin{eqnarray*} \begin{vmatrix} \mathbf{a}_1\\ \vdots\\ \mathbf{b}_i + \mathbf{c}_i\\ \vdots\\ \mathbf{a}_n \end{vmatrix} &=& \sum_{j=1}^{n}(-1)^{i+j}(b_{ij} + c_{ij})|A_{ij}|\\ &=& \sum_{j=1}^{n}(-1)^{i+j}b_{ij}|A_{ij}| + \sum_{j=1}^{n}(-1)^{i+j}c_{ij}|A_{ij}|\\ &=& \begin{vmatrix} \mathbf{a}_1\\ \vdots\\ \mathbf{b}_i\\ \vdots\\ \mathbf{a}_n \end{vmatrix} + \begin{vmatrix} \mathbf{a}_1\\ \vdots\\ \mathbf{c}_i\\ \vdots\\ \mathbf{a}_n \end{vmatrix}. \end{eqnarray*} \]
  2. (exercise).
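Both parts of the lemma can be illustrated numerically (a sanity check with a particular matrix, not a proof), using a small cofactor-expansion helper:

```python
def det(A):
    """Cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

b = [5, -2, 1]
c = [0, 3, 4]
rest = [[1, 0, 2], [0, 1, 0]]           # the remaining rows a_2, a_3
bc = [x + y for x, y in zip(b, c)]      # the row b + c

# Part 1: det is additive in a single row.
assert det([bc] + rest) == det([b] + rest) + det([c] + rest)

# Part 2: det is homogeneous in a single row.
lam = 7
assert det([[lam * x for x in b]] + rest) == lam * det([b] + rest)
```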

Lemma

  1. If two rows are swapped, the sign of a determinant changes. \[ \begin{vmatrix} \mathbf{a}_1\\ \vdots\\ \mathbf{a}_i\\ \vdots\\ \mathbf{a}_{i'}\\ \vdots\\ \mathbf{a}_n \end{vmatrix} = -\begin{vmatrix} \mathbf{a}_1\\ \vdots\\ \mathbf{a}_{i'}\\ \vdots\\ \mathbf{a}_i\\ \vdots\\ \mathbf{a}_n \end{vmatrix}.\]
  2. If two rows are the same, the determinant is 0. If \(\mathbf{a}_i = \mathbf{a}_j = \mathbf{b}\), \[\begin{vmatrix} \mathbf{a}_1\\ \vdots\\ \mathbf{b}\\ \vdots\\ \mathbf{b}\\ \vdots\\ \mathbf{a}_n \end{vmatrix} = 0.\]
Proof
1. We prove this by mathematical induction. If \(n = 1\), there is nothing to prove (no rows to be swapped). If \(n = 2\), \[\begin{vmatrix} a & b\\ c & d \end{vmatrix} = ad - bc\] and \[\begin{vmatrix} c & d\\ a & b \end{vmatrix} = bc - ad,\] so \[\begin{vmatrix} a & b\\ c & d \end{vmatrix} = -\begin{vmatrix} c & d\\ a & b \end{vmatrix}.\]
Suppose the claim holds for \(n = m\) (\(m \geq 2\)). Now consider an \((m+1)\times(m+1)\) matrix \(A\) and let \(B\) be the matrix that is the same as \(A\) except that its \(k\)-th and \(l\)-th rows are swapped. \[|A| = \sum_{j=1}^{m+1}(-1)^{i+j}a_{ij}|A_{ij}|\]
where each \(A_{ij}\) is an \(m\times m\) matrix obtained by removing the \(i\)-th row and \(j\)-th column of \(A\). We may choose \(i\neq k, l\) (possible since \(m+1\geq 3\)). Swapping the \(k\)-th and \(l\)-th rows of \(A\) swaps the corresponding rows in each \(A_{ij}\); denote the resulting matrices by \(B_{ij}\) (these are exactly the minors of \(B\) along its \(i\)-th row). By the inductive hypothesis, we have
\[|A_{ij}| = -|B_{ij}|\]
so that
\[\begin{eqnarray} |B| &=& \sum_{j=1}^{m+1}(-1)^{i+j}a_{ij}|B_{ij}|\\ &=& -\sum_{j=1}^{m+1}(-1)^{i+j}a_{ij}|A_{ij}|\\ &=& -|A| \end{eqnarray}\]
which completes the proof.
2. Suppose that two rows are identical, say, \(\mathbf{a}_k = \mathbf{a}_l\) with \(k \neq l\). Swapping these rows does not change the matrix, hence does not change the determinant. On the other hand, by Part 1, swapping them flips the sign. Thus \(|A| = -|A|\), which forces \(|A| = 0\).
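Both parts of this lemma are also easy to spot-check numerically (again a sanity check on one matrix, not a proof):

```python
def det(A):
    """Cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

A = [[1, 0, 2], [-1, 1, 1], [0, 1, 0]]

# Part 1: swapping rows 1 and 3 flips the sign.
swapped = [A[2], A[1], A[0]]
assert det(swapped) == -det(A)

# Part 2: two equal rows force a zero determinant.
assert det([A[0], A[1], A[0]]) == 0
```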

Let us summarize the properties of determinants.
  1. \(\det\) is a function \(\mathbb{R}^{n^2} \to \mathbb{R}\).
  2. \(\det\) can be also considered as a function \[\mathbb{R}^n\times\mathbb{R}^n\times\cdots \times \mathbb{R}^n \to \mathbb{R}\] where each \(\mathbb{R}^n\) corresponds to a row (or column) vector of the matrix.
  3. \(\det\) is a multilinear function. This means that it is linear in each row (or column) as we saw in the above lemma. 
  4. Swapping the values of two arguments (i.e., rows) changes the sign of the value of the function.
  5. If two arguments (i.e., rows) take the same value, then the value of the function is 0.
Now let us have a look at the determinant of the \(3\times 3\) matrix \(X\) again, \[X = \begin{pmatrix} x_{11} & x_{12} & x_{13}\\ x_{21} & x_{22} & x_{23}\\ x_{31} & x_{32} & x_{33} \end{pmatrix}.\]
If we define the following unit row vectors \[\begin{eqnarray} \mathbf{e}_1 &=& (1, 0, 0),\\ \mathbf{e}_2 &=& (0, 1, 0),\\ \mathbf{e}_3 &=& (0, 0, 1), \end{eqnarray}\]
then we can write, for example, the first row as
\[\mathbf{x}_1 = (x_{11}, x_{12}, x_{13}) = x_{11}\mathbf{e}_1 + x_{12}\mathbf{e}_2 + x_{13}\mathbf{e}_3,\]
and so on. Therefore, by applying the above lemmas repeatedly, we have
\[\begin{eqnarray*} |X| &=& \begin{vmatrix} x_{11}\mathbf{e}_1 + x_{12}\mathbf{e}_2 + x_{13}\mathbf{e}_3\\ x_{21}\mathbf{e}_1 + x_{22}\mathbf{e}_2 + x_{23}\mathbf{e}_3\\ x_{31}\mathbf{e}_1 + x_{32}\mathbf{e}_2 + x_{33}\mathbf{e}_3 \end{vmatrix}\\ &=& x_{11}x_{22}x_{33} \begin{vmatrix} \mathbf{e}_1\\ \mathbf{e}_2\\ \mathbf{e}_3 \end{vmatrix} + x_{11}x_{23}x_{32} \begin{vmatrix} \mathbf{e}_1\\ \mathbf{e}_3\\ \mathbf{e}_2 \end{vmatrix} + x_{12}x_{23}x_{31} \begin{vmatrix} \mathbf{e}_2\\ \mathbf{e}_3\\ \mathbf{e}_1 \end{vmatrix} + x_{12}x_{21}x_{33} \begin{vmatrix} \mathbf{e}_2\\ \mathbf{e}_1\\ \mathbf{e}_3 \end{vmatrix}\\ && + x_{13}x_{21}x_{32} \begin{vmatrix} \mathbf{e}_3\\ \mathbf{e}_1\\ \mathbf{e}_2 \end{vmatrix} + x_{13}x_{22}x_{31} \begin{vmatrix} \mathbf{e}_3\\ \mathbf{e}_2\\ \mathbf{e}_1 \end{vmatrix}. \end{eqnarray*}\]
Note the order of the unit vectors in each term: They cover all the \(3!=6\) permutations. We can swap the rows in each term so that it becomes the identity matrix (with an appropriate change of signs), and note that
\[\begin{vmatrix} \mathbf{e}_1\\ \mathbf{e}_2\\ \mathbf{e}_3 \end{vmatrix} = \begin{vmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{vmatrix} = 1,\]
and we obtain the familiar result:
\[|X| = x_{11}x_{22}x_{33} - x_{11}x_{23}x_{32} + x_{12}x_{23}x_{31} - x_{12}x_{21}x_{33} +x_{13}x_{21}x_{32} - x_{13}x_{22}x_{31}.\]

Theorem (The determinant of a product is the product of determinants)

Let \(X, Y \in M_n\). Then
\[\det(XY) = \det(X)\det(Y).\]
Proof. Let \(X = (x_{ij})\) and let \(\mathbf{y}_i = (y_{i1}, \cdots, y_{in})\) denote the \(i\)-th row of \(Y\). Also, let \(\mathbf{e}_i\) denote the \(i\)-th row of the identity matrix \(I_n\). Then
\[(XY)_{ij} = \sum_{k=1}^{n}x_{ik}y_{kj}= \sum_{k=1}^{n}x_{ik}(\mathbf{y}_k)_j\]
so we can write
\[ |XY| = \begin{vmatrix} x_{11}\mathbf{y}_1 + x_{12}\mathbf{y}_2 + \cdots + x_{1n}\mathbf{y}_n\\ x_{21}\mathbf{y}_1 + x_{22}\mathbf{y}_2 + \cdots + x_{2n}\mathbf{y}_n\\ \vdots\\ x_{n1}\mathbf{y}_1 + x_{n2}\mathbf{y}_2 + \cdots + x_{nn}\mathbf{y}_n \end{vmatrix}.\tag{eq:xy1} \] By using multilinearity of the determinant, this reduces to 
 \[ = \sum_{\sigma} x_{1\sigma(1)}x_{2\sigma(2)}\cdots x_{n\sigma(n)} \begin{vmatrix} \mathbf{y}_{\sigma(1)}\\ \mathbf{y}_{\sigma(2)}\\ \vdots\\ \mathbf{y}_{\sigma(n)} \end{vmatrix}\tag{eq:xy2} \]
where the summation \(\sum_{\sigma}\) is over all permutations \(\sigma\) of \((1,2,\cdots,n)\); terms in which some \(\mathbf{y}_k\) appears in two rows vanish, because a determinant with two equal rows is 0. Rearranging the order of \(\mathbf{y}_{\sigma(1)}, \mathbf{y}_{\sigma(2)},\cdots, \mathbf{y}_{\sigma(n)}\) to \(\mathbf{y}_{1}, \mathbf{y}_{2},\cdots, \mathbf{y}_{n}\) in each term changes the sign by a factor \(\text{sign}(\sigma) = \pm 1\) (one factor of \(-1\) per swap). So we can write
\[|XY| = \sum_{\sigma}\text{sign}(\sigma)x_{1\sigma(1)}x_{2\sigma(2)}\cdots x_{n\sigma(n)} \begin{vmatrix} \mathbf{y}_{1}\\ \mathbf{y}_{2}\\ \vdots\\ \mathbf{y}_{n} \end{vmatrix}\tag{eq:xy3}\]
Now, noting that
\[|I_n| = \begin{vmatrix} \mathbf{e}_1\\ \mathbf{e}_2\\ \vdots\\ \mathbf{e}_n \end{vmatrix} = 1,\]
we have
\[\begin{eqnarray*} \sum_{\sigma}\text{sign}(\sigma)x_{1\sigma(1)}x_{2\sigma(2)}\cdots x_{n\sigma(n)} &=& \sum_{\sigma}\text{sign}(\sigma)x_{1\sigma(1)}x_{2\sigma(2)}\cdots x_{n\sigma(n)} \begin{vmatrix} \mathbf{e}_1\\ \mathbf{e}_2\\ \vdots\\ \mathbf{e}_n \end{vmatrix}\\ &=& \sum_{\sigma}x_{1\sigma(1)}x_{2\sigma(2)}\cdots x_{n\sigma(n)} \begin{vmatrix} \mathbf{e}_{\sigma(1)}\\ \mathbf{e}_{\sigma(2)}\\ \vdots\\ \mathbf{e}_{\sigma(n)} \end{vmatrix} \end{eqnarray*}\]
where we reordered the last factor to eliminate \(\text{sign}(\sigma)\). Following the argument from (eq:xy1) to (eq:xy2) in reverse, with the \(\mathbf{y}_i\)'s replaced with \(\mathbf{e}_j\)'s, we have
\[\sum_{\sigma}x_{1\sigma(1)}x_{2\sigma(2)}\cdots x_{n\sigma(n)} \begin{vmatrix} \mathbf{e}_{\sigma(1)}\\ \mathbf{e}_{\sigma(2)}\\ \vdots\\ \mathbf{e}_{\sigma(n)} \end{vmatrix} = \begin{vmatrix} x_{11}\mathbf{e}_1 + x_{12}\mathbf{e}_2 + \cdots + x_{1n}\mathbf{e}_n\\ x_{21}\mathbf{e}_1 + x_{22}\mathbf{e}_2 + \cdots + x_{2n}\mathbf{e}_n\\ \vdots\\ x_{n1}\mathbf{e}_1 + x_{n2}\mathbf{e}_2 + \cdots + x_{nn}\mathbf{e}_n \end{vmatrix} = |X|.\]
Combining with (eq:xy3), we have
\[|XY| = |X|\cdot|Y|.\]
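The theorem is easy to spot-check with integer matrices (the matrix \(Y\) below is ours, chosen arbitrarily):

```python
def det(A):
    """Cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

X = [[1, 0, 2], [-1, 1, 1], [0, 1, 0]]
Y = [[2, 1, 0], [0, 1, 3], [1, 0, 1]]
assert det(matmul(X, Y)) == det(X) * det(Y)  # (-3) * 5 = -15
```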
Remark. In the above proof, the equalities such as
\[\begin{eqnarray} \begin{vmatrix} \mathbf{y}_{\sigma(1)}\\ \mathbf{y}_{\sigma(2)}\\ \vdots\\ \mathbf{y}_{\sigma(n)} \end{vmatrix} &=& \text{sign}(\sigma)\begin{vmatrix} \mathbf{y}_1\\ \mathbf{y}_2\\ \vdots\\ \mathbf{y}_n \end{vmatrix}\\ \text{sign}(\sigma)\begin{vmatrix} \mathbf{e}_1\\ \mathbf{e}_2\\ \vdots\\ \mathbf{e}_n \end{vmatrix} &=& \begin{vmatrix} \mathbf{e}_{\sigma(1)}\\ \mathbf{e}_{\sigma(2)}\\ \vdots\\ \mathbf{e}_{\sigma(n)} \end{vmatrix} \end{eqnarray}\]
assume the following facts:
  1. The permutation from \((1, 2, \cdots, n)\) to \((\sigma(1), \sigma(2), \cdots, \sigma(n))\) can be achieved by swapping two entries finitely many times.
  2. The number of such swap operations is not unique (e.g., you can swap the same two elements 0, 2, 4, or any even number of times to have the same permutation), but its parity (whether the number of swaps is odd or even) is unique. If the parity is odd, \(\text{sign}(\sigma) = -1\), if the parity is even, \(\text{sign}(\sigma) = 1\).
See also: Permutation group (Wikipedia)

Theorem (Determinant of transposed matrix)

For any square matrix \(A\), we have
\[|A| = |A^T|.\]
Proof. Exercise. ■
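A numerical sanity check (for the proof itself, try induction, expanding \(A\) along its first row and \(A^T\) along its first column):

```python
def det(A):
    """Cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[1, 0, 2], [-1, 1, 1], [0, 1, 0]]
assert det(A) == det(transpose(A)) == -3
```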

Definition (Adjugate matrix)

Let \(A = (a_{ij}) \in M_n\). We define the adjugate matrix \(A^* = (a_{ij}^*)\) by
\[a_{ij}^* = (-1)^{i+j}|A_{ji}|\]
where \(A_{ji}\) is the matrix obtained by removing the \(j\)-th row and the \(i\)-th column from \(A\). 

Theorem

Let \(A = (a_{ij}) \in M_n\) and \(A^*\) its adjugate.
  1. \(AA^* = A^*A = \det(A)\cdot I_n\).
  2. \(A\) has a right inverse if and only if \(\det(A) \neq 0\).
  3. If \(A\) has a right inverse \(X\), then \(X\) is also a left inverse of \(A\), and it is the unique inverse on either side.
Proof
  1. Let \(Z = (z_{ij}) = AA^*\). Then \[z_{ij} = \sum_{k=1}^{n}a_{ik}a_{kj}^* = \sum_{k=1}^{n}a_{ik}(-1)^{j+k}|A_{jk}|.\] If \(i = j\), this is exactly the cofactor expansion of \(|A|\) along the \(j\)-th row. If \(i\neq j\), this sum is the determinant (expanded along the \(j\)-th row) of the matrix obtained from \(A\) by replacing its \(j\)-th row with the \(i\)-th row of \(A\); this matrix has two equal rows, so the determinant is 0. Thus, \[z_{ij} = |A|\cdot \delta_{ij},\] that is, \(Z = |A|\cdot I_n\). The case for \(A^*A\) is similar.
  2. Let the right inverse of \(A\) be \(X\). Then we have \(AX = I_n\). Thus \(|AX| = |I_n| = 1\). But \(|AX| = |A|\cdot|X|\). Therefore \(|A|\neq 0\). Conversely, if \(|A| \neq 0\), then \((1/|A|)A^{*}\) is a right inverse from Part 1.
  3. Suppose \(AY = I_n\). Multiplying both sides by \((1/|A|)A^*\) from the left and using Part 1 yields \(Y = (1/|A|)A^*\). The case \(YA = I_n\) is similar.
The unique inverse of \(A\) is denoted \(A^{-1}\).
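The adjugate and the resulting inverse formula \(A^{-1} = (1/\det A)\,A^*\) can be sketched in Python with exact rational arithmetic (`fractions`):

```python
from fractions import Fraction

def det(A):
    """Cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def minor(A, i, j):
    """Remove the i-th row and j-th column (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def adjugate(A):
    """a*_{ij} = (-1)^{i+j} |A_{ji}| -- note the transposed indices."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)]
            for i in range(n)]

A = [[1, 0, 2], [-1, 1, 1], [0, 1, 0]]
adj, d, n = adjugate(A), det(A), len(A)

# Theorem, Part 1:  A A* = det(A) I_n.
prod = [[sum(A[i][k] * adj[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)]
assert prod == [[d if i == j else 0 for j in range(n)] for i in range(n)]

# The inverse (valid when det(A) != 0), and a check that A A^{-1} = I_n.
A_inv = [[Fraction(adj[i][j], d) for j in range(n)] for i in range(n)]
check = [[sum(A[i][k] * A_inv[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
assert check == [[1 if i == j else 0 for j in range(n)] for i in range(n)]
```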

Theorem

Let \(A, B \in M_n\). If their inverses exist, then
\[(AB)^{-1} = B^{-1}A^{-1}.\]
Proof
\[(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1}= AI_nA^{-1} = AA^{-1} = I_n.\]
Similarly, \((B^{-1}A^{-1})(AB) = I_n\). ■
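A concrete check with \(2\times 2\) matrices and exact rationals (the matrices \(A\), \(B\) below are arbitrary invertible examples of ours):

```python
from fractions import Fraction

def inv2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def mul2(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 5]]
B = [[2, 1], [1, 1]]

# (AB)^{-1} = B^{-1} A^{-1}  -- note the reversed order on the right.
assert inv2(mul2(A, B)) == mul2(inv2(B), inv2(A))
```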



