Appendix - Matrices

A few elementary facts about matrices with real entries are recalled here. The purpose of this appendix is to provide a refresher of key concepts in matrix algebra that are needed for the main development of linear algebra in the notes. Many proofs are omitted, and the stress is on acquiring a working knowledge of matrix algebra.

Basic definitions

A matrix or order $m \times n$ (read " $m$ cross $n$ ") is a rectangular array of real numbers, as shown below: $\begin{bmatrix} A_{11} & A_{12} & \ldots & A_{1n}\\ A_{21} & A_{22} & \ldots & A_{2n}\\ \vdots & & \ddots & \vdots\\ A_{m1} & A_{m2} & \ldots & A_{mn} \end{bmatrix}.$ Here $\{A_{ij}\}$ are real numbers for every $1 \le i \le m$ and $1 \le j \le m$ . A horizontal section of the matrix is called a row, while a vertical section is called a column. Every $m \times n$ matrix thus has $m$ rows and $n$ columns. The element in the $i^{\text{th}}$ row and $j^{\text{th}}$ column of this matrix is $A_{ij}$ . It is conventional to call the index $i$ in $A_{ij}$ as the row index and the index $j$ as the column index. We will typically use a symbol like $\mathsf{A}$ to represent a matrix like the one shown above. The fact that the $(i,j)^{\text{th}}$ entry of $\mathsf{A}$ is $A_{ij}$ is written as follows: $\mathsf{A}_{ij} = A_{ij}$ . Since the matrix $\mathsf{A}$ has $mn$ real numbers $\{A_{ij}\}$ arranged as an $m \times n$ rectangular array, we will refer to the set of all $m \times n$ matrices using the notation $\mathbb{R}^{m \times n}$ . Thus, $\mathsf{A} \in \mathbb{R}^{m \times n}$ . An $m \times 1$ matrix is called a column vector, or just a vector, and a $1 \times n$ matrix is called a row vector. For row and column matrices, it is conventional to use the following shortcut notation: The $(i,1)^{\text{th}}$ entry of $\mathsf{A} \in \mathbb{R}^{n \times 1}$ is written as $A_i$ instead of $A_{i1}$ ; a similar comment applies for row vectors.

A matrix of the form $\mathsf{A} \in \mathbb{R}^{n \times n}$ is called a square matrix of order $n$ . An important example of a square matrix is the identity matrix of order $n$ , written $\mathsf{I} \in \mathbb{R}^{n \times n}$ , defined as follows: $\mathsf{I} = \begin{bmatrix} 1 & 0 & \ldots & 0\\ 0 & 1 & \ldots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \ldots & 1 \end{bmatrix}.$ Given a square matrix $\mathsf{A} \in \mathbb{R}^{n \times n}$ , the ordered set of elements $(A_{11}, A_{22}, \ldots, A_{nn})$ is called the leading diagonal of $\mathsf{A}$ . The identity matrix thus has $1$ in each element of the leading diagonal and has entries zero everywhere else. An important important generalization of this kind of square matrix is a diagonal matrix, whose entries are all zero except on its leading diagonal. For instance, consider the matrix $\mathsf{D} \in \mathbb{R}^{n \times n}$ defined as follows: $\mathsf{D} = \begin{bmatrix} d_1 & 0 & \ldots & 0\\ 0 & d_2 & \ldots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \ldots & d_n \end{bmatrix}.$ It is conventional to denote the matrix $\mathsf{D}$ as $\text{diag}(d_1, d_2, \ldots, d_n)$ . Using this notation, it can be see that $\mathsf{I} = \text{diag}(1, 1, \ldots, 1)$ .

Given an $m \times n$ matrix $\mathsf{A} \in \mathbb{R}^{m \times n}$ , the transpose of $\mathsf{A}$ is defined as the $n \times m$ matrix whose $(i,j)^{\text{th}}$ entry is the $(j,i)^{\text{th}}$ entry of $\mathsf{A}$ . It is conventional to denote the transpose of $\mathsf{A}\in\mathbb{R}^{m\times n}$ as $\mathsf{A}^T \in \mathbb{R}^{n \times m}$ . Thus, $\mathsf{A}^T_{ij} = \mathsf{A}_{ji}$ . An $n \times n$ matrix $\mathsf{A} \in \mathbb{R}^{n \times n}$ is said to be symmetric if $\mathsf{A}^T = \mathsf{A}$ , and skew-symmetric, or antisymmetric, if $\mathsf{A}^T = -\mathsf{A}$ . Here, $-\mathsf{A} \in \mathbb{R}^{n \times n}$ is the matrix whose $(i,j)^{\text{th}}$ entry is $-A_{ij}$ : $(-\mathsf{A})_{ij} = -A_{ij}$ .

Example

Given any matrix $\mathsf{A} \in \mathbb{R}^{n \times n}$ , it is the case that $(\mathsf{A}^T)^T = \mathsf{A}$ . To see this note that $(\mathsf{A}^T)_{ij} = A_{ji},$ whence it follows that $((\mathsf{A}^T)^T)_{ij} = A_{ij}$ , thereby proving the result.

Algebraic operations on matrices

Given two $m \times n$ matrices $\mathsf{A},\mathsf{B} \in \mathbb{R}^{m \times n}$ , their sum is defined as the matrix $\mathsf{A} + \mathsf{B} \in \mathbb{R}^{m \times n}$ such that $(\mathsf{A} + \mathsf{B})_{ij} = A_{ij} + B_{ij}.$ Similarly, the scalar multiple of the matrix $\mathsf{A}$ with a real number $c \in \mathbb{R}$ is defined as the matrix $c\mathsf{A} \in \mathbb{R}^{m \times n}$ such that $(c\mathsf{A})_{ij} = c \, A_{ij}.$ The product of two matrices is defined only under special conditions. Given an $m \times n$ matrix $\mathsf{A} \in \mathbb{R}^{m \times n}$ and an $n \times k$ matrix $\mathsf{B} \in \mathsf{R}^{n \times k}$ , the matrix product of $\mathsf{A}$ and $\mathsf{B}$ is the $m \times k$ matrix $\mathsf{AB} \in \mathbb{R}^{m \times k}$ defined as follows: $(\mathsf{AB})_{ij} = \sum_{a = 1}^n A_{ia} B_{aj}.$ Note that the product of two square matrices of the same order is always defined. Note further that for any $\mathsf{A} \in \mathbb{R}^{n \times n}$ , $\mathsf{AI} = \mathsf{IA} = \mathsf{A}$ , where $\mathsf{I} \in \mathbb{R}^{n \times n}$ is the identity matrix of order $n$ .

Example

Suppose that we are given two matrices $\mathsf{A},\mathsf{B} \in \mathbb{R}^{n \times n}$ of order $n$ . Then, the following equation holds: $(\mathsf{A}\mathsf{B})^T = \mathsf{B}^T \mathsf{A}^T.$ To see this, note that if $\mathsf{C} = \mathsf{AB} \in \mathbb{R}^{n \times n}$ , then $(\mathsf{AB})^T_{ij} = \mathsf{C}^T_{ij} = C_{ji} = \sum_{k=1}^n A_{jk}B_{ki}.$ Notice also that $(\mathsf{B}^T\mathsf{A}^T)_{ij} = \sum_{k=1}^n \mathsf{B}^T_{ik}\mathsf{A}^T_{kj} = \sum_{k=1}^n B_{ki}A_{jk} = \sum_{k=1}^n A_{jk}B_{ki}.$ Comparing these two expressions yields the identity $(\mathsf{AB})^T = \mathsf{B}^T \mathsf{A}^T$ .

Example

Any matrix $\mathsf{A} \in \mathbb{R}^{n \times n}$ can be written as the sum of a symmetric and skew-symmetric matrix. To see this, note that $A_{ij} = \frac{1}{2}(A_{ij} + A_{ji}) + \frac{1}{2}(A_{ij} - A_{ji}).$ Define the matrices $\mathsf{A}_S \in \mathbb{R}^{n \times n}$ and $\mathsf{A}_A \in \mathbb{R}^{n \times n}$ as $(\mathsf{A}_S)_{ij} = \frac{1}{2}(A_{ij} + A_{ji}), \qquad (\mathsf{A}_A)_{ij} = \frac{1}{2}(A_{ij} - A_{ji}).$ Equivalently, $\mathsf{A}_S = \frac{1}{2}(\mathsf{A} + \mathsf{A}^T), \qquad \mathsf{A}_A = \frac{1}{2}(\mathsf{A} - \mathsf{A}^T).$ It follows that $\mathsf{A} = \mathsf{A}_S + \mathsf{A}_A.$ It is easy to check that $\mathsf{A}_S$ is symmetric and $\mathsf{A}_A$ is skew-symmetric.

Trace and determinant of square matrices

Suppose that $\mathsf{A} \in \mathbb{R}^{n \times n}$ is a square matrix of order $n$ . We will now define two important scalars associated with $\mathsf{A}$ . The first, called the trace of $\mathsf{A}$ , written $\text{tr}(\mathsf{A}) \in \mathbb{R}$ , is defined as follows: $\text{tr}(\mathsf{A}) = \sum_{k=1}^n A_{kk}.$ Thus the trace of a given matrix is the sum of the elements on its leading diagonal.

Example

Suppose that $\mathsf{A} \in \mathbb{R}^{n \times n}$ is a symmetric matrix, and $\mathsf{B} \in \mathbb{R}^{n \times n}$ is a skew-symmetric matrix. Then $\text{tr}(\mathsf{AB}) = 0$ . To see this note that $\begin{split} (\mathsf{AB})_{ij} = \sum_{k=1}^n A_{ik} B_{kj} &= \sum_{k=1}^n \frac{1}{2}(A_{ik} + A_{ki})B_{kj}\\ &= \frac{1}{2}\sum_{k=1}^n A_{ik}B_{kj} + \frac{1}{2}\sum_{k=1}^n A_{ki}B_{kj}\\ &= \frac{1}{2}\sum_{k=1}^n A_{ik}B_{kj} - \frac{1}{2}\sum_{k=1}^n A_{ki}B_{jk}. \end{split}$ Taking the trace on both sides, it follows that $\sum_{j=1}^n \sum_{k=1}^n A_{jk}B_{kj} = \frac{1}{2}\sum_{j=1}^n\sum_{k=1}^n A_{jk}B_{kj} - \frac{1}{2}\sum_{j=1}^n\sum_{k=1}^n A_{kj}B_{jk} = 0.$ This shows that $\text{tr}(\mathsf{AB}) = 0$ .

The second important scalar associated with a square matrix is its determinant. It is simpler to define the determinant of an $n \times n$ matrix inductively. The determinant of a $1 \times 1$ matrix $\mathsf{A} = [A_{11}] \in \mathbb{R}^{1 \times 1}$ , written $\text{det}(\mathsf{A})$ , is defined as follows: $\text{det}(\mathsf{A}) = A_{11}$ . The determinant of a $2 \times 2$ matrix $\mathsf{A} \in \mathbb{R}^{2 \times 2}$ is defined as follows: $\text{det}(\mathsf{A}) = \text{det}\begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix} = A_{11}A_{22} - A_{12}A_{21}.$ For any $n > 2$ , the determinant of an $n \times n$ matrix $\mathsf{A} \in \mathbb{R}^{n \times n}$ is defined inductively as follows. The $(i,j)^{\text{th}}$ minor of $\mathsf{A}$ is defined as the $(n - 1) \times (n - 1)$ matrix $M_{(i,j)}(\mathsf{A}) \in \mathbb{R}^{(n-1)\times(n-1)}$ that is obtained by removing the $i^{\text{th}}$ row and $j^{\text{th}}$ column of $\mathsf{A}$ . The determinant of $\mathsf{A}$ is defined using the determinant of the minor as follows: for any $i \in \{1, 2, \ldots, n\}$ , $\text{det}(\mathsf{A}) = \sum_{j=1}^n (-1)^{i + j} A_{ij} \, \text{det}(M_{(i,j)}(\mathsf{A})).$ Note that $i$ in the equation above can be chosen to be any row index. The definition of the determinant is best illustrated with a few examples.

Example

Consider the following $3 \times 3$ matrix $\mathsf{A} \in \mathbb{R}^{3 \times 3}$ : $\mathsf{A} = \begin{bmatrix} A_{11} & A_{12} & A_{13}\\ A_{21} & A_{22} & A_{23}\\ A_{31} & A_{32} & A_{33} \end{bmatrix}$ The determinant of $\mathsf{A}$ is computed as follows: $\begin{split} \text{det}(\mathsf{A}) &= \sum_{j=1}^3 (-1)^{1 + j} A_{1j} \text{det}(M_{(1,j)}(\mathsf{A}))\\ &= A_{11} \text{det} \begin{bmatrix} A_{22} & A_{23}\\ A_{32} & A_{33}\end{bmatrix} - A_{21} \text{det} \begin{bmatrix} A_{12} & A_{13}\\ A_{31} & A_{33}\end{bmatrix} + A_{13} \text{det} \begin{bmatrix} A_{21} & A_{22}\\ A_{31} & A_{32}\end{bmatrix}\\ &= A_{11}(A_{22}A_{33} - A_{23}A_{32}) - A_{22}(A_{21}A_{33} - A_{31}A_{23}) + A_{33}(A_{21}A_{32} - A_{22}A_{31}). \end{split}$ Note that the determinant is expanded using the first row here. It is left as a simple exercise to verify that the value of the determinant is the same irrespective of which row is chosen.

Inverse of a matrix

A matrix $\mathsf{A} \in \mathbb{R}^{n\times n}$ is said to be invertible if there exists another matrix $\mathsf{B} \in \mathbb{R}^{n \times n}$ such that $\mathsf{A}\mathsf{B} = \mathsf{B}\mathsf{A} = \mathsf{I}$ . In this case, is conventional to write $\mathsf{B}$ as $\mathsf{A}^{-1}$ .

An explicit formula for the inverse of an invertible square matrix $\mathsf{A} \in \mathbb{R}^{n \times n}$ can be provided. Define first the cofactor matrix of $\mathsf{A}$ , written $\text{cof}(\mathsf{A}) \in \mathbb{R}^{n \times n}$ as follows: $\text{cof}(\mathsf{A})_{ij} = (-1)^{i + j} M_{(i,j)}(\mathsf{A}),$ where $M_{(i,j)}(\mathsf{A}) \in \mathbb{R}^{(n-1)\times(n -1)}$ is the $(i,j)^{\text{th}}$ minor of $\mathsf{A}$ . The transpose of the cofactor matrix of $\mathsf{A}$ is called the adjoint of $\mathsf{A}$ , written $\text{adj}(\mathsf{A}) \in \mathbb{R}^{n \times n}$ : $\text{adj}(\mathsf{A}) = (\text{cof}(\mathsf{A}))^T.$ With these definitions in place, the inverse of an invertible square matrix $\mathsf{A} \in \mathbb{R}^{n \times n}$ can be written as $\mathsf{A}^{-1} = \frac{1}{\text{det}(\mathsf{A})}\text{adj}(\mathsf{A}).$ Note that this also informs us that the matrix $\mathsf{A}$ is invertible if and only if its determinant is non-zero. This fact is frequently used in applications.

Example

Suppose that $\mathsf{A}, \mathsf{B} \in \mathbb{R}^{n \times n}$ are invertible matrices. Then $(\mathsf{AB})^{-1} = \mathsf{B}^{-1}\mathsf{A}^{-1}$ . To show this, note that $(\mathsf{B}^{-1}\mathsf{A}^{-1})\mathsf{AB} = \mathsf{B}^{-1}(\mathsf{A}^{-1}\mathsf{A})\mathsf{B} = \mathsf{B}^{-1}\mathsf{B} = \mathsf{I}.$ Similarly, it can be shown that $\mathsf{AB}(\mathsf{B}^{-1}\mathsf{A}^{-1}) = \mathsf{I}$ , thereby proving the claim.

Example

As an elementary illustration of the computation of the inverse, let us now compute the inverse of the matrix $\mathsf{A} \in \mathbb{R}^{2 \times 2}$ , where $\mathsf{A} = \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix}.$ The cofactor of $\mathsf{A}$ is easily computed as $\text{cof}(\mathsf{A}) = \begin{bmatrix} A_{22} & -A_{21}\\ -A_{12} & A_{11}\end{bmatrix}.$ The determinant of $\mathsf{A}$ is computed easily as $(A_{11}A_{22} - A_{12}A_{21})$ . Putting all this together, it follows from a simple computation that $\begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix}^{-1} = \frac{1}{(A_{11}A_{22} - A_{12}A_{21})}\begin{bmatrix} A_{22} & -A_{12}\\ -A_{21} & A_{11}\end{bmatrix}.$ It can be checked by means of a direct substitution that this is indeed the inverse of $\mathsf{A}$ .

Linear systems of equations

An important application where matrices naturally find use is the solution of linear systems of equations. To understand what this means, suppose that we are given constants $\{A_{ij}\}$ and $\{b_i\}$ where $i, j = 1, \ldots, n$ . We are interested in finding real numbers $(v_i)_{i=1}^n$ that satisfy the following set of equations: $\begin{split} A_{11} v_1 + A_{12} v_2 + \ldots + A_{1n} v_n &= b_1,\\ A_{21} v_1 + A_{22} v_2 + \ldots + A_{2n} v_n &= b_2,\\ \ldots & \ldots\\ A_{n1} v_1 + A_{n2} v_2 + \ldots + A_{nn} v_n &= b_n. \end{split}$ These equations can be succinctly written as follows: for every $i \in \{1, \ldots, n\}$ , $\sum_{j=1}^n A_{ij} v_j = b_i.$ These equations can be written even more succinctly in the matrix form $\mathsf{A}\mathsf{v} = \mathsf{b},$ where $\mathsf{A} = \begin{bmatrix} A_{11} & \ldots & A_{1n}\\ \vdots & \ddots & \vdots\\ A_{n1} & \ldots & A_{nn} \end{bmatrix} \in \mathbb{R}^{n \times n}, \quad \mathsf{v} = \begin{bmatrix} v_1\\ \vdots\\ v_n\end{bmatrix} \in \mathbb{R}^{n \times 1}, \quad \mathsf{b} = \begin{bmatrix} b_1\\ \vdots\\ b_n\end{bmatrix} \in \mathbb{R}^{n \times 1}.$ Thus, the system of linear equations has been reduced to an equation involving matrices.

The system of equations $\mathsf{Av} = \mathsf{b}$ does not always possess a unique solution. In the special case, however, when $\text{det}(\mathsf{A}) \neq 0$ , the matrix $\mathsf{A}$ is invertible, and a unique solution for the system of equations $\mathsf{Av} = \mathsf{b}$ can be found as $\mathsf{v} = \mathsf{A}^{-1}\mathsf{b},$ as can be checked with direct substitution.

Remark

It is to be noted that when $\mathsf{A}$ is invertible, it is rare in practice to compute the solution of the set of linear equations $\mathsf{Av} = \mathsf{b}$ using the relation $\mathsf{v} = \mathsf{A}^{-1}\mathsf{v}$ . This is due to the fact that calculating the inverse is computationally expensive for large matrices.

Example

Perhaps the simplest method to solve the system of equations $\mathsf{Av} = \mathsf{b}$ is the Gaussian elimination algorithm. The idea behind this algorithm is to systematically reduce the system of equations to the successive solution of equations with just one unknown variable. Rather than explaining the general approach, it is instructive to look at a specific example.

Suppose that we are interested in solving the following system of equations: $\begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} v_1\\ v_2 \end{bmatrix} = \begin{bmatrix} b_1\\ b_2 \end{bmatrix}$ It is assumed that the matrix $\mathsf{A} \in \mathbb{R}^{2 \times 2}$ whose $(i,j)^{\text{th}}$ entry is $A_{ij}$ is invertible, and, without loss of generality, that $A_{21} \neq 0$ Begin by multiplying the second equation in $\sum_{j=1}^2 A_{ij}v_j = b_j$ by $A_{11}/A_{21}$ . This yields the following modified set of equations $\begin{split} A_{11} v_1 + A_{12} v_2 &= b_1,\\ A_{11} v_1 + \frac{A_{11}}{A_{21}}A_{22} v_2 &= \frac{A_{11}}{A_{21}}b_2. \end{split}$ Retaining the first equation as is and subtracting the first from the second equation, we get $\begin{split} A_{11} v_1 + A_{12} v_2 &= b_1\\ \left(\frac{A_{11}}{A_{21}}A_{22} - A_{12}\right)v_2 &= \frac{A_{11}}{A_{21}}b_2 - b_1. \end{split}$ Solving the last equation for $v_2$ , we immediately see that $v_2 = \frac{1}{A_{11}A_{22} - A_{12}A_{21}}(-A_{21}b_1 + A_{11}b_2).$ Substituting this expression for $v_2$ in the first equation and solving for $v_1$ , we get, after some algebraic manipulation that $v_1 = \frac{1}{A_{11}A_{22} - A_{12}A_{21}}(A_{22}b_1 - A_{12}b_2).$ Note that this is exactly the solution obtained by solving the system of equations $\mathsf{Av} = \mathsf{b}$ using the formula $\mathsf{v} = \mathsf{A}^{-1}\mathsf{b}$ .

The foregoing calculations can be visualized as follows. Begin by collecting together the elements of the matrices $\mathsf{A}$ and $\mathsf{b}$ as follows: $\begin{bmatrix} A_{11} & A_{12} & b_1\\ A_{21} & A_{22} & b_2 \end{bmatrix}$ The sequence of transformations carried out earlier can be summarized as follows: $\begin{bmatrix} A_{11} & A_{12} & b_1\\ A_{21} & A_{22} & b_2 \end{bmatrix} \quad\rightarrow\quad \begin{bmatrix} A_{11} & A_{12} & b_1\\ 0 & \left(\dfrac{A_{11}}{A_{21}}A_{22} - A_{12}\right) & \left(\dfrac{A_{11}}{A_{21}}b_2 - b_1\right) \end{bmatrix}.$ The matrix in the right hand side of the foregoing transformation is said to be in the upper triangular form and can be solved by back-substitution from the last equation upwards. The solution of a general linear system of equations is computed using an analogous and straightforward extension of this approach.

Example

As a concrete illustration of the Gaussian elimination algorithm, consider the solution of the following set of linear equations: $\begin{bmatrix} 1 & -2 & 4\\ 2 & -3 & 1\\ 3 & 2 & -1 \end{bmatrix} \begin{bmatrix} v_1\\ v_2\\ v_3 \end{bmatrix} = \begin{bmatrix} 9\\ -1\\ 4 \end{bmatrix}.$ The Gaussian elimination procedure can be illustrated in a sequence of two steps. In the first step, appropriate multiples of the first equations are used to eliminate $v_1$ from the second and third equations. In the second step, an appropriate multiple of the second equation is used to eliminate $v_2$ from the third equation. These steps are summarized below: $\begin{bmatrix} 1 & -2 & 4 & 9\\ 2 & -3 & 1 & -1\\ 3 & 2 & -1 & 4 \end{bmatrix} \quad\rightarrow\quad \begin{bmatrix} 1 & -2 & 4 & 9\\ 0 & -1 & 7 & 19\\ 0 & -8 & 13 & 23 \end{bmatrix} \quad\rightarrow\quad \begin{bmatrix} 1 & -2 & 4 & 9\\ 0 & -1 & 7 & 19\\ 0 & 0 & 43 & 129 \end{bmatrix}$ These equations are solve by back-substitution as follows: $\begin{split} 43v_3 = 129 &\quad\Rightarrow\quad v_3 = \frac{129}{43} = 3,\\ -v_2 + 7v_3 = 19 &\quad\Rightarrow\quad v_2 = -(19 - 7\times3) = 2,\\ v_1 - 2v_2 + 4v_3 = 9 &\quad\Rightarrow\quad v_1 = 9 + 2\times2 - 4\times3 = 1. \end{split}$ We thus obtain the solution of the linear system of equations as $v_1 = 1$ , $v_2 = 2$ , and $v_3 = 3$ . The fact that this is indeed a solution of the linear system of equations presented above can be verified by direct substitution.

Eigenvalues and eigenvectors

To wrap up the discussion of elementary matrix algebra, let us consider the eigenvalue problem. Suppose that we are given a matrix $\mathsf{A} \in \mathbb{R}^{n \times n}$ of order $n$ . If there exists a vector $\mathsf{v} \in \mathbb{R}^{n \times 1}$ and a real number $a \in \mathbb{R}$ such that $\mathsf{Av} = a\mathsf{v},$ then $a$ is called the eigenvalue of $\mathsf{A}$ corresponding to the eigenvector $\mathsf{v}$ .

Remark

Note that the zero vector $\mathsf{0} \in \mathbb{R}^{n \times 1}$ trivially satisfies this equation for any choice of $a \in \mathbb{R}$ . We will exclude this trivial solution and assume henceforth that every eigenvector is non-zero.

Note that the equation $\mathsf{A}\mathsf{v} = a\mathsf{v}$ can be written as the equation $(\mathsf{A} - a\mathsf{I})\mathsf{v} = \mathsf{0}.$ For this equation to have a non-trivial solution, it follows at once that $\text{det}(\mathsf{A} - a\mathsf{I}) = 0.$ This a polynomial equation in $a$ of degree $n$ , called the characteristic equation of $\mathsf{A}$ , and has $n$ solutions in general. These solutions correspond to the eigenvalues of the matrix $\mathsf{A}$ .

Example

Consider the matrix $\mathsf{A} \in \mathbb{R}^{3 \times 3}$ given as follows: $\mathsf{A} = \begin{bmatrix} -2 & -4 & 2\\ -2 & 1 & 2\\ 4 & 2 & 5 \end{bmatrix}.$ Let us compute the eigenvalues of $\mathsf{A}$ by computing its characteristic polynomial. Note first that the matrix $\mathsf{A} - a\mathsf{I}$ has the following form: $\mathsf{A} - a\mathsf{I} = \begin{bmatrix} -2-a & -4 & 2\\ -2 & 1-a & 2\\ 4 & 2 & 5-a \end{bmatrix}.$ Computing the determinant of this equation and setting it to zero, the characteristic equation $\text{det}(\mathsf{A} - a\mathsf{I}) = 0$ is obtained as $-a^3 + 4a^2 + 27a - 90 = 0, \quad\equiv\quad -(a - 3)(a - 6)(a + 5) = 0.$ This shows at once that the eigenvalues of the matrix $\mathsf{A}$ are $3$ , $6$ , and $-5$ .

The characteristic equation, when expanded fully, has the following structure: $(-a)^n + I_1 (-a)^{n-1} + \ldots + I_n = 0.$ The constants $I_1, I_2, \ldots, I_n$ are called the invariants of $\mathsf{A}$ , and can be related to the eigenvalues of $\mathsf{A}$ . Notably, the invariants $I_1$ and $I_n$ are computed as $\begin{split} I_1 &= \text{tr}(\mathsf{A}) = \sum_{i=1}^n a_i,\\ I_3 &= \text{det}(\mathsf{A}) = \prod_{i=1}^n a_i. \end{split}$ The other invariants of $\mathsf{A}$ can be similarly related to the eigenvalues of $\mathsf{A}$ , but they do not have a simple interpretation as $I_1$ and $I_n$ .

Example

For the matrix $\mathsf{A} \in \mathbb{R}^{3 \times 3}$ considered in the previous example, the characteristic equation takes the form $-a^3 + I_1 a_2 - I_2 a + I_3 = 0.$ It can be checked that $\begin{split} I_1 = 4 = 3 + 6 - 5 = a_1 + a_2 + a_3,\\ I_3 = -90 = 3\times6\times(-5) = a_1 a_2 a_3. \end{split}$ Further more, for matrices of order three, it is true that $I_2 = a_1a_2 + a_2a_3 + a_3a_1,$ as can be checked with a simple calculation.

The Cayley-Hamilton theorem states that the matrix $\mathsf{A}$ satisfies the characteristic equation: $(-\mathsf{A})^n + I_1 (-\mathsf{A})^{n-1} + \ldots + I_n\mathsf{I} = \mathsf{0}.$ This equation is often used in practice to simplify a variety of calculations. The following example provides an elementary illustration of the Cayley-Hamilton theorem.

Example

Suppose that $\mathsf{A} \in \mathbb{R}^{3 \times 3}$ is an invertible matrix. Then, its inverse can be computed using the Cayley-Hamilton theorem as follows: $\begin{split} & -\mathsf{A}^3 + I_1\mathsf{A}^2 - I_2\mathsf{A} + I_3\mathsf{I} = 0\\ \Rightarrow & I_3\mathsf{I} = \mathsf{A}^3 - I_1\mathsf{A}^2 + I_2\mathsf{A}\\ \Rightarrow & \mathsf{A}^{-1} = \frac{1}{I_3}\left(\mathsf{A}^2 - I_1\mathsf{A} + I_2\right). \end{split}$ Notice how the fact that $I_3 = \text{det}(\mathsf{A}) \neq 0$ is used in the last step of this calculation. The Cayley-Hamilton theorem thus provides a convenient expression for the inverse of $\mathsf{A}$ ; compare this with the general expression for the inverse provided earlier.

The eigenvectors of $\mathsf{A} \in \mathbb{R}^{n \times n}$ can be obtained by substituting the eigenvalues, successively in the equation $\mathsf{A}\mathsf{v} = a\mathsf{v}$ . Notice that $\mathsf{v}$ is an eigenvector of $\mathsf{A}$ corresponding to the eigenvalue $a$ , then $c\mathsf{v}$ is also an eigenvector of $\mathsf{A}$ corresponding to the same eigenvalue $a$ for any $c \in \mathbb{R}$ . This fact is often used to single out a particular value of $c$ , and hence a particular eigenvector. This is best illustrated with an example.

Example

Consider the matrix $\mathsf{A} \in \mathbb{R}^{3 \times 3}$ considered earlier: $\mathsf{A} = \begin{bmatrix} -2 & -4 & 2\\ -2 & 1 & 2\\ 4 & 2 & 5 \end{bmatrix}.$ The eigenvalues of $\mathsf{A}$ were computed earlier as $3, 6, -5$ . Let us now compute the corresponding eigenvectors.

Consider first the eigenvalue $a_1 = 3$ . Substituting this in the equation $\mathsf{A}\mathsf{v}_1 = a_1\mathsf{v}_1$ , we get $\begin{bmatrix} -2 & -4 & 2\\ -2 & 1 & 2\\ 4 & 2 & 5 \end{bmatrix} \begin{bmatrix} v_{1,1}\\ v_{1,2}\\ v_{1,3} \end{bmatrix} = 3 \begin{bmatrix} v_{1,1}\\ v_{1,2}\\ v_{1,3} \end{bmatrix} \quad\equiv\quad \begin{bmatrix} -5 & -4 & 2\\ -2 & -2 & 2\\ 4 & 2 & 2 \end{bmatrix} \begin{bmatrix} v_{1,1}\\ v_{1,2}\\ v_{1,3} \end{bmatrix} = \begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix}.$ In the equations above, we have used the notation $(\mathsf{v}_1)_i = v_{1,i}$ . Notice that this matrix equation does not have a unique solution since $\text{det}(\mathsf{A} - a_1\mathsf{I}) = 0$ . To compute all possible solutions, let us compute the components $v_{1,1}$ and $v_{1,2}$ in terms of $v_{1,3}$ . Begin by rewriting the first two equations as $\begin{split} 5v_{1,1} + 4v_{1,2} &= 2v_{1,3},\\ 2v_{1,1} + 2v_{1,2} &= 2v_{1,3}. \end{split}$ These equations can be readily solved to yield $v_{1,1} = -2v_{1,3}, \qquad v_{2,1} = 3v_{1,3}.$ Thus, every vector of the form $[-2a \; 3a \; a]^T \in \mathbb{R}^{3 \times 1}$ , where $a \in \mathbb{R}$ , is an eigenvector of $\mathsf{A}$ corresponding to the eigenvalue $3$ . Different choices for $a$ can be chosen depending on the context. For instance, choosing $a = 1$ , we obtain the eigenvector $[-2 \; 3 \; 1]^T$ . Requiring that the eigenvector has unit length, on the other hand yields the eigenvector $[-2/\sqrt{14} \; 3/\sqrt{14} \; 1/\sqrt{14}]^T$ .

Remark

The dot product of two vectors $\mathsf{u}, \mathsf{v} \in \mathbb{R}^{n \times 1}$ , written $\mathsf{u}\cdot\mathsf{v} \in \mathbb{R}$ , is defined as $\mathsf{u}^T \mathsf{v}$ . Note that it follows from the definition that $\mathsf{u} \cdot \mathsf{v} = \mathsf{v}^T \mathsf{u}$ . The length of a vector $\mathsf{u} \in \mathbb{R}^{n \times 1}$ is defined as $\sqrt{\mathsf{u}^T\mathsf{u}}$ .

The other eigenvectors of $\mathsf{A}$ can be computed similarly. It is left as an exercise to check that $[1 \; 6 \; 16]^T$ and $[-2 \; -1 \; 1]^T$ are the eigenvectors of $\mathsf{A}$ corresponding to the eigenvalues $6$ and $-5$ , respectively.

Suppose that $\mathsf{v}_1, \ldots, \mathsf{v}_n \in \mathbb{R}^{n \times 1}$ are the eigenvectors of $\mathsf{A} \in \mathbb{R}^{n \times n}$ corresponding to the eigenvalues $a_1, \ldots, a_n \in \mathbb{R}$ , respectively. Then the set of equations $\mathsf{A}\mathsf{v}_i = a\mathsf{v}_i$ , where $i = 1, \ldots, n$ , can be written compactly as $\mathsf{A}\mathsf{V} = \mathsf{V}\mathsf{D},$ where $\mathsf{V} = \begin{bmatrix} v_{1,1} & v_{2,1} & \ldots & v_{n,1}\\ v_{1,2} & v_{2,2} & \ldots & v_{n,2}\\ \vdots & & \ddots & \vdots\\ v_{1,n} & v_{2,n} & \ldots & v_{n,n} \end{bmatrix} \in \mathbb{R}^{n \times n}, \qquad \mathsf{D} = \text{diag}(a_1, a_2, \ldots, a_n) \in \mathbb{R}^{n \times n}.$ Notice that the first column of $\mathsf{V}$ is the first eigenvector of $\mathsf{A}$ , the second column of $\mathsf{V}$ is the second eigenvector of $\mathsf{A}$ , and so on. It can be shown that if the eigenvectors are linearly independent (A proper definition of this term is provided in these notes in the general context of finite dimensional vector spaces.), then the matrix $\mathsf{V}$ is invertible. In this case, $\mathsf{A} = \mathsf{V} \mathsf{D} \mathsf{V}^{-1}.$ This equation is called the eigendecomposition of $\mathsf{A}$ , and plays an important role in many applications.

Example

Consider the matrix $\mathsf{A} \in \mathbb{R}^{3 \times 3}$ considered in the previous examples: $\mathsf{A} = \begin{bmatrix} -2 & -4 & 2\\ -2 & 1 & 2\\ 4 & 2 & 5 \end{bmatrix}.$ The matrices $\mathsf{V} \in \mathbb{R}^{3 \times 3}$ and $\mathsf{D}\in \mathsf{3 \times 3}$ can be constructed as follows: $\mathsf{V} = \begin{bmatrix} -2 & 1 & -2\\ 3 & 6 & -1\\ 1 & 16 & 1 \end{bmatrix}, \qquad \mathsf{D} = \text{diag}(3,6,-5).$ Note that the matrix $\mathsf{V}$ is invertible in this case. It is left as a simple exercise to check that $\begin{bmatrix} -2 & -4 & 2\\ -2 & 1 & 2\\ 4 & 2 & 5 \end{bmatrix} = \begin{bmatrix} -2 & 1 & -2\\ 3 & 6 & -1\\ 1 & 16 & 1 \end{bmatrix} \begin{bmatrix} 3 & 0 & 0\\ 0 & 6 & 0\\ 0 & 0 & -5 \end{bmatrix} \begin{bmatrix} -2 & 1 & -2\\ 3 & 6 & -1\\ 1 & 16 & 1 \end{bmatrix}^{-1},$ thereby verifying the eigendecomposition $\mathsf{A} = \mathsf{V}\mathsf{D}\mathsf{V}^{-1}$ of $\mathsf{A}$ .

TO DO

Degenerate cases