Inner Product Spaces
We begin with a discussion of the algebraic properties of vectors, which are defined as elements of a special kind of set called a vector space. We will then define an additional structure called the inner product that significantly simplifies the mathematical development. We will learn how to represent a vector with respect to a chosen basis, and how this representation changes when the basis changes. Finally, we will study linear maps between vector spaces, and their representation with respect to chosen bases of the vector spaces. To keep the presentation simple, technical proofs for many of the statements given here are omitted. Basic notions from set theory and matrix algebra, reviewed in two appendices, are assumed to be known.
Introduction
A vector is typically introduced in high school algebra as a quantity with both a magnitude and a direction. A representation of a vector in the familiar three dimensional Euclidean space is shown in the following figure:
Representation of a vector in three dimensional Euclidean space
If represent the unit vectors along the axes, respectively, a vector can be expressed uniquely as , where are the Cartesian components of the vector (with respect to the basis vectors and ).
There are two core operations associated with vectors:
-
Vector addition: Given two vectors and , we can add them to get a new vector , defined as Geometrically, the new vector is obtained by placing the tail of at the head of . The sum of the two vectors is then the vector which shares its tail with and head with , as shown below:
Vector addition in three dimensional Euclidean space
Note that we get the same vector independent of the order of addition: . For this reason, vector addition is said to be commutative. It can also be shown easily that if are three vectors, then . This property is called associativity.
-
Scalar multiplication of a vector with a real number: Given any vector , we can multiply it by some real number to get the new vector that is times as long as : . This is illustrated in the following figure:
Scalar multiplication of a vector in three dimensional Euclidean space
Note that we will need only real vector spaces in what follows.
Some vector spaces admit an algebraic operation called the inner product. For instance, in three dimensional space, given two vectors and , we can combine them using the dot product to produce a real number . Using the dot product, it is customary to define the length or Euclidean norm of a vector as the (non-negative) real number .
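As a quick numerical aside, the dot product and the Euclidean norm are straightforward to compute. The following is a minimal sketch in Python; the specific vectors and the use of NumPy are our own illustrative choices, not part of the development above.

```python
import numpy as np

# Two vectors in three dimensional space, given by their Cartesian components.
u = np.array([1.0, 2.0, 2.0])
v = np.array([3.0, 0.0, 4.0])

dot = np.dot(u, v)               # dot product of u and v
norm_u = np.sqrt(np.dot(u, u))   # Euclidean norm of u, the square root of u . u

print(dot)                # 11.0
print(norm_u)             # 3.0
print(np.linalg.norm(u))  # same value, using the built-in Euclidean norm
```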
For vectors in three dimensional space (only), we also have an additional important algebraic operation called the cross product: we can combine two vectors and using the cross product to obtain a new vector .
In what follows, we will first focus on just vector addition and scalar multiplication. These two operations embody a concept known as linearity, which is fundamental to appreciate what a vector is. We will then study inner product spaces, which are vector spaces with an additional structure known as an inner product, and highlight Euclidean spaces as an important example of inner product spaces.
Remark
The generalization of the cross product is known as the wedge product. We will not study the wedge product here since it is beyond the scope of these notes.
Notice that all the information about the vector is contained in the ordered set of three real numbers . What this means is that given any vector in three dimensional space, we can uniquely associate with it a triple of real numbers, and vice versa. The set of all such ordered triples is the set , which is the set of all triples of real numbers. If we define addition of two such ordered triples and multiplication of an ordered triple with a real number as, then the elements of , which are ordered triples of real numbers, behave exactly as the geometric picture of vectors as arrows that we just discussed, as far as the core properties of vector addition and scalar multiplication are concerned.
We now have two ways of representing a vector: as an arrow in three dimensional space, and as an ordered triple of real numbers. Both of these represent the same object. But the representation in terms of ordered tuples of real numbers immediately admits a generalization to cases where the pictorial representation fails. It is evident from the above discussion that there is nothing special about the number when we considered an ordered triple of real numbers. We can easily generalize this to an ordered -tuple of real numbers , where could be any arbitrary positive integer. We can define addition and scalar multiplication in a manner analogous to the case of ordered triples,
where and are two elements of , and is a real number. We will call this the standard linear structure on , and elements of as vectors in . Note that there is no obvious way to picture an arrow in the -dimensional space for . Thus, by choosing the right representation, we can extend the elementary notion of vectors as quantities with magnitude and direction to more general objects.
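For concreteness, the component-wise definitions of addition and scalar multiplication on ordered -tuples can be sketched in code as follows; this is a minimal Python illustration with arbitrarily chosen values.

```python
def add(x, y):
    """Component-wise addition of two n-tuples."""
    assert len(x) == len(y)
    return tuple(xi + yi for xi, yi in zip(x, y))

def scale(alpha, x):
    """Multiplication of an n-tuple by a real number alpha."""
    return tuple(alpha * xi for xi in x)

x = (1.0, -2.0, 0.5, 3.0)   # an element of the space of ordered 4-tuples
y = (2.0,  1.0, 1.5, -1.0)

print(add(x, y))      # (3.0, -1.0, 2.0, 2.0)
print(scale(2.0, x))  # (2.0, -4.0, 1.0, 6.0)
```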
Linear structure
Let us now generalize the previous discussion to define abstract vector spaces. Suppose that is a set such that it is possible to define two maps called vector addition and scalar multiplication, respectively, in , that satisfy the following properties: for any and ,
-
Associativity of addition: ,
-
Commutativity of addition: ,
-
Existence of additive identity: there exists a unique element , called the additive identity of such that ,
-
Existence of additive inverse: for every , there exists a unique element such that ,
-
Distributivity of scalar multiplication over vector addition: ,
-
Distributivity of scalar multiplication over scalar addition: ,
-
Compatibility of scalar multiplication with field multiplication: ,
-
Scaling property of scalar multiplication: .
A set that has two maps that satisfy these axioms is called a (real) vector space, or a (real) linear space. What we have accomplished through these axioms is to endow a set with a notion of addition that allows us to add two elements of the set to get a third element. We have also provided a mechanism to multiply a member of this set by a real number to get another element of this set. The maps and are said to provide a linear structure on . Elements of are called vectors.
Remark
Some textbooks mention additional closure axioms that indicate that if , then , and given any and , . We don’t specify this explicitly since this is already implied by the function definitions and . As mentioned in the previous discussion on set theory, we will always insist on mentioning the domain and codomain of every map/function we encounter. Hence, the so-called closure axioms are redundant for our purposes.
Remark
Following standard convention, we will often use the shorthand notation for .
Example
The simplest example of a real vector space is the set of real numbers with addition and multiplication defined in the standard manner. More generally, consider the set consisting of all -tuples of real numbers: Given , , and , let us define addition and scalar multiplication as It is straightforward to verify that with addition and scalar multiplication thus defined, the triple is a real vector space. Note that the additive identity in is the zero vector , and the additive inverse of is .
Example
Consider the set of all matrices with real entries and define addition of matrices and scalar multiplication of a matrix with a real number in the usual sense (see Appendix ). It is easily checked that the set of all matrices is a vector space. The zero vector in is the matrix with zero in all of its entries, and the additive inverse of a given matrix is just its negative.
Example
The definition of vector spaces admits more general kinds of objects. As a simple example, consider the set consisting of all real-valued and continuous functions of one real variable. Given any , and any , we can define addition and scalar multiplication pointwise as follows: for any , It is not difficult to verify that is a real vector space. The additive identity in is the zero function defined as follows: for any , . The additive inverse of is the function defined as follows: for any , .
Subspaces and linear independence
A subset of a vector space is said to be a linear subspace of if is also a vector space. Note that it is implicitly assumed that both and share the operations of vector addition and scalar multiplication. It can be easily checked that if a subset of a real vector space has the property that for any , and any , , then is a linear subspace of . This property is often used to check if a given subset of a vector space is a linear subspace. An immediate consequence of this is the fact that every linear subspace of a given vector space must contain the additive identity .
Example
Consider the following subsets of : To see that is a linear subspace of , note that if , , and , then . Since , we see that . This shows that is indeed a linear subspace of . It is likewise verified that is also a linear subspace of , while is not.
The intersection of two linear subspaces is also a linear subspace. Moreover, any finite intersection of linear subspaces of a vector space is also a linear subspace of , as can be easily checked.
Example
In the previous example, the intersection of the subspaces and of is easily seen to be which is the trivial subspace of .
Two non-zero vectors in a vector space are said to be linearly independent iff , where are real numbers, implies that and . Thus and are linearly independent iff the only linear combination of and that yields is the trivial linear combination . If this is not true, then and are said to be linearly dependent. This definition can be easily extended to a finite set of non-zero vectors , where . Thus, the set of vectors in is said to be linearly independent iff the only real numbers that satisfy the equation are .
Example
Consider the vectors and in the two dimensional Euclidean space . For real numbers , note that Solving the equations and , we immediately see that , which shows that and are linearly independent vectors in . If , then and are linearly dependent since if for , These equations do not imply that . For instance, satisfies the condition. We thus see that and are linearly dependent.
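In practice, linear independence of a small collection of vectors can be checked by computing the rank of the matrix formed from them. The sketch below assumes NumPy and uses illustrative vectors of our own choosing; it does not reproduce the specific vectors of the example above.

```python
import numpy as np

def linearly_independent(vectors):
    """Return True iff the given vectors (listed as rows) are linearly independent."""
    A = np.array(vectors, dtype=float)
    return np.linalg.matrix_rank(A) == len(vectors)

print(linearly_independent([[1.0, 2.0], [3.0, 1.0]]))   # True: neither is a multiple of the other
print(linearly_independent([[1.0, 2.0], [2.0, 4.0]]))   # False: the second is twice the first
```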
Basis of a vector space
We will now introduce a very important tool called the basis of a vector space. The basic idea is that once we identify a basis for a vector space, we can use the linear structure inherent in the space to reduce all computations related to the vector space as a whole, to just computations on the basis set. We will need a few definitions first in order to define the basis.
The linear span of a set of vectors in , written as , is defined as the set of all its linear combinations: It is also common to refer to the linear span as just the span. It is straightforward to check that the linear span of any finite collection of vectors in a vector space is a linear subspace of the vector space .
Example
Let us consider the vectors and in . The span of these two vectors is the following subset of : It is left as an easy exercise to verify that this is indeed a linear subspace of .
An ordered subset of a vector space is said to constitute a basis of if
-
is linearly independent, and
-
.
In the special case when a vector space is spanned by a finite ordered set of vectors , for some , the vector space is said to be finite dimensional, and the number is called the dimension of the vector space (written ). A vector space that is not finite dimensional is said to be infinite dimensional. In what follows, we will only deal with finite dimensional vector spaces.
If is an -dimensional vector space, and is a basis for , then we will often abbreviate the basis as , or just , when the dimension is evident from the context.
Example
Let us revisit the -dimensional Euclidean space , and consider the following vectors: for any , It is easy to check that the is a basis of . In particular, note that any can be written as The ordered set is called the standard basis of . Note that in the special case of , the set of basis vectors is identical to the basis set introduced in the beginning of this section.
Given a vector space of dimension , it is possible to choose an infinite number of bases for . This non-uniqueness in the choice of the basis can be easily understood as follows. Pick any . Choose from the set , from the set , and so on. This process will terminate in steps since the vector space is of dimension . The resulting set of vectors is a basis for .
Inner products and norms
We will now introduce a special additional structure on an abstract vector space called the inner product. There are two reasons for introducing the inner product right at the outset: first, the mathematical development becomes significantly simpler, and second, many important applications in science and engineering can be studied in this setting.
Remark
There is an elegant theory of abstract linear spaces, both in the finite and infinite dimensional cases, where inner products are not defined. We will however not develop this general theory here.
Given a vector space , an inner product on is defined as a map of the form such that, for any and ,
-
Symmetry: ,
-
Bilinearity: ,
-
Positive definiteness: , and iff .
A vector space endowed with a map that satisfies the three properties mentioned above is said to be an inner product space. All vector spaces considered henceforth will be assumed to be inner product spaces, unless stated otherwise.
Remark
Given , we will often write as just , with the understanding that the inner product is evident from the context.
Example
The simplest, and also the most important, example of an inner product space is the vector space defined earlier, with the inner product defined as follows: for any and , It is easy to check that this is indeed an inner product. The vector space with this inner product is called the Euclidean space of dimension . We will use the same symbol to denote the -dimensional Euclidean space.
Example
Define the set as the set of all real-valued polynomials of degree less than or equal to on the interval : Define the function as follows: for any , It is left as a simple exercise to verify that this function is actually an inner product on the linear space with addition and scalar multiplication defined pointwise.
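Assuming, as is common for such examples, that the inner product in question is the integral of the product of the two polynomials over the stated interval (taken to be [0, 1] below purely for illustration), a numerical sketch of this inner product might look as follows.

```python
import numpy as np

def poly_inner(p, q, a=0.0, b=1.0):
    """<p, q> = integral from a to b of p(x) q(x) dx, computed exactly.

    p and q are coefficient lists (highest degree first), as used by
    numpy.polyval; numpy.polymul multiplies the polynomials and
    numpy.polyint returns an antiderivative.
    """
    antideriv = np.polyint(np.polymul(p, q))
    return np.polyval(antideriv, b) - np.polyval(antideriv, a)

p = [1.0, 0.0]   # p(x) = x
q = [1.0]        # q(x) = 1
print(poly_inner(p, q))            # 0.5, the integral of x over [0, 1]
print(np.sqrt(poly_inner(p, p)))   # norm of p induced by this inner product: sqrt(1/3)
```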
The inner product on a vector space can be used to define a norm on . A norm on a vector space is a function of the form such that, for any and ,
-
Positive definiteness: , and iff ,
-
Homogeneity: ,
-
Sub-additivity: .
Here, refers to the absolute value of . The property of sub-additivity is also referred to as the triangle inequality.
A vector space equipped with a norm that satisfies these properties is called a normed vector space, or a normed linear space. Note that every inner product space is a normed linear space. To see this, note that given an inner product , the norm induced by this inner product is defined as follows: for any , The norm induced by the Euclidean inner product on is called the standard Euclidean norm on .
Remark
In general, a normed vector space is not an inner product space. If, however, the norm on a normed vector space satisfies the following relation, called the parallelogram identity, for any , then it is possible to define an inner product using the norm as follows: for any , This relation is called the polarization identity.
Example
Let us consider the -dimensional Euclidean space with the standard inner product, defined earlier. The norm of a vector is easily computed as This norm is called the standard Euclidean norm, or the -norm on .
A variety of other norms can be defined on a given inner product space. For instance, the -norm on can be defined as follows: for any and , The -norm on is defined as Note that the standard Euclidean norm corresponds to .
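The 1-norm, the general p-norm, and the infinity-norm are all readily computed; the following is a minimal sketch using NumPy's built-in norm function, with an arbitrarily chosen vector.

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

print(np.linalg.norm(x, 1))        # 1-norm: sum of absolute values = 7.0
print(np.linalg.norm(x, 2))        # 2-norm (standard Euclidean norm) = 5.0
print(np.linalg.norm(x, 3))        # general p-norm with p = 3
print(np.linalg.norm(x, np.inf))   # infinity-norm: largest absolute value = 4.0
```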
Remark
There is an important theorem that states that all norms on a finite dimensional vector space are equivalent. This means the following: given norms and on a finite dimensional vector space , there exists constants such that, for any , Without getting into technical details, this roughly means that the conclusions we draw about topological notions in a normed vector space are independent of the specific norm chosen.
Cauchy-Schwarz inequality
Let be a real vector space equipped with an inner product . An important property of inner products that turns out to be quite useful in practice is discussed now. Given any , the Cauchy-Schwarz inequality states that Furthermore, the equality holds iff and are linearly dependent. The proof of the Cauchy-Schwarz inequality is quite easy: for any , and , Substituting in this inequality, we get which immediately yields the Cauchy-Schwarz inequality. To prove the second part, note that if and are linearly dependent, then, without loss of generality, for some . In this case, . On the other hand, if , consider the vector , where, It is straightforward to show that , and hence that . This shows that and are linearly dependent. The Cauchy-Schwarz inequality is thus proved.
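A quick numerical sanity check of the Cauchy-Schwarz inequality, and of the equality case for linearly dependent vectors, can be sketched as follows; the random vectors are used purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

for _ in range(5):
    u = rng.standard_normal(4)
    v = rng.standard_normal(4)
    lhs = abs(np.dot(u, v))
    rhs = np.linalg.norm(u) * np.linalg.norm(v)
    assert lhs <= rhs + 1e-12   # |<u, v>| <= |u| |v|

# Equality holds when the vectors are linearly dependent:
u = rng.standard_normal(4)
v = 2.5 * u
print(abs(np.dot(u, v)), np.linalg.norm(u) * np.linalg.norm(v))  # equal up to round-off
```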
The angle between two vectors is defined via the relation Note how the Cauchy-Schwarz inequality implies that this definition is well-defined. If the angle between two vectors is , they are said to be orthogonal. Equivalently, are said to be orthogonal if . If are orthogonal, and if it is further true that , then and are said to be orthonormal.
Example
The vectors and are orthogonal since . They are not orthonormal since . The vectors and are, however, orthonormal.
The following inequality also holds for any in an inner product space : Recall that this is the triangle inequality. The triangle inequality is readily proved using the Cauchy-Schwarz inequality: for any : The triangle inequality follows by taking the square root on both sides.
Gram-Schmidt orthogonalization
Let us now reconsider the notion of a basis of an -dimensional vector space in the special case when the vector space also has an inner product defined on it. We say that a basis of is orthogonal if whenever , and . If it is further true that for every , we say that the basis is orthonormal. The fact that a basis of is orthonormal can be succinctly expressed by the following equation: for any , where is the Kronecker delta symbol that is defined as follows:
Example
It is straightforward to verify that the standard basis of is an orthonormal basis, since it follows from the definition of the standard basis that .
The Gram-Schmidt orthogonalization procedure is an algorithm that helps us to transform any given basis of into an orthonormal basis . The algorithm works as follows:
-
Let
-
We now define by removing the component of along the direction : It is easy to check that and .
-
We then obtain in a similar manner by removing the components of along and : It is straightforward to verify that , and .
-
Continuing this process, we can construct an orthonormal basis .
Example
As a simple illustration of the Gram-Schmidt orthogonalization process, let us consider the vectors and . We verified earlier that these vectors are linearly independent, and that they form a basis of . They are however not orthogonal since . Let us orthonormalize this basis using the Gram-Schmidt process. To start with, let us normalize : We can now construct by projecting out the component of along : It is easily checked that and , and that We have thus constructed an orthonormal basis of starting from the general basis of by following the Gram-Schmidt algorithm.
Note that the orientation of the basis obtained here is the opposite of the orientation of the standard basis . To see this, embed these vectors in to get the vectors , , , , and note that , whereas - they have opposite signs! This doesn’t really affect the Gram-Schmidt algorithm because if is a basis of , then so is .
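The Gram-Schmidt procedure described above translates directly into code. The following is a minimal sketch; it assumes the input vectors are linearly independent and makes no attempt to handle nearly dependent inputs.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors.

    Each new vector has its components along the previously constructed
    orthonormal vectors removed, and is then normalized.
    """
    basis = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for e in basis:
            w = w - np.dot(w, e) * e   # remove the component of w along e
        basis.append(w / np.linalg.norm(w))
    return basis

e1, e2 = gram_schmidt([[1.0, 1.0], [0.0, 1.0]])
print(e1, e2)          # an orthonormal pair
print(np.dot(e1, e2))  # ~0, confirming orthogonality
```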
Remark
It turns out that the choice of an orthonormal basis is sufficient for most applications. In the discussion below, we will first study various concepts with respect to the choice of an orthonormal basis, since the calculations are much simpler in this case. The general case of arbitrary bases will be discussed after this to give an idea of how some calculations can be more involved with respect to general bases.
Basis representation of vectors
We will now study the representation of a vector in an -dimensional inner product space with respect to an orthonormal basis of . It is worth reiterating that we will only deal with orthonormal bases unless otherwise stated.
The fact that is a basis of implies that every can be written as where for every . This is called the representation of with respect to the basis . The real numbers are called the components of with respect to the basis .
To compute the components , we can exploit the fact that is an orthonormal basis: The components thus computed are unique since we have explicitly constructed each component as . We can alternatively show the uniqueness of the components based on the fact that the basis vectors are linearly independent by definition.
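In coordinates, each component with respect to an orthonormal basis is just the inner product of the vector with the corresponding basis vector. A small numerical sketch, using an orthonormal basis of the plane chosen purely for illustration:

```python
import numpy as np

# An orthonormal basis of the plane (e.g. the output of the Gram-Schmidt sketch above).
f1 = np.array([1.0, 1.0]) / np.sqrt(2.0)
f2 = np.array([-1.0, 1.0]) / np.sqrt(2.0)

u = np.array([3.0, 1.0])

# Components of u with respect to the orthonormal basis {f1, f2}.
c1, c2 = np.dot(u, f1), np.dot(u, f2)
print(c1, c2)             # 2*sqrt(2) and -sqrt(2)
print(c1 * f1 + c2 * f2)  # reconstructs u = [3, 1]
```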
Remark
Notice how we have written the sum on the right without explicitly indicating the summation index and the range of summation. We will write , or just in place of , whenever the range under consideration is obvious from the context. If no index is associated with the summation symbol, as in , it will be assumed that the sum is with respect to all repeating indices. In addition, we will assume that the range of all the indices involved is known from the context. While on this, it is worth noting that many authors employ the Einstein summation convention, according to which a sum of the form is written simply as , with the summation over being implicitly understood as long as the index appears twice. For pedagogical reasons, we will not follow the Einstein summation convention in these notes.
Example
As a trivial example of the basis representation of a vector, consider any . This vector can be written with respect to the standard basis of as Notice that .
Example
As a non-trivial, yet simple, example of basis representation of a vector, consider the orthonormal basis of , where and - notice that we have swapped the order of the basis constructed earlier in the context of the Gram-Schmidt procedure to maintain orientation. Consider any . We can express this in terms of the basis as for some real constants . To compute these constants, note that we can write using the standard basis of . The constants are easily computed by taking the appropriate dot products: For instance, the vector can be expressed in terms of the basis as , where We thus see that , a fact that can be easily checked directly.
We will now introduce a useful notion called component maps to collect the components of any with respect to the (not necessarily orthonormal) basis of . Define the component map as follows: Notice how we have collected together the components of as a column vector of size using the component map. It is useful at this point to introduce the following notation: When the choice of basis is evident from the context, we will often write as just . We can thus alternatively write the basis representation of any with respect to a general basis of as follows: We will however use the simpler notation , and use the notation only when we want to refer to the components alone as a column vector. We will see shortly that the component map is an example of an isomorphism between the vector spaces and .
Remark
We will often write the component map as , or just , when the vector space and its basis are evident from the context. At times, we will omit altogether and refer to the component map using the following notation: .
Example
As a quick illustration of this, the matrix representation of the vector with respect to the orthonormal basis of , where and , is .
Change of basis rules for vectors
Given an -dimensional inner product space , let us first consider the case where and are two general bases of , not necessarily orthonormal. Then, any can be written as where and are the components of with respect to the bases and , respectively. The fact that is a basis of implies that where for every . Similarly, we have where for every . Notice how the first index of the transformation coefficients pairs with the corresponding basis vector. The reason for this specific choice will become clear shortly.
Combining these two transformation relations, we see that and It is convenient to collect together the constants as a matrix whose entry is . We will similarly collect the constants in a matrix . In matrix notation, we can write the foregoing equations succinctly as where is the identity matrix of order . We thus see that matrices and are inverses of each other.
Given the transformation relations between the two bases, we can use the identity to see that In matrix notation, we can summarize the foregoing result as follows:
Remark
In a more general treatment of linear algebra, the fact that the components of a vector transform in a manner contrary to the way in which the basis vectors transform is the reason that elements of are called contravariant vectors. We will not develop the general theory here, but will make a few elementary remarks occasionally regarding this.
Let us now consider the special case when both and are orthonormal bases of . In this case, we can use the fact that to simplify the calculations. Suppose that It follows immediately that Let us now see how the components of any transform upon this change of basis: In matrix notation, this is written as where is the matrix whose entry is . But, based on the calculation we carried out earlier in the context of general bases, we see that Comparing these two expressions, we are led to the following conclusion: Recall that matrices that satisfy this condition are called orthogonal matrices. It is an easy consequence of orthogonality that the determinant of an orthogonal matrix is , as the following calculation shows: if is an orthogonal matrix If , then the orthogonal matrix is called proper orthogonal, or special orthogonal.
Remark
It is important to note that the foregoing conclusion that the matrix involved in the change of basis is orthogonal is true only in the special case when both bases are orthonormal.
Example
Consider the orthonormal bases and of , where is the standard basis of and , . The transformation matrix from to is computed using the relation as It is easily checked that is orthogonal: It can be similarly checked that .
As a quick check, note also that the determinant of the transformation matrix in the previous example is : This informs us that the transformation matrix is in fact special orthogonal.
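Numerically, the change-of-basis matrix between two orthonormal bases can be assembled from inner products of the basis vectors and its orthogonality verified directly. The sketch below uses a rotated copy of the standard basis of the plane as the second basis; the rotation angle and the index convention (first index pairing with the first basis) are illustrative assumptions, not necessarily those of the example above.

```python
import numpy as np

theta = np.pi / 6   # an arbitrary rotation angle

# Two orthonormal bases of the plane: the standard basis and a rotated copy of it.
E = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
F = [np.array([np.cos(theta),  np.sin(theta)]),
     np.array([-np.sin(theta), np.cos(theta)])]

# Entry (i, j) of the transformation matrix is the inner product of e_i with f_j.
Q = np.array([[np.dot(e, f) for f in F] for e in E])

print(np.allclose(Q @ Q.T, np.eye(2)))  # True: Q is orthogonal
print(np.linalg.det(Q))                 # ~1.0: Q is special orthogonal
```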
General basis
Most of the discussion thus far regarding the representation of a vector in a finite dimensional inner product space has been restricted to the special case of orthonormal bases. Let us briefly consider the general case when a general basis, which is not necessarily orthonormal, is chosen. In what follows, denotes an inner product space of dimension , and is a general basis of .
Representation of vectors
Any can be written in terms of the basis of as where are the components of with respect to this basis. To compute these components, start by taking the inner product of this equation with the basis vector ; this yields This equation can be written in the form of a matrix equation, as follows: The fact that is a basis of implies that the components exist and are unique. This implies that the matrix introduced above, whose entry is , is invertible. It is conventional, and convenient, to represent the inverse of this matrix as the matrix with entries ; thus, A proper justification for this choice of notation will be given shortly when we study reciprocal bases. The fact that these two matrices are inverses of each other can be written succinctly as follows: Using this result, a routine calculation yields the following result: for any , The components of any vector with respect to a general basis can thus be computed explicitly.
Example
Consider the basis of , where and . Let us now compute the components of with respect to this basis.
The first step is to compute the matrix whose entries are : Notice that this matrix is symmetric, as expected, since , in general. The inverse of this matrix gives the scalars as follows: The components of with respect to the basis are now easily computed using the result as, We thus see that . As a consistency check, substitute the representations of with respect to the standard basis of and verify that this is correct.
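The computation just carried out, forming the matrix of inner products of the basis vectors and then solving for the components, can be sketched as follows. The basis vectors and the vector being decomposed are illustrative choices of our own, and the linear system is solved directly instead of explicitly forming the inverse matrix.

```python
import numpy as np

# A general (non-orthonormal) basis of the plane and a vector to decompose.
g1 = np.array([1.0, 1.0])
g2 = np.array([0.0, 1.0])
u  = np.array([3.0, 1.0])

G = np.array([[np.dot(g1, g1), np.dot(g1, g2)],
              [np.dot(g2, g1), np.dot(g2, g2)]])   # matrix of inner products of the basis vectors
b = np.array([np.dot(u, g1), np.dot(u, g2)])       # inner products of u with the basis vectors

c = np.linalg.solve(G, b)     # components of u in the basis {g1, g2}
print(c)                      # [ 3. -2.]
print(c[0] * g1 + c[1] * g2)  # reconstructs u = [3, 1]
```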
Reciprocal basis
The computations presented in the previous section can be greatly simplified by introducing the reciprocal basis corresponding to a given basis. Given a basis of , its reciprocal basis is defined as the basis such that where . Based on the preceding development, it can be seen that the reciprocal basis is explicitly given by the following equations: This can be readily inverted to yield the following equation: Note that in the special case of the standard basis of , . More generally, if is an orthonormal basis of an inner product space , then . This is one of the reasons why many calculations are much simpler when using orthonormal bases.
It follows from the definition of the reciprocal basis of that Thus, the following useful formulae are obtained: if is a general basis of and is its reciprocal basis, then and .
Remark
The use of superscripts here is done purely for notational convenience. It is however possible to justify such a notation when considering a more detailed treatment of this subject, as will be briefly noted later.
Example
Consider the previous example involving the basis of , where and . In this case, the reciprocal basis is computed as follows: It can be checked with a simple calculation that , as expected. Furthermore, the matrix whose entry is is computed as This confirms that the matrix whose entry is is indeed the inverse of the matrix whose entry is .
Example
The reciprocal basis can be computed easily in the special case of the three dimensional Euclidean space using the cross product. Given any basis of , the corresponding reciprocal basis of can be computed as It is a simple exercise to check these formulae satisfy the defining condition of the reciprocal basis: .
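These cross-product formulae are easy to evaluate numerically. A minimal sketch with an arbitrarily chosen basis of three dimensional space:

```python
import numpy as np

# A general basis of three dimensional space (illustrative values).
g1 = np.array([1.0, 0.0, 0.0])
g2 = np.array([1.0, 1.0, 0.0])
g3 = np.array([1.0, 1.0, 1.0])

vol = np.dot(g1, np.cross(g2, g3))   # scalar triple product of g1, g2, g3

# Reciprocal basis: (g2 x g3)/vol and its cyclic permutations.
r1 = np.cross(g2, g3) / vol
r2 = np.cross(g3, g1) / vol
r3 = np.cross(g1, g2) / vol

# Check the defining property: the inner product of g_i with the j-th
# reciprocal vector is the Kronecker delta.
G = np.array([[np.dot(g, r) for r in (r1, r2, r3)] for g in (g1, g2, g3)])
print(np.allclose(G, np.eye(3)))   # True
```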
The expressions for the coefficients of a vector with respect to a given basis take a simple form when expressed in terms of the reciprocal basis. Given any and a basis of , Thus, any has the compact representation Compare this with the representation of with respect to an orthonormal basis of .
Remark
Given any and a basis of , the components of with respect to the basis and its reciprocal basis are written as follows: The components and are called the contravariant and covariant components of , respectively. In many textbooks, the following alternative notation is used: The components and are related as follows: and . For this reason, and are said to raise and lower indices, respectively. Since we will largely restrict ourselves to the case of orthonormal bases and rarely represent a vector in terms of the reciprocal basis to a given basis, we will not adopt this more nuanced notation here.
Change of basis rules
The ideas presented so far can be used to express a given vector with respect to different bases. Suppose that has the following representations, with respect to two different bases and of : Taking the inner product of these representations with respect to the appropriate reciprocal basis vectors, it is evident that and a similar formula expressing in terms of . Notice how the use of the reciprocal basis significantly simplifies the computations.
Example
Consider the example from earlier in which the vector was expressed in terms of the basis of , where and . We saw earlier that Let us now consider another basis of , where , . To compute the representation of with respect to the basis , we first need to compute its reciprocal basis . This is easily accomplished as follows: The reciprocal basis is computed using the relations : It is left as an easy exercise to verify that . Using these relations, the components of with respect to the basis can be computed using the relations as follows: We thus obtain the representation of in the basis as It is left as a simple exercise to verify by direct substitution that this is true.
Finally, note that the components of with respect to the basis can be directly obtained from its components with respect to the basis using the relations as follows: We thus see that all the different ways to compute the components of with respect to two different choices of bases are consistent with each other.
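Finally, the whole change-of-basis computation via reciprocal bases can be sketched numerically as follows. All bases and the vector below are illustrative choices, not the ones used in the example above, and the reciprocal basis is obtained here from a matrix inverse, which is one convenient way of realizing the defining property.

```python
import numpy as np

def reciprocal_basis(basis):
    """Reciprocal basis of a general basis: rows of the transposed inverse of
    the matrix whose rows are the basis vectors, so that the inner product of
    the i-th basis vector with the j-th reciprocal vector is the Kronecker delta."""
    B = np.array(basis, dtype=float)
    return list(np.linalg.inv(B).T)

# Two general bases of the plane and a vector u.
F = [np.array([1.0, 1.0]), np.array([0.0, 1.0])]
H = [np.array([2.0, 0.0]), np.array([1.0, 1.0])]
u = np.array([3.0, 1.0])

F_recip = reciprocal_basis(F)
H_recip = reciprocal_basis(H)

# Components of u with respect to each basis, obtained via the reciprocal bases.
cF = np.array([np.dot(u, f) for f in F_recip])
cH = np.array([np.dot(u, h) for h in H_recip])

print(cF, sum(c * f for c, f in zip(cF, F)))   # components in F, and u reconstructed
print(cH, sum(c * h for c, h in zip(cH, H)))   # components in H, and u reconstructed
```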