Linearization of Nonlinear Maps

The foregoing sections dealt with, among other things, the theory of linear maps between finite dimensional inner product spaces. A basic introduction to the treatment of nonlinear maps is now given. Only those essential ideas that are needed for the later development of tensor analysis are presented here.

Linear approximation of real valued functions of a real variable

To begin with, consider the case of a nonlinear function of the form $f : \mathbb{R} \to \mathbb{R}$. It is assumed that $f$ is differentiable, and that its derivative $f'$ is continuous. The standard approach to the study of such nonlinear functions is to locally linearize them. To understand what this means, consider the tangent to $f$ at $x_0 \in \mathbb{R}$. The equation for the tangent is given by $$t(x) = f(x_0) + f'(x_0)(x - x_0).$$ Note that the function $t$ is not linear on account of the constant term $f(x_0) - f'(x_0)\, x_0$. Functions of this form that are linear except for an additive constant are said to be affine. The affine function $t$ is said to locally linearize the nonlinear function $f$. In what follows, this notion is generalized to the case of nonlinear maps between finite dimensional inner product spaces.
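As a small numerical sketch of this idea (the particular function $f(x) = x^2$ and the helper name `tangent` are illustrative choices, not taken from the text), the tangent map can be constructed and its approximation error examined near $x_0$:

```python
def tangent(f, dfdx, x0):
    """Return the affine map t(x) = f(x0) + f'(x0) * (x - x0) linearizing f at x0."""
    return lambda x: f(x0) + dfdx(x0) * (x - x0)

# Illustrative choice: f(x) = x^2, so f'(x) = 2x and t(x) = 1 + 2 (x - 1) at x0 = 1.
f = lambda x: x * x
dfdx = lambda x: 2.0 * x
t = tangent(f, dfdx, 1.0)

# The error f(x) - t(x) = (x - 1)^2 vanishes quadratically as x approaches x0.
print(t(1.1))           # close to 1.2
print(f(1.1) - t(1.1))  # close to 0.01 = (0.1)^2
```

The quadratic decay of the error is what makes the affine map a faithful local replacement for $f$.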

Basis representation of nonlinear maps

Let $f : V \to W$ be a nonlinear map between finite dimensional inner product spaces $V$ and $W$ of dimension $n$ and $m$, respectively. Let $\{e_1, \ldots, e_n\}$ and $\{h_1, \ldots, h_m\}$ be bases of $V$ and $W$, respectively. Define basis maps $\phi : \mathbb{R}^n \to V$ and $\psi : \mathbb{R}^m \to W$ as follows: for any $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$ and $y = (y_1, \ldots, y_m) \in \mathbb{R}^m$, $$\phi(x) = \sum_{i=1}^{n} x_i e_i, \qquad \psi(y) = \sum_{j=1}^{m} y_j h_j.$$ The representation of $f$ is defined as the map $\tilde{f} = \psi^{-1} \circ f \circ \phi : \mathbb{R}^n \to \mathbb{R}^m$. The following relation readily follows from the definition: $$f(v) = \psi(\tilde{f}(x)),$$ where $v = \phi(x)$. Since any nonlinear map between finite dimensional inner product spaces can be represented using a nonlinear map between the corresponding Euclidean spaces using the foregoing technique, it suffices to study nonlinear maps between Euclidean spaces only.
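A concrete numerical sketch may help fix the construction; here $V = W = \mathbb{R}^2$ with the non-standard basis $e_1 = (1, 1)$, $e_2 = (1, -1)$, and the particular map `f` and helper names (`phi`, `phi_inv`, `f_rep`) are illustrative assumptions, not part of the text:

```python
# Basis maps for V = W = R^2 with basis e1 = (1, 1), e2 = (1, -1).
def phi(x):
    """Basis map phi: R^2 -> V, phi(x) = x1 e1 + x2 e2."""
    return [x[0] + x[1], x[0] - x[1]]

def phi_inv(v):
    """Inverse basis map, solving v = x1 e1 + x2 e2 for (x1, x2)."""
    return [(v[0] + v[1]) / 2.0, (v[0] - v[1]) / 2.0]

def f(v):
    """An illustrative nonlinear map f : V -> W, written in ambient coordinates."""
    return [v[0] ** 2, v[0] * v[1]]

def f_rep(x):
    """Representation of f: psi^{-1} o f o phi (here psi = phi since W = V)."""
    return phi_inv(f(phi(x)))

# Consistency check of the defining relation f(phi(x)) = psi(f_rep(x)).
x = [0.5, 1.5]
lhs = f(phi(x))
rhs = phi(f_rep(x))
```

Running the check confirms that `lhs` and `rhs` coincide, which is exactly the relation $f(v) = \psi(\tilde{f}(x))$ with $v = \phi(x)$.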

Basic topological notions in $\mathbb{R}^n$

In the following development, it will be necessary to study how various quantities vary as we move from a given point to one of its neighboring points. Note that elements of $\mathbb{R}^n$ are called points here; the reason for this terminology is that the eventual application of these ideas is in the context of tensor fields. To define precisely what a neighboring point means, it is helpful to introduce a few definitions. An open ball of radius $r > 0$ centered at $x \in \mathbb{R}^n$ is the set $B_r(x)$ defined as follows: $$B_r(x) = \{ y \in \mathbb{R}^n : \| y - x \| < r \};$$ thus $B_r(x)$ contains all points within a sphere of radius $r$ centered at $x$. A set $U \subseteq \mathbb{R}^n$ is said to be open if for every $x \in U$, there exists an $r > 0$ such that $B_r(x) \subseteq U$. In a loose sense, every point in an open set is sufficiently inside the set.

The reason why open sets are so useful in practice is the following: when $U$ is open, every point $x \in U$ has neighboring points arbitrarily close to it. Indeed, choosing $r > 0$ such that $B_r(x) \subseteq U$, the point $y = x + \epsilon u$, where $0 < \epsilon < r$ and $\| u \| = 1$, gets arbitrarily close to $x$ as $\epsilon \to 0$ and still remains within $U$.

Consider now a nonlinear map of the form $f : U \to \mathbb{R}^m$, where $U$ is an open subset of $\mathbb{R}^n$. The map $f$ is said to be continuous at $x \in U$ if for every scalar $\epsilon > 0$, there exists a scalar $\delta > 0$ such that $$\| f(y) - f(x) \| < \epsilon \quad \text{whenever} \quad \| y - x \| < \delta, \ y \in U.$$ What this definition encapsulates is the intuitive idea that if $f$ is continuous at $x$, then $f(y)$ gets closer and closer to $f(x)$ as $y$ gets closer and closer to $x$. The map $f$ is said to be continuous on $U$ if it is continuous at every $x \in U$. It is customary to denote the set of all continuous maps from $U$ to $\mathbb{R}^m$ as $C(U; \mathbb{R}^m)$.

Differentiability of nonlinear maps

Let $f : U \to \mathbb{R}^m$ be a nonlinear map from an open subset $U$ of $\mathbb{R}^n$ into $\mathbb{R}^m$, as before. The nonlinear map $f$ is said to be differentiable at $x \in U$ if there exists a linear map $Df(x) : \mathbb{R}^n \to \mathbb{R}^m$ such that, for any $h \in \mathbb{R}^n$ small enough that $x + h \in U$, $$f(x + h) = f(x) + Df(x)\, h + r(h), \quad \text{where } \lim_{h \to 0} \frac{\| r(h) \|}{\| h \|} = 0.$$ The linear map $Df(x)$ is called the Fréchet derivative of $f$ at $x$. If $f$ is differentiable at every $x \in U$ then $f$ is said to be differentiable on $U$. The set of all differentiable maps from $U$ into $\mathbb{R}^m$ is denoted as $C^1(U; \mathbb{R}^m)$.

Remark

It can be shown that $C^1(U; \mathbb{R}^m) \subset C(U; \mathbb{R}^m)$: every differentiable map is also continuous. The converse is not true: for instance, the function $x \mapsto |x|$ is continuous on $\mathbb{R}$ but not differentiable at the origin.

If $f$ is differentiable on $U$, then it is convenient to introduce the map $Df : U \to \mathcal{L}(\mathbb{R}^n; \mathbb{R}^m)$ as follows: for any $x \in U$, $$Df : x \mapsto Df(x).$$ Note that $Df$ is, in general, a nonlinear map.

Remark

Note that given a linear map $T : \mathbb{R}^n \to \mathbb{R}^m$, it is an easy consequence of the definition of the Fréchet derivative that $DT(x) = T$ for every $x \in \mathbb{R}^n$.

Remark

Since $\mathcal{L}(\mathbb{R}^n; \mathbb{R}^m)$ is itself a linear space, it is possible, therefore, to extend the notion of differentiability to $Df$, and define the Fréchet derivative of $Df$ as before, after defining a suitable inner product on $\mathcal{L}(\mathbb{R}^n; \mathbb{R}^m)$. In this case, $f$ is said to be twice differentiable, and it is conventional to denote the set of all such twice differentiable maps as $C^2(U; \mathbb{R}^m)$. Higher order derivatives of $f$ are defined analogously. If the Fréchet derivative of $f$ of any order exists, then $f$ is said to be a smooth nonlinear map. The set of all smooth maps from $U$ into $\mathbb{R}^m$ is denoted as $C^\infty(U; \mathbb{R}^m)$. All maps considered henceforth will be assumed to be smooth unless stated otherwise.

The Fréchet derivative of the nonlinear map $f$ provides a locally linear approximation of $f$. The linearization of $f$ at $x_0 \in U$ is defined as the map $\ell_{x_0} : \mathbb{R}^n \to \mathbb{R}^m$ given by $$\ell_{x_0}(x) = f(x_0) + Df(x_0)\, h,$$ where $h = x - x_0$. Notice how this expression generalizes the tangent line to a real valued function of a real variable discussed earlier.

To compute the component representation of the Fréchet derivative at $x \in U$ with respect to the standard bases of $\mathbb{R}^n$ and $\mathbb{R}^m$, note that $$(Df(x))_{ij} = \frac{\partial f_i}{\partial x_j}(x).$$ Here, $f = (f_1, \ldots, f_m)$, where each $f_i$ is a real valued function of $n$ variables, and $\frac{\partial f_i}{\partial x_j}(x)$ denotes the partial derivative of $f_i$ with respect to $x_j$ evaluated at $x$. The matrix with components $\partial f_i / \partial x_j$ is called the Jacobian matrix of $f$ at $x$.

The foregoing argument also shows that the basis representation of $Df(x)$ with respect to the standard bases of $\mathbb{R}^n$ and $\mathbb{R}^m$ is the Jacobian matrix $\left[ \partial f_i / \partial x_j (x) \right]$. The representation of $Df(x)$ with respect to arbitrary bases of $\mathbb{R}^n$ and $\mathbb{R}^m$ can be computed analogously, but is omitted here in the interest of keeping the development simple.
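In computations the Jacobian matrix is often approximated by finite differences; the following sketch (the helper name `jacobian` and the particular test map are illustrative assumptions) compares a central difference approximation against the analytic partial derivatives:

```python
def jacobian(f, x, h=1e-6):
    """Approximate the Jacobian matrix (df_i/dx_j)(x) by central differences."""
    m = len(f(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp = list(x); xp[j] += h   # perturb the j-th coordinate forward
        xm = list(x); xm[j] -= h   # and backward
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2.0 * h)
    return J

# Illustrative map f : R^2 -> R^2, f(x, y) = (x y, x + y^2),
# whose analytic Jacobian is [[y, x], [1, 2 y]].
f = lambda v: [v[0] * v[1], v[0] + v[1] ** 2]
J = jacobian(f, [2.0, 3.0])  # analytic value at (2, 3): [[3, 2], [1, 6]]
```

Central differences are chosen here because their $O(h^2)$ truncation error makes the comparison with the analytic entries sharp even for a modest step size.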

Properties of the Fréchet derivative

Two important properties of the Fréchet derivative of nonlinear maps are now discussed briefly. The first is known as the chain rule of differentiation. Given differentiable maps $f : U \to \mathbb{R}^m$ and $g : \mathbb{R}^m \to \mathbb{R}^p$, it can be shown that $$D(g \circ f)(x) = Dg(f(x))\, Df(x).$$ This is a generalization of the chain rule of differentiation in single variable calculus. This result can be established easily by working in the component representation of the Fréchet derivatives. Choosing the standard bases of all the Euclidean spaces involved, it can be shown that $$\frac{\partial (g \circ f)_i}{\partial x_j}(x) = \sum_{k=1}^{m} \frac{\partial g_i}{\partial y_k}(f(x)) \, \frac{\partial f_k}{\partial x_j}(x)$$ for any $x \in U$.
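The matrix form of the chain rule can be checked directly on a small example (the particular maps $f$ and $g$ below are illustrative choices, not from the text): the matrix product $Dg(f(x))\, Df(x)$ reproduces the Jacobian of $g \circ f$ computed by hand.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Illustrative maps: f(x1, x2) = (x1^2, x1 x2) and g(y1, y2) = (y1 y2, y1 + y2),
# so that (g o f)(x1, x2) = (x1^3 x2, x1^2 + x1 x2).
def Df(x):
    return [[2.0 * x[0], 0.0], [x[1], x[0]]]

def Dg(y):
    return [[y[1], y[0]], [1.0, 1.0]]

def D_comp(x):
    """Analytic Jacobian of g o f, differentiated by hand for comparison."""
    return [[3.0 * x[0] ** 2 * x[1], x[0] ** 3],
            [2.0 * x[0] + x[1], x[0]]]

x = [2.0, 3.0]
fx = [x[0] ** 2, x[0] * x[1]]     # f(x) = (4, 6)
chain = matmul(Dg(fx), Df(x))     # Dg(f(x)) Df(x)
```

At $x = (2, 3)$ both `chain` and `D_comp(x)` evaluate to the same matrix, as the chain rule predicts.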

The second property of the Fréchet derivative that is useful in applications relates to the local invertibility of nonlinear maps. Let $f : U \to \mathbb{R}^n$ be a given differentiable nonlinear map. If $Df(x_0)$ is invertible, where $x_0 \in U$, then the inverse function theorem states that the map $f$ is locally invertible at $x_0$, and further that $$Df^{-1}(f(x_0)) = (Df(x_0))^{-1}.$$ The proof of this theorem is non-trivial, and can be found in any good book on multivariable calculus.
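A quick numerical sanity check of this identity is possible when the inverse is known explicitly; the map $f(x, y) = (x + y^3, y)$, with global inverse $g(u, v) = (u - v^3, v)$, is an illustrative choice:

```python
# Illustrative map f(x, y) = (x + y^3, y), with explicit inverse g(u, v) = (u - v^3, v).
def Df(p):
    """Analytic Jacobian of f at p = (x, y)."""
    return [[1.0, 3.0 * p[1] ** 2], [0.0, 1.0]]

def Dg(q):
    """Analytic Jacobian of the inverse map g at q = (u, v)."""
    return [[1.0, -3.0 * q[1] ** 2], [0.0, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

p = [1.0, 2.0]
q = [p[0] + p[1] ** 3, p[1]]   # q = f(p) = (9, 2)
# Inverse function theorem: Dg(f(p)) Df(p) should be the identity matrix.
I = matmul(Dg(q), Df(p))
```

Since $Dg(f(p))$ is the derivative of $f^{-1}$ at $f(p)$, the product with $Df(p)$ being the identity is precisely the statement $Df^{-1}(f(p)) = (Df(p))^{-1}$.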

Directional derivatives

Given a nonlinear map $f : U \to \mathbb{R}^m$, where $U$ is open in $\mathbb{R}^n$, it is convenient to define a weaker notion of a derivative than the Fréchet derivative, called the directional derivative of $f$. The basic idea is to study how $f$ varies along a particular direction. The map $f$ is said to be Gâteaux differentiable at $x \in U$ if there exists a map $df(x, \cdot) : \mathbb{R}^n \to \mathbb{R}^m$ such that $$df(x, u) = \lim_{\epsilon \to 0} \frac{f(x + \epsilon u) - f(x)}{\epsilon}$$ for any $u \in \mathbb{R}^n$. The quantity $df(x, u)$ is also called the directional derivative of $f$ at $x$ along the direction $u$. If $f$ is Gâteaux differentiable at every $x \in U$, then the Gâteaux differential of $f$ is defined as the map $$df : (x, u) \mapsto df(x, u),$$ where $(x, u) \in U \times \mathbb{R}^n$.

Remark

It is to be emphasized that $df(x, \cdot)$ is not necessarily a linear map. If it turns out that the Gâteaux differential is linear in its second argument, it is usually written using one of the following equivalent notations: $$df(x, u) = df(x)(u) = df(x)\, u,$$ where $x \in U$ and $u \in \mathbb{R}^n$. The latter notation will be adopted more frequently in these notes. The linear map $df(x)$ is called the Gâteaux derivative, or the functional derivative, of $f$ at $x$.

If the map $f$ is Fréchet differentiable on $U$, then it is necessarily Gâteaux differentiable on $U$. Further, in this case, the Gâteaux differential is linear in its second argument, and $$df(x)\, u = Df(x)\, u$$ for any $x \in U$ and $u \in \mathbb{R}^n$. This shows, in particular, that under the assumption of Fréchet differentiability, the Gâteaux differential can be used to compute the Fréchet derivative.
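The agreement between the Gâteaux differential and the Fréchet derivative is easy to verify numerically; in the sketch below (the scalar valued map $f$, the helper name `gateaux`, and the step size are illustrative assumptions), the difference quotient converges to the value obtained from the analytic partial derivatives:

```python
# Illustrative map f : R^2 -> R, f(x1, x2) = x1^2 + x1 x2,
# with gradient (2 x1 + x2, x1), so Df(x) u = (2 x1 + x2) u1 + x1 u2.
f = lambda v: v[0] ** 2 + v[0] * v[1]

def gateaux(f, x, u, eps=1e-6):
    """Difference quotient (f(x + eps u) - f(x)) / eps approximating df(x, u)."""
    xe = [xi + eps * ui for xi, ui in zip(x, u)]
    return (f(xe) - f(x)) / eps

x, u = [1.0, 2.0], [1.0, 1.0]
frechet = (2.0 * x[0] + x[1]) * u[0] + x[0] * u[1]  # Df(x) u = 5
approx = gateaux(f, x, u)                           # tends to the same value
```

As $\epsilon \to 0$ the difference quotient approaches `frechet`, which is how the Gâteaux differential is used in practice to compute the Fréchet derivative componentwise.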

Remark

The converse is not true: the existence of the Gâteaux differential does not imply Fréchet differentiability, except when the Gâteaux differential satisfies additional conditions. In these notes, all maps between vector spaces are assumed to be Fréchet differentiable, unless stated otherwise.

Gradient of nonlinear maps

We will now introduce an important notion called the gradient of a nonlinear map. Suppose that $f : V \to W$ is a nonlinear map between finite dimensional inner product spaces, as before, that is differentiable. In this case, the Fréchet derivative of $f$ is equal to its Gâteaux derivative, as we just discussed. To introduce the notion of the gradient of $f$, it is helpful to first introduce the notion of tensor product spaces -- given finite dimensional inner product spaces $V$ and $W$, the tensor product space $W \otimes V$ is defined as follows: $$W \otimes V = \operatorname{span} \{ w \otimes v : w \in W, \ v \in V \},$$ where the dyad $w \otimes v$ is the linear map from $V$ into $W$ defined by $(w \otimes v)\, u = (v \cdot u)\, w$ for any $u \in V$.
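In components the dyad $w \otimes v$ is just the outer product matrix with entries $w_i v_j$; the following sketch (helper names are illustrative) checks the defining property $(w \otimes v)\, u = (v \cdot u)\, w$ numerically:

```python
def dyad(w, v):
    """Matrix of the dyad w (x) v, with components (w (x) v)_ij = w_i v_j."""
    return [[wi * vj for vj in v] for wi in w]

def apply(A, u):
    """Apply a matrix (list of rows) to a vector."""
    return [sum(a * b for a, b in zip(row, u)) for row in A]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

w, v, u = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]
lhs = apply(dyad(w, v), u)           # (w (x) v) u
rhs = [dot(v, u) * wi for wi in w]   # (v . u) w
```

Both sides evaluate to the same vector, confirming that the dyad acts on $u$ by projecting onto $v$ and scaling $w$.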

Remark

The tensor product space introduced above is more properly written as $W \otimes V^{*}$, where $W^{*}$ and $V^{*}$ are the (algebraic) dual spaces of $W$ and $V$, respectively. The algebraic dual of a vector space is just the vector space consisting of linear functions defined on the vector space. When dealing with finite dimensional inner product spaces, we do not need to distinguish between a vector space and its dual since there is a canonical identification furnished by the metric. This identification is exploited here to arrive at a simpler notation.

The gradient of $f$ at $v \in V$ is defined as the tensor $\nabla f(v) \in W \otimes V$ such that, for any $w \in W$ and $u \in V$, $$\nabla f(v) \cdot (w \otimes u) = w \cdot (df(v)\, u).$$ Note that the inner product on the left is understood in the sense of the generalized inner product of tensors as we discussed earlier.

Remark

It is conventional to denote the gradient of $f$ using the same symbol as the Gâteaux derivative of $f$. Thus, we will often write $df(x)$ to denote the gradient of $f$ at $x$, $\nabla f(x)$. The meaning of a term like $df(x)$ should thus be carefully interpreted depending on the context.