Linearization of Nonlinear Maps

The foregoing sections dealt with, among other things, the theory of linear maps between finite dimensional inner product spaces. A basic introduction to the treatment of nonlinear maps is now given. Only those essential ideas that are needed for the later development of tensor analysis are presented here.

Linear approximation of real valued functions of a real variable

To begin with, consider the case of a nonlinear function of the form $f : \mathbb{R} \to \mathbb{R}$. It is assumed that $f$ is differentiable, and that its derivative $f'$ is continuous. The standard approach to the study of such nonlinear functions is to locally linearize them. To understand what this means, consider the tangent to $f$ at $x_0 \in \mathbb{R}$. The equation for the tangent is given by $$t(x) = f(x_0) + f'(x_0)(x - x_0).$$ Note that the function $t$ is not linear on account of the constant term $f(x_0) - f'(x_0)\, x_0$. Functions of this form that are linear except for an additive constant are said to be affine. The affine function $t$ is said to locally linearize the nonlinear function $f$. In what follows, this notion is generalized to the case of nonlinear maps between finite dimensional inner product spaces.
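As a small numerical sketch of this idea (the particular function $f(x) = x^2$ and the helper name `tangent` are illustrative choices, not taken from the text), the tangent map can be constructed and its approximation error examined near $x_0$:

```python
def tangent(f, dfdx, x0):
    """Return the affine map t(x) = f(x0) + f'(x0) * (x - x0) linearizing f at x0."""
    return lambda x: f(x0) + dfdx(x0) * (x - x0)

# Illustrative choice: f(x) = x^2, so f'(x) = 2x and t(x) = 1 + 2 (x - 1) at x0 = 1.
f = lambda x: x * x
dfdx = lambda x: 2.0 * x
t = tangent(f, dfdx, 1.0)

# The error f(x) - t(x) = (x - 1)^2 vanishes quadratically as x approaches x0.
print(t(1.1))           # close to 1.2
print(f(1.1) - t(1.1))  # close to 0.01 = (0.1)^2
```

The quadratic decay of the error is what makes the affine map a faithful local replacement for $f$.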

Basis representation of nonlinear maps

Let $f : V \to W$ be a nonlinear map between finite dimensional inner product spaces $V$ and $W$ of dimension $n$ and $m$, respectively. Let $\{e_1, \ldots, e_n\}$ and $\{h_1, \ldots, h_m\}$ be bases of $V$ and $W$, respectively. Define basis maps $\phi : \mathbb{R}^n \to V$ and $\psi : \mathbb{R}^m \to W$ as follows: for any $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$ and $y = (y_1, \ldots, y_m) \in \mathbb{R}^m$, $$\phi(x) = \sum_{i=1}^{n} x_i e_i, \qquad \psi(y) = \sum_{j=1}^{m} y_j h_j.$$ The representation of $f$ is defined as the map $\tilde{f} = \psi^{-1} \circ f \circ \phi : \mathbb{R}^n \to \mathbb{R}^m$. The following relation readily follows from the definition: $$f(v) = \psi(\tilde{f}(x)),$$ where $v = \phi(x)$. Since any nonlinear map between finite dimensional inner product spaces can be represented using a nonlinear map between the corresponding Euclidean spaces using the foregoing technique, it suffices to study nonlinear maps between Euclidean spaces only.
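A concrete numerical sketch may help fix the construction; here $V = W = \mathbb{R}^2$ with the non-standard basis $e_1 = (1, 1)$, $e_2 = (1, -1)$, and the particular map `f` and helper names (`phi`, `phi_inv`, `f_rep`) are illustrative assumptions, not part of the text:

```python
# Basis maps for V = W = R^2 with basis e1 = (1, 1), e2 = (1, -1).
def phi(x):
    """Basis map phi: R^2 -> V, phi(x) = x1 e1 + x2 e2."""
    return [x[0] + x[1], x[0] - x[1]]

def phi_inv(v):
    """Inverse basis map, solving v = x1 e1 + x2 e2 for (x1, x2)."""
    return [(v[0] + v[1]) / 2.0, (v[0] - v[1]) / 2.0]

def f(v):
    """An illustrative nonlinear map f : V -> W, written in ambient coordinates."""
    return [v[0] ** 2, v[0] * v[1]]

def f_rep(x):
    """Representation of f: psi^{-1} o f o phi (here psi = phi since W = V)."""
    return phi_inv(f(phi(x)))

# Consistency check of the defining relation f(phi(x)) = psi(f_rep(x)).
x = [0.5, 1.5]
lhs = f(phi(x))
rhs = phi(f_rep(x))
```

Running the check confirms that `lhs` and `rhs` coincide, which is exactly the relation $f(v) = \psi(\tilde{f}(x))$ with $v = \phi(x)$.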

Basic topological notions in $\mathbb{R}^n$

In the following development, it will be necessary to study how various quantities vary as we move from a given point to one of its neighboring points. Note that elements of $\mathbb{R}^n$ are called points here; the reason for this terminology is that the eventual application of these ideas is in the context of tensor fields. To define precisely what a neighboring point means, it is helpful to introduce a few definitions. An open ball of radius $r > 0$ centered at $x \in \mathbb{R}^n$ is the set $B_r(x)$ defined as follows: $$B_r(x) = \{ y \in \mathbb{R}^n : \| y - x \| < r \};$$ thus $B_r(x)$ contains all points within a sphere of radius $r$ centered at $x$. A set $U \subseteq \mathbb{R}^n$ is said to be open if for every $x \in U$, there exists an $r > 0$ such that $B_r(x) \subseteq U$. In a loose sense, every point in an open set is sufficiently inside the set.

The reason why open sets are so useful in practice is the following: when $U$ is open, every point $x \in U$ has neighboring points arbitrarily close to it. Indeed, choosing $r > 0$ such that $B_r(x) \subseteq U$, the point $y = x + \epsilon u$, where $0 < \epsilon < r$ and $\| u \| = 1$, gets arbitrarily close to $x$ as $\epsilon \to 0$ and still remains within $U$.

Consider now a nonlinear map of the form $f : U \to \mathbb{R}^m$, where $U$ is an open subset of $\mathbb{R}^n$. The map $f$ is said to be continuous at $x \in U$ if for every scalar $\epsilon > 0$, there exists a scalar $\delta > 0$ such that $$\| f(y) - f(x) \| < \epsilon \quad \text{whenever} \quad \| y - x \| < \delta, \ y \in U.$$ What this definition encapsulates is the intuitive idea that if $f$ is continuous at $x$, then $f(y)$ gets closer and closer to $f(x)$ as $y$ gets closer and closer to $x$. The map $f$ is said to be continuous on $U$ if it is continuous at every $x \in U$. It is customary to denote the set of all continuous maps from $U$ to $\mathbb{R}^m$ as $C(U; \mathbb{R}^m)$.

Differentiability of nonlinear maps

Let $f : U \to \mathbb{R}^m$ be a nonlinear map from an open subset $U$ of $\mathbb{R}^n$ into $\mathbb{R}^m$, as before. The nonlinear map $f$ is said to be differentiable at $x \in U$ if there exists a linear map $Df(x) : \mathbb{R}^n \to \mathbb{R}^m$ such that, for any $h \in \mathbb{R}^n$ small enough that $x + h \in U$, $$f(x + h) = f(x) + Df(x)\, h + r(h), \quad \text{where } \lim_{h \to 0} \frac{\| r(h) \|}{\| h \|} = 0.$$ The linear map $Df(x)$ is called the Fréchet derivative of $f$ at $x$. If $f$ is differentiable at every $x \in U$ then $f$ is said to be differentiable on $U$. The set of all differentiable maps from $U$ into $\mathbb{R}^m$ is denoted as $C^1(U; \mathbb{R}^m)$.

Remark

It can be shown that $C^1(U; \mathbb{R}^m) \subset C(U; \mathbb{R}^m)$: every differentiable map is also continuous. The converse is not true: for instance, the function $x \mapsto |x|$ is continuous on $\mathbb{R}$ but not differentiable at the origin.

If $f$ is differentiable on $U$, then it is convenient to introduce the map $Df : U \to \mathcal{L}(\mathbb{R}^n; \mathbb{R}^m)$ as follows: for any $x \in U$, $$Df : x \mapsto Df(x).$$ Note that $Df$ is, in general, a nonlinear map.

Remark

Note that given a linear map $T : \mathbb{R}^n \to \mathbb{R}^m$, it is an easy consequence of the definition of the Fréchet derivative that $DT(x) = T$ for every $x \in \mathbb{R}^n$.

Remark

Since $\mathcal{L}(\mathbb{R}^n; \mathbb{R}^m)$ is itself a linear space, it is possible, therefore, to extend the notion of differentiability to $Df$, and define the Fréchet derivative of $Df$ as before, after defining a suitable inner product on $\mathcal{L}(\mathbb{R}^n; \mathbb{R}^m)$. In this case, $f$ is said to be twice differentiable, and it is conventional to denote the set of all such twice differentiable maps as $C^2(U; \mathbb{R}^m)$. Higher order derivatives of $f$ are defined analogously. If the Fréchet derivative of $f$ of any order exists, then $f$ is said to be a smooth nonlinear map. The set of all smooth maps from $U$ into $\mathbb{R}^m$ is denoted as $C^\infty(U; \mathbb{R}^m)$. All maps considered henceforth will be assumed to be smooth unless stated otherwise.

The Fréchet derivative of the nonlinear map $f$ provides a locally linear approximation of $f$. The linearization of $f$ at $x_0 \in U$ is defined as the map $\ell_{x_0} : \mathbb{R}^n \to \mathbb{R}^m$ given by $$\ell_{x_0}(x) = f(x_0) + Df(x_0)\, h,$$ where $h = x - x_0$. Notice how this expression generalizes the tangent line to a real valued function of a real variable discussed earlier.

To compute the component representation of the Fréchet derivative at $x \in U$ with respect to the standard bases of $\mathbb{R}^n$ and $\mathbb{R}^m$, note that $$(Df(x))_{ij} = \frac{\partial f_i}{\partial x_j}(x).$$ Here, $f = (f_1, \ldots, f_m)$, where each $f_i$ is a real valued function of $n$ variables, and $\frac{\partial f_i}{\partial x_j}(x)$ denotes the partial derivative of $f_i$ with respect to $x_j$ evaluated at $x$. The matrix with components $\partial f_i / \partial x_j$ is called the Jacobian matrix of $f$ at $x$.

The foregoing argument also shows that the basis representation of $Df(x)$ with respect to the standard bases of $\mathbb{R}^n$ and $\mathbb{R}^m$ is the Jacobian matrix $\left[ \partial f_i / \partial x_j (x) \right]$. The representation of $Df(x)$ with respect to arbitrary bases of $\mathbb{R}^n$ and $\mathbb{R}^m$ can be computed analogously, but is omitted here in the interest of keeping the development simple.
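In computations the Jacobian matrix is often approximated by finite differences; the following sketch (the helper name `jacobian` and the particular test map are illustrative assumptions) compares a central difference approximation against the analytic partial derivatives:

```python
def jacobian(f, x, h=1e-6):
    """Approximate the Jacobian matrix (df_i/dx_j)(x) by central differences."""
    m = len(f(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp = list(x); xp[j] += h   # perturb the j-th coordinate forward
        xm = list(x); xm[j] -= h   # and backward
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2.0 * h)
    return J

# Illustrative map f : R^2 -> R^2, f(x, y) = (x y, x + y^2),
# whose analytic Jacobian is [[y, x], [1, 2 y]].
f = lambda v: [v[0] * v[1], v[0] + v[1] ** 2]
J = jacobian(f, [2.0, 3.0])  # analytic value at (2, 3): [[3, 2], [1, 6]]
```

Central differences are chosen here because their $O(h^2)$ truncation error makes the comparison with the analytic entries sharp even for a modest step size.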

Properties of the Fréchet derivative

Two important properties of the Fréchet derivative of nonlinear maps are now discussed briefly. The first is known as the chain rule of differentiation. Given differentiable maps $f : U \to \mathbb{R}^m$ and $g : \mathbb{R}^m \to \mathbb{R}^p$, it can be shown that $$D(g \circ f)(x) = Dg(f(x))\, Df(x).$$ This is a generalization of the chain rule of differentiation in single variable calculus. This result can be established easily by working in the component representation of the Fréchet derivatives. Choosing the standard bases of all the Euclidean spaces involved, it can be shown that $$\frac{\partial (g \circ f)_i}{\partial x_j}(x) = \sum_{k=1}^{m} \frac{\partial g_i}{\partial y_k}(f(x)) \, \frac{\partial f_k}{\partial x_j}(x)$$ for any $x \in U$.
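The matrix form of the chain rule can be checked directly on a small example (the particular maps $f$ and $g$ below are illustrative choices, not from the text): the matrix product $Dg(f(x))\, Df(x)$ reproduces the Jacobian of $g \circ f$ computed by hand.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Illustrative maps: f(x1, x2) = (x1^2, x1 x2) and g(y1, y2) = (y1 y2, y1 + y2),
# so that (g o f)(x1, x2) = (x1^3 x2, x1^2 + x1 x2).
def Df(x):
    return [[2.0 * x[0], 0.0], [x[1], x[0]]]

def Dg(y):
    return [[y[1], y[0]], [1.0, 1.0]]

def D_comp(x):
    """Analytic Jacobian of g o f, differentiated by hand for comparison."""
    return [[3.0 * x[0] ** 2 * x[1], x[0] ** 3],
            [2.0 * x[0] + x[1], x[0]]]

x = [2.0, 3.0]
fx = [x[0] ** 2, x[0] * x[1]]     # f(x) = (4, 6)
chain = matmul(Dg(fx), Df(x))     # Dg(f(x)) Df(x)
```

At $x = (2, 3)$ both `chain` and `D_comp(x)` evaluate to the same matrix, as the chain rule predicts.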

The second property of the Fréchet derivative that is useful in applications relates to the local invertibility of nonlinear maps. Let $f : U \to \mathbb{R}^n$ be a given differentiable nonlinear map. If $Df(x_0)$ is invertible, where $x_0 \in U$, then the inverse function theorem states that the map $f$ is locally invertible at $x_0$, and further that $$Df^{-1}(f(x_0)) = (Df(x_0))^{-1}.$$ The proof of this theorem is non-trivial, and can be found in any good book on multivariable calculus.
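A quick numerical sanity check of this identity is possible when the inverse is known explicitly; the map $f(x, y) = (x + y^3, y)$, with global inverse $g(u, v) = (u - v^3, v)$, is an illustrative choice:

```python
# Illustrative map f(x, y) = (x + y^3, y), with explicit inverse g(u, v) = (u - v^3, v).
def Df(p):
    """Analytic Jacobian of f at p = (x, y)."""
    return [[1.0, 3.0 * p[1] ** 2], [0.0, 1.0]]

def Dg(q):
    """Analytic Jacobian of the inverse map g at q = (u, v)."""
    return [[1.0, -3.0 * q[1] ** 2], [0.0, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

p = [1.0, 2.0]
q = [p[0] + p[1] ** 3, p[1]]   # q = f(p) = (9, 2)
# Inverse function theorem: Dg(f(p)) Df(p) should be the identity matrix.
I = matmul(Dg(q), Df(p))
```

Since $Dg(f(p))$ is the derivative of $f^{-1}$ at $f(p)$, the product with $Df(p)$ being the identity is precisely the statement $Df^{-1}(f(p)) = (Df(p))^{-1}$.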

Directional derivatives

Given a nonlinear map $f : U \to \mathbb{R}^m$, where $U$ is open in $\mathbb{R}^n$, it is convenient to define a weaker notion of a derivative than the Fréchet derivative, called the directional derivative of $f$. The basic idea is to study how $f$ varies along a particular direction. The map $f$ is said to be Gâteaux differentiable at $x \in U$ if there exists a map $df(x, \cdot) : \mathbb{R}^n \to \mathbb{R}^m$ such that $$df(x, u) = \lim_{\epsilon \to 0} \frac{f(x + \epsilon u) - f(x)}{\epsilon}$$ for any $u \in \mathbb{R}^n$. The quantity $df(x, u)$ is also called the directional derivative of $f$ at $x$ along the direction $u$. If $f$ is Gâteaux differentiable at every $x \in U$, then the Gâteaux differential of $f$ is defined as the map $$df : (x, u) \mapsto df(x, u),$$ where $(x, u) \in U \times \mathbb{R}^n$.

Remark

It is to be emphasized that $df(x, \cdot)$ is not necessarily a linear map. If it turns out that the Gâteaux differential is linear in its second argument, it is usually written using one of the following equivalent notations: $$df(x, u) = df(x)(u) = df(x)\, u,$$ where $x \in U$ and $u \in \mathbb{R}^n$. The latter notation will be adopted more frequently in these notes. The linear map $df(x)$ is called the Gâteaux derivative, or the functional derivative, of $f$ at $x$.

If the map $f$ is Fréchet differentiable on $U$, then it is necessarily Gâteaux differentiable on $U$. Further, in this case, the Gâteaux differential is linear in its second argument, and $$df(x)\, u = Df(x)\, u$$ for any $x \in U$ and $u \in \mathbb{R}^n$. This shows, in particular, that under the assumption of Fréchet differentiability, the Gâteaux differential can be used to compute the Fréchet derivative.
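The agreement between the Gâteaux differential and the Fréchet derivative is easy to verify numerically; in the sketch below (the scalar valued map $f$, the helper name `gateaux`, and the step size are illustrative assumptions), the difference quotient converges to the value obtained from the analytic partial derivatives:

```python
# Illustrative map f : R^2 -> R, f(x1, x2) = x1^2 + x1 x2,
# with gradient (2 x1 + x2, x1), so Df(x) u = (2 x1 + x2) u1 + x1 u2.
f = lambda v: v[0] ** 2 + v[0] * v[1]

def gateaux(f, x, u, eps=1e-6):
    """Difference quotient (f(x + eps u) - f(x)) / eps approximating df(x, u)."""
    xe = [xi + eps * ui for xi, ui in zip(x, u)]
    return (f(xe) - f(x)) / eps

x, u = [1.0, 2.0], [1.0, 1.0]
frechet = (2.0 * x[0] + x[1]) * u[0] + x[0] * u[1]  # Df(x) u = 5
approx = gateaux(f, x, u)                           # tends to the same value
```

As $\epsilon \to 0$ the difference quotient approaches `frechet`, which is how the Gâteaux differential is used in practice to compute the Fréchet derivative componentwise.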

Remark

The converse is not true: the existence of the Gâteaux differential does not imply Fréchet differentiability, except when the Gâteaux differential satisfies additional conditions. In these notes, all maps between vector spaces are assumed to be Fréchet differentiable, unless stated otherwise.

Gradient of nonlinear maps

We will now introduce an important notion called the gradient of a nonlinear map. Suppose that $f : V \to W$ is a nonlinear map between finite dimensional inner product spaces, as before, that is differentiable. In this case, the Fréchet derivative of $f$ is equal to its Gâteaux derivative, as we just discussed. To introduce the notion of the gradient of $f$, it is helpful to first introduce the notion of tensor product spaces -- given finite dimensional inner product spaces $V$ and $W$, the tensor product space $W \otimes V$ is defined as follows: $$W \otimes V = \operatorname{span} \{ w \otimes v : w \in W, \ v \in V \},$$ where the dyad $w \otimes v$ is the linear map from $V$ into $W$ defined by $(w \otimes v)\, u = (v \cdot u)\, w$ for any $u \in V$.
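In components the dyad $w \otimes v$ is just the outer product matrix with entries $w_i v_j$; the following sketch (helper names are illustrative) checks the defining property $(w \otimes v)\, u = (v \cdot u)\, w$ numerically:

```python
def dyad(w, v):
    """Matrix of the dyad w (x) v, with components (w (x) v)_ij = w_i v_j."""
    return [[wi * vj for vj in v] for wi in w]

def apply(A, u):
    """Apply a matrix (list of rows) to a vector."""
    return [sum(a * b for a, b in zip(row, u)) for row in A]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

w, v, u = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]
lhs = apply(dyad(w, v), u)           # (w (x) v) u
rhs = [dot(v, u) * wi for wi in w]   # (v . u) w
```

Both sides evaluate to the same vector, confirming that the dyad acts on $u$ by projecting onto $v$ and scaling $w$.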

Remark

The tensor product space introduced above is more properly written as $W \otimes V^{*}$, where $W^{*}$ and $V^{*}$ are the (algebraic) dual spaces of $W$ and $V$, respectively. The algebraic dual of a vector space is just the vector space consisting of linear functions defined on the vector space. When dealing with finite dimensional inner product spaces, we do not need to distinguish between a vector space and its dual since there is a canonical identification furnished by the metric. This identification is exploited here to arrive at a simpler notation.

The gradient of $f$ at $v \in V$ is defined as the tensor $\nabla f(v) \in W \otimes V$ such that, for any $w \in W$ and $u \in V$, $$\nabla f(v) \cdot (w \otimes u) = w \cdot (df(v)\, u).$$ Note that the inner product on the left is understood in the sense of the generalized inner product of tensors as we discussed earlier.

Remark

It is conventional to denote the gradient of $f$ using the same symbol as the Gâteaux derivative of $f$. Thus, we will often write $df(x)$ to denote the gradient of $f$ at $x$, $\nabla f(x)$. The meaning of a term like $df(x)$ should thus be carefully interpreted depending on the context.