Differentiation of more general maps \(\mathbb{R}^n \to \mathbb{R}^m\)

We now consider the differentiation of more general maps \(\mathbb{R}^n \to \mathbb{R}^m\). Many results can be imported from the case of bivariate functions \(\mathbb{R}^2 \to \mathbb{R}\).



Multivariate functions \(\mathbb{R}^n \to \mathbb{R}\)

First, we consider the general multivariate function \(y = f(x): \mathbb{R}^n \to \mathbb{R}\) where \(x = (x_1, x_2, \cdots, x_n) \in\mathbb{R}^n\).

Definition (Total differentiability)

Let \(U\) be an open region in \(\mathbb{R}^n\). For the function \(f(x)\) on \(U\) and \(a = (a_1, a_2, \cdots, a_n)\in U\), \(f(x)\) is said to be (totally) differentiable at \(a\) if there exist constants \(m_1, m_2, \cdots, m_n\) such that

\[f(x) = f(a) + m_1(x_1 - a_1) + m_2(x_2 - a_2) +\cdots + m_n(x_n - a_n) + o(\|x-a\|)\]

as \(x \to a\). \(f(x)\) is said to be totally differentiable on \(U\) if \(f(x)\) is totally differentiable at all points in \(U\).

When \(n = 2\), this definition matches the definition of total differentiability of two-variable functions. 

See also: Partial and total differentiation of multivariate functions

When \(n = 1\), this definition matches the definition of differentiability of univariate (one-variable) functions. Similarly to the cases of \(n =1\) and \(n = 2\), if an \(n\)-variable function \(f(x) = f(x_1, x_2, \cdots, x_n)\) is totally differentiable at \(a = (a_1, a_2, \cdots, a_n)\), then

\[m_i = \frac{\partial f}{\partial x_i}(a), ~ i = 1, 2, \cdots, n.\]
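To make the expansion concrete, here is a minimal numerical sketch in Python (the function \(f\), the point \(a\), and the displacement direction are illustrative choices, not from the text): the first-order remainder, divided by \(\|x - a\|\), tends to \(0\), as the definition requires.

```python
# Minimal numerical sketch: the remainder in the first-order expansion at a,
# divided by ||x - a||, tends to 0 (illustrative f, a, and direction).
import numpy as np

def f(x):
    return x[0] * x[1] + np.sin(x[2])   # f(x1, x2, x3) = x1*x2 + sin(x3)

a = np.array([1.0, 2.0, 0.5])
grad_a = np.array([a[1], a[0], np.cos(a[2])])   # (df/dx1, df/dx2, df/dx3) at a

for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    v = t * np.array([1.0, -1.0, 2.0])          # displacement x - a
    remainder = f(a + v) - f(a) - grad_a @ v
    print(t, remainder / np.linalg.norm(v))     # ratio -> 0 as t -> 0
```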

Theorem 1 (Total differentiability implies continuity)

If the function \(f(x) = f(x_1, x_2, \cdots, x_n)\) is totally differentiable at \(a = (a_1, a_2, \cdots, a_n)\), then it is continuous at \(x = a\).

Proof. "Trivial." (Similar to the case of \(n=2\).) ■

See also: Total differentiability implies continuity.

Theorem 2 (Criterion of total differentiability)

Let \(U\) be an open region in \(\mathbb{R}^n\) and \(f(x) = f(x_1, x_2, \cdots, x_n)\) be a function on \(U\) and \(a = (a_1, a_2, \cdots, a_n) \in U\). If all the partial derivatives \(\frac{\partial f}{\partial x_i}(x) ~ (i = 1, 2, \cdots, n)\) exist and they are continuous at \(x = a\), then \(f(x)\) is totally differentiable at \(x = a\).

Proof. "Trivial." (Similar to the case of \(n=2\).) ■

See also: Total differentiability implies continuity

Definition (Functions of class \(C^1\))

The function \(f(x) = f(x_1, x_2, \cdots, x_n)\) on \(U\) is said to be once continuously differentiable or of class \(C^1\) if all the partial derivatives \(\frac{\partial f}{\partial x_i}(x) ~ (i = 1, 2, \cdots, n)\) exist and are continuous on \(U\).

Remark. By the above Theorem 2, functions of class \(C^1\) are totally differentiable, and, by Theorem 1, they are continuous. □

Similarly to the case with \(n = 2\), we may define higher order partial derivatives such as

\[f_{x_ix_j}(x) = \frac{\partial^2 f}{\partial x_j\partial x_i}(x)\]

where \(i, j = 1, 2, \cdots, n\).

See also: Higher-order partial differentiation.

Theorem 3 (Changing the order of partial differentiation)

Suppose that the function \(f(x) = f(x_1, x_2, \cdots, x_n)\) on an open region \(U\) has second partial derivatives \(f_{x_ix_j}(x)\) and \(f_{x_jx_i}(x)\) \( ~ (i,j = 1, 2, \cdots, n)\) which are continuous. Then \(f_{x_ix_j}(x) = f_{x_jx_i}(x)\).

Proof. All variables other than \(x_i\) and \(x_j\) may be regarded as constants in the derivatives \(f_{x_ix_j}(x)\) and \(f_{x_jx_i}(x)\). Then the proof is reduced to the case with \(n = 2\). ■

See also: Higher-order partial differentiation
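The symmetry of mixed partial derivatives asserted in Theorem 3 can also be checked symbolically. Below is a minimal sketch using sympy; the three-variable function \(f\) is an illustrative choice, not from the text.

```python
# Minimal symbolic check of Theorem 3 with sympy (illustrative f).
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
f = x1**2 * sp.sin(x2) + sp.exp(x2 * x3)

f_x1x2 = sp.diff(f, x1, x2)   # differentiate by x1, then by x2
f_x2x1 = sp.diff(f, x2, x1)   # differentiate by x2, then by x1

print(sp.simplify(f_x1x2 - f_x2x1))   # prints 0, as Theorem 3 asserts
```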

Definition (Functions of class \(C^r\))

Let \(f(x) = f(x_1, x_2, \cdots, x_n)\) be a function on an open region \(U\) and \(r\) be a non-negative integer.

  1. \(f(x)\) is said to be \(r\)-times continuously differentiable or of class \(C^r\) if all of its partial derivatives up to the \(r\)-th order exist and are continuous on \(U\).
  2. \(f(x)\) is said to be infinitely differentiable, smooth, or of class \(C^{\infty}\) if \(f(x)\) has partial derivatives of all orders and they are continuous on \(U\).

For example, if \(f(x)\) is of class \(C^0\) on \(U\), then \(f(x)\) is continuous on \(U\). As in the case \(n = 2\), the partial derivatives up to the \(r\)-th order of a function of class \(C^r\) are determined by the number of differentiations with respect to each variable \(x_i\) \((i = 1, 2, \cdots, n)\), independently of the order in which those differentiations are performed.

\(C^r\) maps \(\mathbb{R}^n \to \mathbb{R}^m\)

Now, we consider general maps: \(\mathbb{R}^n \to \mathbb{R}^m\).

Definition (\(C^r\) maps)

Let \(U\) be an open region in \(\mathbb{R}^n\). Let
\[F(x) = (f_1(x), f_2(x), \cdots, f_m(x)), ~ x=(x_1, x_2, \cdots, x_n),\]
be a map from \(U\) to \(\mathbb{R}^m\) (i.e., \(F: U \to \mathbb{R}^m\)). Then, for each \(k = 1, 2, \cdots, m\), \(f_k(x) = f_k(x_1, x_2, \cdots, x_n)\) is a function on \(U\) (i.e., \(f_k: U \to \mathbb{R}\), \(k = 1, 2, \cdots, m\)). If all \(f_k\) are functions of class \(C^r\), then the map \(F(x)\) is said to be of class \(C^r\).

Composite maps

Let \(U\) and \(V\) be open regions in \(\mathbb{R}^n\) and \(\mathbb{R}^m\), respectively. Consider the maps \(F: U \to \mathbb{R}^m\) and \(G: V \to \mathbb{R}^l\),
\[\begin{eqnarray} F(x) &=& (f_1(x), f_2(x), \cdots, f_m(x)), ~ x = (x_1, x_2, \cdots, x_n),\\ G(y) &=& (g_1(y), g_2(y), \cdots, g_l(y)), ~ y = (y_1, y_2, \cdots, y_m). \end{eqnarray}\]
Suppose that \(F(U) \subset V\). Recall that \(F(U)\) is the image of \(U\) by \(F\):
\[F(U) = \{F(x) \mid x \in U\}.\]
Then we can define the composite map \(G\circ F: U \to \mathbb{R}^l\) by
\[(G\circ F)(x) = (h_1(x), h_2(x), \cdots, h_l(x))\]
where
\[h_k(x) = g_k(f_1(x), f_2(x), \cdots, f_m(x)), k = 1, 2, \cdots, l.\]

Each of the \(m\) components \(f_1(x), f_2(x), \cdots, f_m(x)\) of the map \(F(x)\) is a function of the \(n\) independent variables \(x_1, x_2, \cdots, x_n\). Thus, \(F(x)\) has \(mn\) partial derivatives

\[\frac{\partial f_j}{\partial x_i}(x), ~ x = (x_1, x_2,\cdots,x_n); i = 1, \cdots, n; j = 1, \cdots, m.\]

Similarly, each of the \(l\) components \(g_1(y), g_2(y), \cdots, g_l(y)\) of the map \(G(y)\) is a function of the \(m\) independent variables \(y_1, y_2, \cdots, y_m\). Thus, \(G(y)\) has \(lm\) partial derivatives

\[\frac{\partial g_k}{\partial y_j}(y), ~ y = (y_1, y_2,\cdots, y_m); j=1,\cdots,m; k=1,\cdots, l.\]

Accordingly, each component \(h_k(x)\) of the composite map \((G\circ F)(x)\) is a function of the \(n\) independent variables \(x_1, x_2, \cdots, x_n\). Thus, \((G\circ F)(x)\) has \(ln\) partial derivatives

\[\frac{\partial h_k}{\partial x_i}(x), ~ x = (x_1, x_2,\cdots, x_n); i=1,\cdots, n; k=1,\cdots,l.\]

These partial derivatives are related by the chain rule for general maps:

Theorem (Chain rule)

Let \(F\) and \(G\) be maps of class \(C^1\). Then, their composite \(G\circ F\) is also of class \(C^1\), and for all \(k = 1, 2, \cdots, l\) and \(i = 1, 2, \cdots, n\), the following equation holds:

\[\frac{\partial h_k}{\partial x_i}(x) = \sum_{j=1}^{m}\frac{\partial g_k}{\partial y_j}(F(x))\frac{\partial f_j}{\partial x_i}(x).\tag{Eq:Chain}\]

Proof. For each \(k = 1, 2, \cdots, l\), we may treat \(h_k(x)\) separately (one \(k\) at a time), so it suffices to consider the case \(l = 1\). When differentiating with respect to a particular \(x_i\), we may regard the other independent variables as constants, so it suffices to consider the case \(n = 1\). Thus, the problem is reduced to the composite of \(z = g(y_1, y_2, \cdots, y_m)\) with \(y_j = f_j(x)\) (\(x\in \mathbb{R}\)). The case \(m = 2\) is the bivariate chain rule, and the case of general \(m\) can be proved similarly. (See also: Multivariate chain rules.)

Lastly, by (Eq:Chain), the partial derivative \(\frac{\partial h_k}{\partial x_i}(x)\) is continuous, since sums, products, and compositions of continuous functions are continuous. Therefore, \(G\circ F\) is of class \(C^1\). ■
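As a sanity check of (Eq:Chain), here is a minimal symbolic sketch using sympy; the maps \(F\) (with \(n = 2\), \(m = 2\)) and \(G\) (with \(l = 1\)) are illustrative choices, not from the text.

```python
# Minimal symbolic check of (Eq:Chain) with sympy (illustrative F and G).
# Here n = 2, m = 2, l = 1, and we verify the derivative with respect to x1.
import sympy as sp

x1, x2, y1, y2 = sp.symbols('x1 x2 y1 y2')

f1, f2 = x1 * x2, x1 + x2**2          # components of F: R^2 -> R^2
g = sp.sin(y1) + y1 * y2              # the single component of G: R^2 -> R

# Left-hand side: differentiate h(x) = g(f1(x), f2(x)) directly.
h = g.subs({y1: f1, y2: f2})
lhs = sp.diff(h, x1)

# Right-hand side: sum over j of (dg/dy_j)(F(x)) * (df_j/dx1)(x).
rhs = (sp.diff(g, y1).subs({y1: f1, y2: f2}) * sp.diff(f1, x1)
       + sp.diff(g, y2).subs({y1: f1, y2: f2}) * sp.diff(f2, x1))

print(sp.simplify(lhs - rhs))         # prints 0
```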

Remark. If we write

\[y_j = f_j(x_1, x_2, \cdots, x_n), ~ (j = 1, 2, \cdots, m),\]

and

\[z_k = g_k(y_1, y_2, \cdots, y_m), ~ (k = 1, 2, \cdots, l),\]

then (Eq:Chain) can be written as

\[\begin{eqnarray} \frac{\partial z_k}{\partial x_i} &=& \sum_{j=1}^{m}\frac{\partial z_k}{\partial y_j}\cdot\frac{\partial y_j}{\partial x_i}\\ &=& \frac{\partial z_k}{\partial y_1}\cdot\frac{\partial y_1}{\partial x_i} + \frac{\partial z_k}{\partial y_2}\cdot\frac{\partial y_2}{\partial x_i} + \cdots + \frac{\partial z_k}{\partial y_m}\cdot\frac{\partial y_m}{\partial x_i}.\tag{Eq:Chain2} \end{eqnarray}\]

Definition (Jacobian)

Let \(U \subset \mathbb{R}^n\) be an open region. For the map \(F(x) = (f_1(x), f_2(x), \cdots, f_m(x)): U \to \mathbb{R}^m\), we can define a matrix whose \((i,j)\)-element is \(\frac{\partial f_i}{\partial x_j}(a)\) where \(a = (a_1, a_2,\cdots, a_n)\in U\):

\[\begin{equation} J_F(a) = \left(\frac{\partial f_i}{\partial x_j}(a)\right) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(a) & \frac{\partial f_1}{\partial x_2}(a) & \cdots & \frac{\partial f_1}{\partial x_n}(a) \\ \frac{\partial f_2}{\partial x_1}(a) & \frac{\partial f_2}{\partial x_2}(a) & \cdots & \frac{\partial f_2}{\partial x_n}(a) \\ \vdots & \vdots & & \vdots \\ \frac{\partial f_m}{\partial x_1}(a) & \frac{\partial f_m}{\partial x_2}(a) & \cdots & \frac{\partial f_m}{\partial x_n}(a) \end{pmatrix}. \end{equation}\]

This matrix is called the Jacobian matrix, or simply the Jacobian, of the map \(F(x)\) at \(x = a\).

Consider the case with \(m = 1\). For the function \(f(x) = f(x_1, x_2, \cdots, x_n)\), the Jacobian is a row vector

\[J_f(a) = (f_{x_1}(a), f_{x_2}(a), \cdots, f_{x_n}(a)).\]

This vector defines a linear function \(\mathbb{R}^n \to \mathbb{R}\) on the \(n\)-dimensional vector space:

\[v = \begin{pmatrix} v_1\\ v_2\\ \vdots\\ v_n \end{pmatrix} \mapsto J_{f}(a)v = f_{x_1}(a)v_1 + f_{x_2}(a)v_2 + \cdots + f_{x_n}(a)v_n.\]

This function gives the linear (first-order) term in the asymptotic expansion:

\[f(x) = f(a) + \{f_{x_1}(a)v_1 + f_{x_2}(a)v_2 + \cdots + f_{x_n}(a)v_n\} + o(\|x-a\|)\]

where \(v_1 = x_1 - a_1, v_2 = x_2 - a_2, \cdots, v_n = x_n - a_n\).

This idea can be extended to the case with general \(m\). For the map \(F: U \to \mathbb{R}^m\), its Jacobian \(J_F(a)\) gives the linear approximation of \(F(x)\) at \(x = a\): \(F(x) = F(a) + J_F(a)(x - a) + o(\|x - a\|)\) as \(x \to a\), where \(x - a\) is regarded as a column vector.
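The following minimal Python sketch illustrates this linear approximation for a map \(\mathbb{R}^2 \to \mathbb{R}^3\); the map \(F\), the point \(a\), and the displacement \(v\) are illustrative choices, not from the text.

```python
# Minimal numerical sketch of F(a + v) ≈ F(a) + J_F(a) v for an
# illustrative map F: R^2 -> R^3.
import numpy as np

def F(x):
    return np.array([x[0]**2, x[0] * x[1], np.sin(x[1])])

def J_F(x):
    # Jacobian of F, written out from the partial derivatives of each component
    return np.array([[2 * x[0], 0.0],
                     [x[1],     x[0]],
                     [0.0,      np.cos(x[1])]])

a = np.array([1.0, 0.5])
v = np.array([1e-3, -2e-3])
print(F(a + v) - (F(a) + J_F(a) @ v))   # entries of order ||v||^2
```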

Example. Let \(x = 2u - v, y = 4u + 3v\). Then

\[\begin{pmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \\ \end{pmatrix} = \begin{pmatrix} 2 & - 1\\ 4 & 3 \end{pmatrix}.\]

Let us restate the chain rule in terms of Jacobians.

The derivative of the composite \(G\circ F\) is given by (Eq:Chain). In terms of Jacobians, we have

\[\begin{eqnarray*} && \begin{pmatrix} \frac{\partial h_1}{\partial x_1}(a) & \frac{\partial h_1}{\partial x_2}(a) & \cdots & \frac{\partial h_1}{\partial x_n}(a)\\ \frac{\partial h_2}{\partial x_1}(a) & \frac{\partial h_2}{\partial x_2}(a) & \cdots & \frac{\partial h_2}{\partial x_n}(a)\\ \vdots & \vdots & & \vdots\\ \frac{\partial h_l}{\partial x_1}(a) & \frac{\partial h_l}{\partial x_2}(a) & \cdots & \frac{\partial h_l}{\partial x_n}(a) \end{pmatrix}\\ &=& \begin{pmatrix} \frac{\partial g_1}{\partial y_1}(F(a)) & \frac{\partial g_1}{\partial y_2}(F(a)) & \cdots & \frac{\partial g_1}{\partial y_m}(F(a))\\ \frac{\partial g_2}{\partial y_1}(F(a)) & \frac{\partial g_2}{\partial y_2}(F(a)) & \cdots & \frac{\partial g_2}{\partial y_m}(F(a))\\ \vdots & \vdots & & \vdots\\ \frac{\partial g_l}{\partial y_1}(F(a)) & \frac{\partial g_l}{\partial y_2}(F(a)) & \cdots & \frac{\partial g_l}{\partial y_m}(F(a)) \end{pmatrix} \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(a) & \frac{\partial f_1}{\partial x_2}(a) & \cdots & \frac{\partial f_1}{\partial x_n}(a)\\ \frac{\partial f_2}{\partial x_1}(a) & \frac{\partial f_2}{\partial x_2}(a) & \cdots & \frac{\partial f_2}{\partial x_n}(a)\\ \vdots & \vdots & & \vdots\\ \frac{\partial f_m}{\partial x_1}(a) & \frac{\partial f_m}{\partial x_2}(a) & \cdots & \frac{\partial f_m}{\partial x_n}(a) \end{pmatrix}, \end{eqnarray*}\]

or

\[J_{G\circ F}(a) = J_{G}(F(a))J_{F}(a).\tag{Eq:MatChain}\]
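As a numerical sanity check of (Eq:MatChain), the sketch below compares a finite-difference Jacobian of \(G\circ F\) at \(a\) with the product \(J_G(F(a))J_F(a)\); the maps \(F, G: \mathbb{R}^2 \to \mathbb{R}^2\) and the point \(a\) are illustrative choices, not from the text. The left-hand side is only approximated (by central differences), while the right-hand side uses the exact Jacobians written out by hand.

```python
# Minimal numerical check of (Eq:MatChain) for illustrative maps
# F, G: R^2 -> R^2 and an illustrative point a.
import numpy as np

def F(x):
    return np.array([x[0] * x[1], x[0] + x[1]**2])

def J_F(x):
    return np.array([[x[1], x[0]],
                     [1.0,  2 * x[1]]])

def G(y):
    return np.array([np.sin(y[0]), y[0] * y[1]])

def J_G(y):
    return np.array([[np.cos(y[0]), 0.0],
                     [y[1],         y[0]]])

a = np.array([0.7, -1.2])

# Finite-difference Jacobian of the composite G∘F at a (central differences).
eps = 1e-6
J_comp = np.column_stack([
    (G(F(a + eps * e)) - G(F(a - eps * e))) / (2 * eps) for e in np.eye(2)
])

print(np.allclose(J_comp, J_G(F(a)) @ J_F(a), atol=1e-6))   # True
```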

(After you learn more linear algebra, you will understand the following...)

A matrix represents a linear map. The product of matrices corresponds to the composition of the corresponding linear maps. (Eq:MatChain) indicates that the linear approximation (\(J_{G\circ F}(a)\)) of the composition of maps is equal to the composition of the linear approximations (\(J_{G}(F(a))\) and \(J_F(a)\)) of the maps. In short,

The linear approximation of the composition of maps is the composition of the linear approximations of the maps.

In other words, linear approximation commutes with composition.

