# Abstract Nonsense

## The Chain Rule

Point of Post: In this post we discuss the chain rule for total derivatives, which generalizes the usual single-variable chain rule.

$\text{ }$

Motivation

$\text{ }$

If the total derivative really is the generalization of the ordinary derivative of functions $\mathbb{R}\to\mathbb{R}$ that we’ve made it out to be, one would hope that it shares most of the nice properties of the ordinary derivative. In particular, one of the nicest properties of the derivative of real-valued functions of a real variable is the chain rule. Here we prove that an analogous theorem holds for differentiable mappings $\mathbb{R}^n\to\mathbb{R}^m$.

$\text{ }$

Chain Rule

$\text{ }$

Our goal is to show that if $f:\mathbb{R}^n\to\mathbb{R}^m$ and $g:\mathbb{R}^m\to\mathbb{R}^p$ are maps such that $f$ is differentiable at a point $a$ and $g$ is differentiable at $f(a)$, then their composition $g\circ f$ is differentiable at $a$, and to derive a formula for its derivative. Indeed:

$\text{ }$

Theorem (Chain Rule): Let $f:U\to\mathbb{R}^m$ and $g:V\to\mathbb{R}^p$ be maps with $U\subseteq \mathbb{R}^n$ and $V\subseteq\mathbb{R}^m$ open and $f(U)\subseteq V$. Suppose that $f$ is differentiable at $a\in U$ and $g$ is differentiable at $f(a)\in V$. Then $g\circ f:U\to\mathbb{R}^p$ is differentiable at $a$ and

$\text{ }$

$D_{g\circ f}(a)=D_g(f(a))\circ D_f(a)$

Or, in Jacobian notation

$\text{Jac}_{g\circ f}(a)=\text{Jac}_g(f(a))\,\text{Jac}_f(a)$
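As a quick sanity check of the Jacobian form (the maps below are illustrative choices, not taken from the argument that follows), take $f(x,y)=(xy,x+y)$ and $g(u,v)=u^2+v$ with $a=(1,2)$, so that $f(a)=(2,3)$. Then

$\text{Jac}_f(a)=\begin{pmatrix}2&1\\1&1\end{pmatrix},\quad \text{Jac}_g(f(a))=\begin{pmatrix}4&1\end{pmatrix}$

and so

$\text{Jac}_g(f(a))\,\text{Jac}_f(a)=\begin{pmatrix}9&5\end{pmatrix}$

This agrees with differentiating $(g\circ f)(x,y)=x^2y^2+x+y$ directly: the partial derivatives $2xy^2+1$ and $2x^2y+1$ evaluate to $9$ and $5$ at $(1,2)$.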

Proof: Let

$\text{ }$

$F(x)=f(x)-f(a)-D_f(a)(x-a)$

$\text{ }$

$G(y)=g(y)-g(f(a))-D_g(f(a))(y-f(a))$

and

$H(x)=g(f(x))-g(f(a))-D_{g}(f(a))\left(D_f(a)(x-a)\right)$

$\text{ }$

We know then that

$\text{ }$

$\displaystyle \lim_{x\to a}\frac{\|F(x)\|}{\|x-a\|}=0$

and

$\displaystyle \lim_{y\to f(a)}\frac{\|G(y)\|}{\|y-f(a)\|}=0$

$\text{ }$

Moreover, since $f(x)-f(a)=F(x)+D_f(a)(x-a)$ and $D_g(f(a))$ is linear, we see that

$\text{ }$

$H(x)=G(f(x))+D_{g}(f(a))(F(x))$

$\text{ }$

But, since $g\circ f$ is differentiable at $a$ with $D_{g\circ f}(a)=D_{g}(f(a))\circ D_f(a)$ if and only if

$\text{ }$

$\displaystyle \lim_{x\to a}\frac{\|H(x)\|}{\|x-a\|}=0$

$\text{ }$

and since

$\displaystyle \frac{\|H(x)\|}{\|x-a\|}\leqslant \frac{\|G(f(x))\|}{\|x-a\|}+\frac{\|D_g (f(a))(F(x))\|}{\|x-a\|}$

$\text{ }$

it suffices to show that

$\text{ }$

$\displaystyle \lim_{x\to a}\frac{\|G(f(x))\|}{\|x-a\|}=\lim_{x\to a}\frac{\|D_g (f(a))(F(x))\|}{\|x-a\|}=0$

$\text{ }$

The second of these follows immediately from the fact that

$\text{ }$

$\displaystyle 0\leqslant \frac{\left\|D_g(f(a))(F(x))\right\|}{\|x-a\|}\leqslant \left\|D_g(f(a))\right\|_{\text{op}}\frac{\|F(x)\|}{\|x-a\|}$

$\text{ }$

where $\|\cdot\|_\text{op}$ is the operator norm, together with the fact that $\displaystyle \frac{\|F(x)\|}{\|x-a\|}\to 0$. To show the other limit is zero we note that for any $\varepsilon>0$ there exists some $\delta>0$ such that $\|G(f(x))\|\leqslant\varepsilon\|f(x)-f(a)\|$ if $\|f(x)-f(a)\|<\delta$ (the case $f(x)=f(a)$ being trivial since $G(f(a))=0$). But, we may choose $\delta'>0$ such that $\|x-a\|<\delta'$ implies $\|f(x)-f(a)\|<\delta$ (since $f$ is continuous at $a$). Thus, for $\|x-a\|<\delta'$ we get that

$\text{ }$

$\displaystyle \begin{aligned}\left\|G(f(x))\right\| &\leqslant\varepsilon\|f(x)-f(a)\|\\ &=\varepsilon\|F(x)+D_f(a)(x-a)\|\\ & \leqslant \varepsilon\|F(x)\|+\varepsilon\|D_f(a)\|_\text{op}\|x-a\|\end{aligned}$

$\text{ }$

But, dividing both sides by $\|x-a\|$ and letting $x\to a$ finishes the argument (really what we have to do formally is divide both sides by $\|x-a\|$ and choose a revised restriction on $x$ such that $\displaystyle \frac{\|F(x)\|}{\|x-a\|}<1$, which is possible since $\displaystyle \frac{\|F(x)\|}{\|x-a\|}\to 0$; the right-hand expression is then less than $\varepsilon\left(1+\|D_f(a)\|_\text{op}\right)$, and $\varepsilon$ was arbitrary). $\blacksquare$
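To see the theorem and the key limit from the proof numerically, here is a minimal sketch in plain Python. The maps $f(x,y)=(xy,x+y)$ and $g(u,v)=u^2+v$, the point $a=(1,2)$, and all helper names are illustrative assumptions, not part of the argument above. The sketch compares the product $\text{Jac}_g(f(a))\,\text{Jac}_f(a)$ with a finite-difference Jacobian of $g\circ f$, and then evaluates $\|H(x)\|/\|x-a\|$ along $x=a+t(1,1)$ for shrinking $t$.

```python
import math

# Illustrative maps (assumptions for this sketch, not from the post).
def f(p):
    """f: R^2 -> R^2, f(x, y) = (x*y, x + y)."""
    x, y = p
    return (x * y, x + y)

def g(q):
    """g: R^2 -> R, g(u, v) = u^2 + v, returned as a 1-tuple."""
    u, v = q
    return (u * u + v,)

def jac_f(p):
    """Hand-computed Jacobian of f at p (a 2x2 matrix)."""
    x, y = p
    return [[y, x], [1.0, 1.0]]

def jac_g(q):
    """Hand-computed Jacobian of g at q (a 1x2 matrix)."""
    u, v = q
    return [[2.0 * u, 1.0]]

def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, v):
    """Apply the matrix A to the vector v."""
    return tuple(sum(a * b for a, b in zip(row, v)) for row in A)

def norm(v):
    """Euclidean norm."""
    return math.sqrt(sum(c * c for c in v))

def finite_difference_jacobian(func, p, h=1e-6):
    """Central-difference approximation of the Jacobian of func at p."""
    m = len(func(p))
    J = [[0.0] * len(p) for _ in range(m)]
    for j in range(len(p)):
        plus, minus = list(p), list(p)
        plus[j] += h
        minus[j] -= h
        fp, fm = func(tuple(plus)), func(tuple(minus))
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

a = (1.0, 2.0)
chain = matmul(jac_g(f(a)), jac_f(a))  # Jac_g(f(a)) Jac_f(a)
numeric = finite_difference_jacobian(lambda p: g(f(p)), a)

def remainder_ratio(t):
    """||H(x)|| / ||x - a|| for x = a + t*(1, 1), with H as in the proof."""
    x = (a[0] + t, a[1] + t)
    dx = (x[0] - a[0], x[1] - a[1])
    linear = matvec(chain, dx)  # D_g(f(a))(D_f(a)(x - a))
    gx, ga = g(f(x)), g(f(a))
    H = tuple(gx[i] - ga[i] - linear[i] for i in range(len(gx)))
    return norm(H) / norm(dx)

ratios = [remainder_ratio(10.0 ** -k) for k in range(1, 5)]

print("chain-rule product:", chain)
print("finite differences:", numeric)
print("remainder ratios:  ", ratios)
```

For these particular maps the product comes out as `[[9.0, 5.0]]`, the finite-difference Jacobian agrees with it up to rounding, and the remainder ratios shrink roughly in proportion to $t$, which is the behaviour $\|H(x)\|/\|x-a\|\to 0$ established in the proof.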

$\text{ }$

$\text{ }$

References:

1. Spivak, Michael. Calculus on Manifolds; a Modern Approach to Classical Theorems of Advanced Calculus. New York: W.A. Benjamin, 1965. Print.

May 24, 2011
