# Abstract Nonsense

## Directional Derivatives and Partial Derivatives

Point of Post: In this post we discuss the notions of directional derivatives and partial derivatives.

$\text{ }$

Motivation

$\text{ }$

Roughly, the total derivative describes the conditions under which a function can be approximated very well (sublinearly) near a point by an affine transformation. Indeed, suppose that $f:\mathbb{R}^n\to\mathbb{R}^m$ is differentiable at $a\in\mathbb{R}^n$ with total derivative $T$. By definition, for any $\varepsilon>0$ we may choose $\delta>0$ such that $0<\|h\|<\delta$ implies $\left\|f(a+h)-\left(T(h)+f(a)\right)\right\|<\varepsilon\|h\|$. Note that ‘locally’ here means in all possible directions at once: as soon as $a+h$ lies in the open ball $B_{\delta}(a)$, the above inequality applies. Sometimes, though, we are only interested in the approximation, or the notion of change, in a particular direction. Thus arises the directional derivative which, roughly put, measures the rate of change of a function $f$ at a point $a$ in the direction of a vector $u$. The idea is simple: along the line $t\mapsto a+tu$ through $a$ in the direction of $u$, one forms the difference quotients $\frac{f(a+tu)-f(a)}{t}$ and lets $t$ tend to zero. When the vector is one of the elements of the canonical basis we get the partial derivatives which, as we shall see, are of huge importance in multivariable differential analysis.

$\text{ }$

Directional Derivatives

$\text{ }$

Let $U\subseteq\mathbb{R}^n$ be open and suppose that $f:U\to\mathbb{R}^m$. Let $a\in U$ and $u\in\mathbb{R}^n$. If the limit

$\text{ }$

$\displaystyle \lim_{t\to0}\frac{f\left(a+tu\right)-f(a)}{t}$

$\text{ }$

(where the limit is taken over nonzero $t\in\mathbb{R}$) exists, we say that $f$ has a directional derivative at $a$ in the direction of $u$ and denote this limit by $D_f(a;u)$ (remembering that it is the directional derivative of $f$ at $a$ in the direction of $u$, so syntactically the order of the symbols makes sense). It is important to note that this limit lives inside $\mathbb{R}^m$, and that $U$ is assumed open for reasons similar to those discussed for the total derivative: in particular, $a+tu\in U$ for all sufficiently small $|t|$, so the difference quotient is defined for all such nonzero $t$. If $u$ is a canonical basis vector $e_j$ then we call $D_f(a;e_j)$ the partial derivative with respect to $e_j$, or the partial derivative with respect to the $j^{\text{th}}$ variable, and, when desirable, we alternatively denote it by $D_jf(a)$ (omitting parentheses around the $f$ since, thinking of $D_j$ as an operator, one usually writes $Tv$ and not $T(v)$) or by $\displaystyle \frac{\partial f}{\partial x_j}(a)$ if we are thinking of our function $f$ as $f(x_1,\cdots,x_j,\cdots,x_n)$ (i.e. the variables are denoted $x_1,\cdots,x_n$). If instead we had a function $f:\mathbb{R}^3\to\mathbb{R}^2$ whose variables are written $f(x,y,z)$, we would denote $D_2f(a)$ by $\displaystyle \frac{\partial f}{\partial y}(a)$. If $f$ has a directional derivative in the direction of $u$ at every point of some region $R\subseteq U$, then we get a function $D_f(-;u):R\to\mathbb{R}^m:a\mapsto D_f(a;u)$.
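To make the definition concrete, here is a minimal numerical sketch in Python (an illustration, not part of the original post). It simply evaluates the difference quotient $\frac{f(a+tu)-f(a)}{t}$ coordinate by coordinate for a few small nonzero $t$ of both signs. The function names `difference_quotient` and `f`, and the sample map $f(x,y)=(xy,\,x+y^2)$ with $a=(1,2)$ and $u=(3,-1)$, are hypothetical choices for illustration. Floating-point evaluation can only suggest the value of the limit; it never proves that the limit exists.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def difference_quotient(
    f: Callable[[Sequence[float]], Vector],
    a: Sequence[float],
    u: Sequence[float],
    t: float,
) -> Vector:
    """Return (f(a + t*u) - f(a)) / t, computed coordinate by coordinate."""
    if t == 0:
        raise ValueError("t must be nonzero: the quotient is undefined at t = 0")
    shifted = [ai + t * ui for ai, ui in zip(a, u)]
    return [(fs - fa) / t for fs, fa in zip(f(shifted), f(a))]


def f(p: Sequence[float]) -> Vector:
    """Hypothetical example map R^2 -> R^2: f(x, y) = (x*y, x + y**2)."""
    x, y = p
    return [x * y, x + y ** 2]


a, u = [1.0, 2.0], [3.0, -1.0]
# Shrink t toward 0 from both sides, since the limit in the definition is two-sided.
for t in (1e-1, -1e-1, 1e-3, -1e-3, 1e-6):
    print(f"t = {t:+.0e}: {difference_quotient(f, a, u, t)}")
```

For this particular $f$ one can check by hand that the quotient equals $(5-3t,\,-1+t)$, so the printed values approach $D_f(a;u)=(5,-1)$ as $t\to0$ from either side.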

$\text{ }$

If $m=1$ in the above definition (i.e. if $f$ is real-valued) then directional derivatives have a very nice interpretation. To get the idea we first inspect partial derivatives. Since $U$ is open we can find some open ball $B_\delta(a)\subseteq U$. Then we know that $\{a_1\}\times\cdots\times\{a_{j-1}\}\times(a_j-\delta,a_j+\delta)\times\{a_{j+1}\}\times\cdots\times\{a_n\}$ is contained in $U$. We can then define a function $g:(-\delta,\delta)\to\mathbb{R}$ given by $g(t)=f(a_1,\cdots,a_{j-1},a_j+t,a_{j+1},\cdots,a_n)$. It is not hard to see that $D_jf(a)$ exists if and only if $g$ is differentiable (in the usual one-dimensional sense) at $0$, and moreover that $D_jf(a)=g'(0)$; indeed, the difference quotient $\frac{g(t)-g(0)}{t}$ is exactly $\frac{f(a+te_j)-f(a)}{t}$. This is really nice because it allows us to evaluate partial derivatives in the ‘everything else is constant’ way we learned in calculus. Namely, to find $D_jf(a)$ one merely ‘pretends’ that the variables $x_i,\; i\ne j$, are constant in $f(x_1,\cdots,x_j,\cdots,x_n)$ and differentiates with respect to $x_j$ as usual. For example, if $f(x,y)=x^2y^2+\sin(x)$ then $\displaystyle \frac{\partial f}{\partial y}(w,z)=2w^2z$. We could have phrased this a little less explicitly by noting that really $g(t)=f(a+te_j)$. This generalizes: for any direction $u\neq0$ we may consider $g(t)=f(a+tu)$ for $t\in(-\delta/\|u\|,\delta/\|u\|)$, and then $f$ has a directional derivative at $a$ in the direction of $u$ if and only if $g$ is differentiable at $0$, in which case $g'(0)=D_f(a;u)$.
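The ‘everything else is constant’ recipe can be checked numerically on the example $f(x,y)=x^2y^2+\sin(x)$, whose partial derivative with respect to $y$ at $(w,z)$ is $2w^2z$. The sketch below (illustrative only; the helper names are hypothetical) builds $g(t)=f(w,z+t)$, i.e. $g(t)=f(a+te_2)$ with $a=(w,z)$, and approximates $g'(0)$. Using the symmetric quotient $\frac{g(h)-g(-h)}{2h}$ is an assumption that suits this smooth $g$. For a non-differentiable $g$ it can converge even when $g'(0)$ does not exist (think of $|t|$ at $0$), so it is not a substitute for the definition.

```python
import math


def f(x: float, y: float) -> float:
    """The example from the text: f(x, y) = x^2 y^2 + sin(x)."""
    return x ** 2 * y ** 2 + math.sin(x)


def partial_y_numeric(w: float, z: float, h: float = 1e-5) -> float:
    """Approximate the partial derivative of f in y at (w, z) as g'(0)."""

    def g(t: float) -> float:
        # g(t) = f(a + t*e_2): the first variable is frozen at w.
        return f(w, z + t)

    # Symmetric difference quotient; adequate here because g is smooth.
    return (g(h) - g(-h)) / (2 * h)


def partial_y_formula(w: float, z: float) -> float:
    """Hand computation treating x as a constant: d/dy (x^2 y^2 + sin x) = 2 x^2 y."""
    return 2 * w ** 2 * z


for w, z in [(1.5, -0.7), (0.0, 3.0), (-2.0, 0.25)]:
    print((w, z), partial_y_numeric(w, z), partial_y_formula(w, z))
```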

$\text{ }$

The first thing we’d like to show is that, just as for the total derivative, having a directional derivative in some direction is equivalent to having a directional derivative in that direction for each of the coordinate functions. More explicitly:

$\text{ }$

Theorem: Let $f=(f_1,\cdots,f_m):U\to\mathbb{R}^m$, where $U\subseteq\mathbb{R}^n$ is open, and let $u\in\mathbb{R}^n$. Then $f$ has a directional derivative at $a\in U$ in the direction of $u$ if and only if $f_1,\cdots,f_m$ have directional derivatives at $a$ in the direction of $u$, in which case $D_f(a;u)=\left(D_{f_1}(a;u),\cdots,D_{f_m}(a;u)\right)$.

Proof: This follows immediately from the standard fact that a limit of a vector-valued function exists if and only if the limit of each coordinate function exists, in which case the limit of the vector-valued function is the tuple whose coordinates are the limits of the corresponding coordinate functions. Apply this to the difference quotient $\frac{f(a+tu)-f(a)}{t}$, whose $i^{\text{th}}$ coordinate is $\frac{f_i(a+tu)-f_i(a)}{t}$. $\blacksquare$
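As a quick illustration (the map, point, and direction here are our own choice, not from the original post), take $f(x,y)=(xy,\,x+y^2)$, so $f_1(x,y)=xy$ and $f_2(x,y)=x+y^2$, together with $a=(1,2)$ and $u=(3,-1)$. Computing each coordinate separately,

$\displaystyle D_{f_1}(a;u)=\lim_{t\to0}\frac{(1+3t)(2-t)-2}{t}=\lim_{t\to0}(5-3t)=5$

$\displaystyle D_{f_2}(a;u)=\lim_{t\to0}\frac{(1+3t)+(2-t)^2-5}{t}=\lim_{t\to0}(-1+t)=-1$

so by the theorem $f$ has a directional derivative at $a$ in the direction of $u$, namely $D_f(a;u)=(5,-1)$.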

$\text{ }$

We next prove that directional derivatives (and in particular partial derivatives) are ‘homogeneous in the direction’ in the following sense:

$\text{ }$

Theorem: Let $f:U\to\mathbb{R}^m$, $U\subseteq\mathbb{R}^n$ open, have a directional derivative at $a\in U$ in the direction of $u$. Then $f$ has a directional derivative at $a$ in the direction of $cu$ for any $c\in\mathbb{R}-\{0\}$, and $D_f(a;cu)=cD_f(a;u)$.

Proof: Since $D_f(a;u)$ exists and $c\neq0$, basic analysis shows that

$\text{ }$

$\displaystyle D_f(a;cu)=\lim_{t\to 0}\frac{f(a+t(cu))-f(a)}{t}=c\lim_{t\to0}\frac{f(a+(ct)u)-f(a)}{ct}$

$\text{ }$

exists and, substituting $s=ct$ (since $c\neq0$, we have $s\to0$ as $t\to0$, and $s\neq0$ whenever $t\neq0$), is equal to

$\displaystyle c\lim_{s\to 0}\frac{f(a+su)-f(a)}{s}=c D_f(a;u)$

$\blacksquare$
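The key step of the proof is an identity between difference quotients: the quotient for the direction $cu$ at $t$ equals $c$ times the quotient for $u$ at $s=ct$. The Python sketch below (illustrative only) reuses the hypothetical sample map $f(x,y)=(xy,\,x+y^2)$ with $a=(1,2)$ and $u=(3,-1)$, for which $D_f(a;u)=(5,-1)$ can be checked by hand. It compares the two sides with $c=-2$, then shrinks $t$ to watch the quotient for $cu$ approach $cD_f(a;u)=(-10,2)$. Agreement is only up to floating-point rounding.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def difference_quotient(
    f: Callable[[Sequence[float]], Vector],
    a: Sequence[float],
    u: Sequence[float],
    t: float,
) -> Vector:
    """Return (f(a + t*u) - f(a)) / t, coordinate by coordinate (t != 0)."""
    shifted = [ai + t * ui for ai, ui in zip(a, u)]
    return [(fs - fa) / t for fs, fa in zip(f(shifted), f(a))]


def f(p: Sequence[float]) -> Vector:
    """Hypothetical sample map: f(x, y) = (x*y, x + y**2)."""
    x, y = p
    return [x * y, x + y ** 2]


a, u, c = [1.0, 2.0], [3.0, -1.0], -2.0
cu = [c * ui for ui in u]

t = 1e-3
lhs = difference_quotient(f, a, cu, t)                      # quotient for direction cu at t
rhs = [c * q for q in difference_quotient(f, a, u, c * t)]  # c * quotient for u at s = c*t
print(lhs, rhs)  # equal up to rounding

# As t -> 0 the quotient for cu approaches c * D_f(a;u) = (-10, 2).
print(difference_quotient(f, a, cu, 1e-7))
```

Taking $c=-1$ instead gives the special case of reversing the direction.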

$\text{ }$

This proves the intuitive idea that the directional derivative in the opposite direction should be the negative of the directional derivative in the original direction: taking $c=-1$ gives $D_f(a;-u)=-D_f(a;u)$.

$\text{ }$

We lastly note the following theorem, whose proof (linearity of limits applied to the difference quotients) is so routine that we omit it:

$\text{ }$

Theorem: Let $f_1,\cdots,f_k:U\to\mathbb{R}^m$, $U\subseteq\mathbb{R}^n$ open, have directional derivatives at $a\in U$ in the direction of $u\in\mathbb{R}^n$. Then, for any constants $c_1,\cdots,c_k\in\mathbb{R}$, the function $c_1f_1+\cdots+c_kf_k$ has a directional derivative at $a$ in the direction of $u$ and $D_{c_1f_1+\cdots+c_kf_k}(a;u)=c_1D_{f_1}(a;u)+\cdots+c_kD_{f_k}(a;u)$.
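For a fixed nonzero $t$ the difference quotient is itself linear in the function, which is why the proof is routine. The small Python check below uses assumed example data: the maps $f_1(x,y)=xy$ and $f_2(x,y)=x+y^2$ (viewed as maps into $\mathbb{R}^1$), the constants $c_1=2$, $c_2=-3$, and the helper names are all hypothetical. It compares the quotient of $c_1f_1+c_2f_2$ with $c_1$ times the quotient of $f_1$ plus $c_2$ times that of $f_2$. With $a=(1,2)$ and $u=(3,-1)$ one has $D_{f_1}(a;u)=5$ and $D_{f_2}(a;u)=-1$, so both sides should approach $2\cdot5+(-3)\cdot(-1)=13$.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Map = Callable[[Sequence[float]], Vector]


def difference_quotient(f: Map, a: Sequence[float], u: Sequence[float], t: float) -> Vector:
    """Return (f(a + t*u) - f(a)) / t, coordinate by coordinate (t != 0)."""
    shifted = [ai + t * ui for ai, ui in zip(a, u)]
    return [(fs - fa) / t for fs, fa in zip(f(shifted), f(a))]


def linear_combination(cs: Sequence[float], fs: Sequence[Map]) -> Map:
    """Return the map p -> c_1 f_1(p) + ... + c_k f_k(p) (all f_i share a codomain)."""

    def combined(p: Sequence[float]) -> Vector:
        values = [fi(p) for fi in fs]
        return [sum(c * v[k] for c, v in zip(cs, values)) for k in range(len(values[0]))]

    return combined


def f1(p: Sequence[float]) -> Vector:
    x, y = p
    return [x * y]


def f2(p: Sequence[float]) -> Vector:
    x, y = p
    return [x + y ** 2]


a, u = [1.0, 2.0], [3.0, -1.0]
cs, fs = [2.0, -3.0], [f1, f2]
g = linear_combination(cs, fs)

t = 1e-6
lhs = difference_quotient(g, a, u, t)  # quotient of c_1 f_1 + c_2 f_2
quotients = [difference_quotient(fi, a, u, t) for fi in fs]
rhs = [sum(c * q[k] for c, q in zip(cs, quotients)) for k in range(len(lhs))]
print(lhs, rhs)  # both close to [13.0]
```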

$\text{ }$

$\text{ }$

References:

1. Spivak, Michael. Calculus on Manifolds: A Modern Approach to Classical Theorems of Advanced Calculus. New York: W. A. Benjamin, 1965. Print.

2. Apostol, Tom M. Mathematical Analysis. Reading, MA: Addison-Wesley, 1974. Print.

May 29, 2011 -


4. In the definition of directional derivative, the Wolfram site says u must be a unit vector. I’ve seen posts in math forums that complain about Apostol’s definition omitting that requirement.

Comment by Stephen Tashiro | September 2, 2011

• Hello Stephen! I would be tempted to disagree with people’s objections. I mean really, it seems like a fairly artificial decision to restrict to unit vectors except that they form, in a sense, a representative class of vectors. Is there a reason you disagree with it?

Best,
Alex

Comment by Alex Youcis | September 3, 2011

5. I agree that Apostol’s definition is useful and more general than requiring that v be a unit vector. It bothers me a little that we only use the word “directional” for it. That doesn’t convey the “magnitude” part of the “magnitude and direction” of a vector.

Comment by Stephen Tashiro | September 3, 2011

6. I can see what you’re saying. Perhaps the best way to do it would be to define the directional derivative for arbitrary vectors and then discuss why it’s named what it is by considering the case when the vector is a unit vector.

Comment by Alex Youcis | September 3, 2011

7. In a thread on physicsforums.com, the poster Hootenanny says:
“Actually, upon re-reading Apostol’s definition as posted above, he does not mention the term “directional derivative”. It is Tomer that asserts that this is the directional derivative. Apostol only refers to the derivative with respect to a vector”

I don’t have a copy of Apostol’s book to check this.

Comment by Stephen Tashiro | September 6, 2011

• Stephen,

Is there really a difference?

Comment by Alex Youcis | September 7, 2011

8. It’s just a technicality about terminology, not “real math”. However, if Apostol is the reference and he doesn’t refer to his definition as the definition of “directional derivative”, then I wouldn’t put those words in his mouth. Perhaps you have another reference that does call it a “directional derivative”.

Comment by Stephen Tashiro | September 7, 2011

• Stephen,

Ah, I see what you mean now. I think the problem is what I mean by ‘reference’. There is a page on my blog that explains that by ‘references’ I really mean ‘place where you can find material similar to what I am doing’, so perhaps further reading. That said, I do on occasion write with a particular author in mind, and if so I try to make specific mention of this. Does that help clear things up?

Best,
Alex

Comment by Alex Youcis | September 7, 2011
