Abstract Nonsense

Crushing one theorem at a time

Directional Derivatives and Partial Derivatives


Point of Post: In this post we discuss the notions of directional derivatives and partial derivatives.

Motivation

Roughly, what the total derivative does is describe when a function can be locally approximated very well (with sublinear error) by an affine transformation. Indeed, suppose that f:\mathbb{R}^n\to\mathbb{R}^m is differentiable at a\in\mathbb{R}^n with total derivative T. By definition, for any \varepsilon>0 we may choose \delta>0 such that 0<\|h\|<\delta implies \left\|f(a+h)-\left(T(h)+f(a)\right)\right\|<\varepsilon\|h\|. Note that we see from this that ‘locally’ here means in all possible directions (as soon as a+h is within the open ball B_{\delta}(a) the above inequality applies). Sometimes, though, we are only interested in the approximation, or notion of change, in a particular direction. Thus arises the directional derivative which, roughly put, measures the rate of change of a function f at a point a in the direction of a vector u. The idea is simple: one takes, along the line through a in the direction of u, the differences between the values f(a+tu) and f(a), divides by t, and lets t tend to zero. When the vector is one of the elements of the canonical basis we get the partial derivatives which, as we shall see, are of huge importance in multivariable differential analysis.

Directional Derivatives

Let U\subseteq\mathbb{R}^n be open and suppose that f:U\to\mathbb{R}^m. Suppose that a\in U and u\in\mathbb{R}^n. If the limit

\displaystyle \lim_{t\to0}\frac{f\left(a+tu\right)-f(a)}{t}

(where the limit is taken over t\in\mathbb{R}) exists we say that f has a directional derivative at a in the direction of u and denote this limit by D_f(a;u) (remembering that it’s the directional derivative of f at a in the direction of u; syntactically the order of the symbols makes sense). It is important to note that this limit lives inside \mathbb{R}^m, and that U is assumed open for reasons similar to those discussed for the total derivative: it guarantees that a+tu\in U for all sufficiently small t, so that the difference quotient is defined. If u is a canonical basis vector e_j then we call D_f(a;e_j) the partial derivative with respect to e_j, or the partial derivative with respect to the j^{\text{th}} variable, and, when desirable, can alternatively denote it by D_jf(a) (omitting parentheses around the f since, thinking of D_j as an operator, one usually writes Tv and not T(v)) or by \displaystyle \frac{\partial f}{\partial x_j}(a) if we are thinking of our function f as f(x_1,\cdots,x_j,\cdots,x_n) (i.e. the variables are denoted x_1,\cdots,x_n). If instead we had a function f:\mathbb{R}^3\to\mathbb{R}^2 whose variables are written as f(x,y,z) we would denote D_2f(a) by \displaystyle \frac{\partial f}{\partial y}(a). If our function f has a directional derivative in the direction of u at every point of some region R\subseteq U then we have a function D_f(-;u):R\to\mathbb{R}^m:a\mapsto D_f(a;u).
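
To make the definition concrete, here is a minimal numerical sketch in Python. The map f:\mathbb{R}^3\to\mathbb{R}^2, the point a, the direction u and the helper name difference_quotient are all hypothetical choices made for illustration; none of them come from the post. The sketch simply evaluates the difference quotient for shrinking t and prints it next to the limit worked out by hand.

```python
import math

def difference_quotient(f, a, u, t):
    """Return (f(a + t*u) - f(a)) / t, computed coordinate by coordinate."""
    shifted = [ai + t * ui for ai, ui in zip(a, u)]
    return [(fs - fa) / t for fs, fa in zip(f(shifted), f(a))]

# Hypothetical example map f: R^3 -> R^2 (chosen for illustration only).
def f(p):
    x, y, z = p
    return [x * y + z, math.sin(x) + y * z]

a = [1.0, 2.0, 3.0]
u = [1.0, -1.0, 2.0]

# Worked out by hand: f(a + t*u) = (5 + 3t - t^2, sin(1 + t) + 6 + t - 2t^2),
# so the limit of the quotient as t -> 0 is (3, cos(1) + 1).
expected = [3.0, math.cos(1.0) + 1.0]

for t in (1e-1, 1e-2, 1e-3, 1e-4):
    print(t, difference_quotient(f, a, u, t))
print("expected limit:", expected)
```

Note that the quotient is a vector in \mathbb{R}^2, matching the remark that the limit lives inside \mathbb{R}^m. A finite t only ever gives an approximation, so this is a sanity check and not a proof that the limit exists.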

If m=1 in the above definition (i.e. if f is real-valued) then directional derivatives have a very nice interpretation. To get the idea we first inspect partial derivatives. Since U is open we can find some open ball B_\delta(a)\subseteq U. Then we know that \{a_1\}\times\cdots\times(a_j-\delta,a_j+\delta)\times\cdots\times\{a_n\} is contained in B_\delta(a), and hence in U. We can then define a function g:(-\delta,\delta)\to\mathbb{R} given by g(t)=f(a_1,\cdots,a_{j-1},a_j+t,a_{j+1},\cdots,a_n). It’s not hard to see then that D_jf(a) exists if and only if g is differentiable (in the usual one-dimensional sense) at 0, and moreover that D_jf(a)=g'(0). This is really nice because it allows us to evaluate partial derivatives in the ‘everything else is constant’ way we learned in calculus. Namely, to find D_jf(a) one merely ‘pretends’ the variables x_i,\; i\ne j are constant in f(x_1,\cdots,x_j,\cdots,x_n) and differentiates with respect to x_j normally. So, for example, if f(x,y)=x^2y^2+\sin(x) then D_yf(w,z)=2w^2z. Now, we could have phrased this a little less explicitly by noting that really g(t)=f(a+te_j). We can then generalize this: for any direction u we can consider g(t)=f(a+tu) (where t lives in an interval defined analogously to the above), and having a directional derivative in the direction of u is equivalent to g being differentiable at 0, in which case g'(0)=D_f(a;u).
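
The example f(x,y)=x^2y^2+\sin(x) above can be checked numerically through the one-variable function g(t)=f(a+te_j). The following Python sketch is only an illustration: the helper name partial, the step size h and the sample point (w,z) are assumptions, and it approximates g'(0) with a symmetric difference quotient instead of the one-sided quotient in the definition (for differentiable g both tend to g'(0)).

```python
import math

def f(x, y):
    return x**2 * y**2 + math.sin(x)

def partial(f, a, j, h=1e-6):
    """Approximate D_j f(a) as g'(0), where g(t) = f(a + t*e_j)."""
    def g(t):
        p = list(a)
        p[j] += t          # move only the j-th coordinate; the rest stay fixed
        return f(*p)
    # Symmetric difference quotient (an assumption of this sketch).
    return (g(h) - g(-h)) / (2 * h)

w, z = 1.5, -0.5                           # hypothetical sample point
approx_dy = partial(f, (w, z), 1)          # j = 1 is the y-variable
exact_dy = 2 * w**2 * z                    # the formula D_y f(w,z) = 2 w^2 z
approx_dx = partial(f, (w, z), 0)          # j = 0 is the x-variable
exact_dx = 2 * w * z**2 + math.cos(w)      # treat y as constant, differentiate in x

print(approx_dy, exact_dy)
print(approx_dx, exact_dx)
```

The two printed pairs should agree to several decimal places, which is the ‘everything else is constant’ rule in action.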

The first thing we’d like to show is that, just as for the total derivative, having a directional derivative in some direction is equivalent to each of the coordinate functions having a directional derivative in that direction. More explicitly:

Theorem: Let f:U\to\mathbb{R}^m, where U\subseteq\mathbb{R}^n is open, and let u\in\mathbb{R}^n. Then f has a directional derivative at a\in U in the direction of u if and only if each of the coordinate functions f_1,\cdots,f_m has a directional derivative at a in the direction of u, in which case D_f(a;u)=\left(D_{f_1}(a;u),\cdots,D_{f_m}(a;u)\right).

Proof: This follows immediately from the common theorem that a limit of a vector-valued function exists if and only if the limit of each coordinate function exists, in which case the limit of the vector-valued function is the tuple whose coordinates are the limits of the corresponding coordinate functions. \blacksquare
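
As a small illustration of the theorem, the Python sketch below computes the difference quotient of a map f=(f_1,f_2) once as a vector and once coordinate by coordinate. The coordinate functions f1 and f2, the point a and the direction u are hypothetical choices; the limits in the comments were worked out by hand.

```python
import math

# Hypothetical coordinate functions f_1, f_2 : R^2 -> R.
def f1(p):
    return p[0] * p[1]

def f2(p):
    return math.exp(p[0]) + p[1] ** 2

def scalar_quotient(g, a, u, t):
    """Difference quotient (g(a + t*u) - g(a)) / t of a real-valued g."""
    shifted = [ai + t * ui for ai, ui in zip(a, u)]
    return (g(shifted) - g(a)) / t

def vector_quotient(a, u, t):
    """Difference quotient of f = (f_1, f_2), computed as a vector in R^2."""
    shifted = [ai + t * ui for ai, ui in zip(a, u)]
    fa = (f1(a), f2(a))
    fs = (f1(shifted), f2(shifted))
    return tuple((s - b) / t for s, b in zip(fs, fa))

a = [1.0, 2.0]
u = [3.0, -1.0]
t = 1e-5

as_vector = vector_quotient(a, u, t)
by_coordinates = (scalar_quotient(f1, a, u, t), scalar_quotient(f2, a, u, t))

# By hand: D_{f_1}(a;u) = 5 and D_{f_2}(a;u) = 3e - 4.
print(as_vector)
print(by_coordinates)
print((5.0, 3 * math.e - 4))
```

The quotients agree coordinate by coordinate for every t, which is why the vector limit exists exactly when each coordinate limit does.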

We next prove that directional derivatives (and in particular partial derivatives) are ‘homogeneous in the direction’ in the following sense:

Theorem: Let f:U\to\mathbb{R}^m, U\subseteq\mathbb{R}^n open, have a directional derivative at a\in U in the direction of u. Then f has a directional derivative at a in the direction of cu for any c\in\mathbb{R}-\{0\}, and D_f(a;cu)=cD_f(a;u).

Proof: Since c\neq 0, the substitution s=ct satisfies s\to 0 if and only if t\to 0. Since we assumed that D_f(a;u) exists, it follows from basic analysis that

\displaystyle D_f(a;cu)=\lim_{t\to 0}\frac{f(a+t(cu))-f(a)}{t}=c\lim_{t\to0}\frac{f(a+(ct)u)-f(a)}{ct}

exists and is equal to

\displaystyle c\lim_{s\to 0}\frac{f(a+su)-f(a)}{s}=c D_f(a;u)

\blacksquare

In particular, taking c=-1 proves the intuitive idea that the directional derivative in the opposite direction is the negative of the directional derivative in the original direction.
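
The homogeneity D_f(a;cu)=cD_f(a;u) is easy to observe numerically. The Python sketch below is an illustration under stated assumptions: the function f(x,y)=e^xy, the point a, the direction u and the helper directional_derivative are hypothetical, and the helper uses a symmetric difference quotient as a stand-in for the limit.

```python
import math

def directional_derivative(f, a, u, h=1e-6):
    """Approximate D_f(a;u) for real-valued f with a symmetric difference quotient."""
    plus = [ai + h * ui for ai, ui in zip(a, u)]
    minus = [ai - h * ui for ai, ui in zip(a, u)]
    return (f(plus) - f(minus)) / (2 * h)

# Hypothetical example f: R^2 -> R.
def f(p):
    x, y = p
    return math.exp(x) * y

a = [0.0, 1.0]
u = [1.0, 2.0]

# By hand: g(t) = f(a + t*u) = e^t (1 + 2t), so D_f(a;u) = g'(0) = 3.
d_u = directional_derivative(f, a, u)

for c in (-1.0, 0.5, 3.0):
    cu = [c * ui for ui in u]
    d_cu = directional_derivative(f, a, cu)
    print(c, d_cu, c * d_u)   # the last two columns should nearly coincide
```

The row for c=-1 is the ‘opposite direction’ case: the value is approximately the negative of the value in the direction of u.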

We lastly note the following theorem, whose proof is so obvious that we omit it:

Theorem: Let f_1,\cdots,f_k:U\to\mathbb{R}^m, U\subseteq\mathbb{R}^n open, have directional derivatives at a\in U in the direction of u\in\mathbb{R}^n. Then, for any constants c_1,\cdots,c_k\in\mathbb{R} one has that c_1f_1+\cdots+c_k f_k has a directional derivative at a in the direction of u and D_{c_1f_1+\cdots+c_kf_k}(a;u)=c_1D_{f_1}(a;u)+\cdots+c_kD_{f_k}(a;u).
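
Although the proof is omitted, the statement can at least be sanity-checked numerically in the case k=2 and m=1. In the Python sketch below the functions f1 and f2, the constants c1 and c2, the point a and the direction u are hypothetical, and the helper directional_derivative approximates the limit with a symmetric difference quotient.

```python
import math

def directional_derivative(f, a, u, h=1e-6):
    """Approximate D_f(a;u) for real-valued f with a symmetric difference quotient."""
    plus = [ai + h * ui for ai, ui in zip(a, u)]
    minus = [ai - h * ui for ai, ui in zip(a, u)]
    return (f(plus) - f(minus)) / (2 * h)

# Hypothetical functions f_1, f_2 : R^2 -> R and constants c_1, c_2.
def f1(p):
    return math.sin(p[0]) * p[1]

def f2(p):
    return p[0] ** 2 + p[1] ** 3

c1, c2 = 2.0, -3.0

def combination(p):
    return c1 * f1(p) + c2 * f2(p)

a = [0.5, 1.5]
u = [2.0, -1.0]

lhs = directional_derivative(combination, a, u)
rhs = c1 * directional_derivative(f1, a, u) + c2 * directional_derivative(f2, a, u)
print(lhs, rhs)   # the two values should agree up to rounding
```

The agreement reflects the fact that the difference quotient of c_1f_1+c_2f_2 is the same linear combination of the individual difference quotients, which is the whole content of the omitted proof.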

References:

1. Spivak, Michael. Calculus on Manifolds; a Modern Approach to Classical Theorems of Advanced Calculus. New York: W.A. Benjamin, 1965. Print.

2. Apostol, Tom M. Mathematical Analysis. Reading, MA: Addison-Wesley Pub., 1974. Print.

May 29, 2011 - Posted by | Analysis

12 Comments »

  1. Pingback by Higher Order Partial Derivatives and the Equality of Mixed Partials (Pt. I) « Abstract Nonsense | May 31, 2011 | Reply

  2. Pingback by Higher Order Partial Derivatives and the Equality of Mixed Partials (Pt. II) « Abstract Nonsense | June 1, 2011 | Reply

  3. Pingback by Relationship Between the Notions of Directional and Total Derivatives (Pt.I) « Abstract Nonsense | June 2, 2011 | Reply

  4. In the definition of directional derivative, the Wolfram site says u must be a unit vector. I’ve seen posts in math forums that complain about Apostol’s definition omitting that requirement.

    Comment by Stephen Tashiro | September 2, 2011 | Reply

    • Hello Stephen! I would be tempted to disagree with people’s objections. I mean really, it seems like a fairly artificial decision to restrict to unit vectors except that they form, in a sense, a representative class of vectors. Is there a reason you disagree with it?

      Thanks for reading my blog!

      Best,
      Alex

      Comment by Alex Youcis | September 3, 2011 | Reply

  5. I agree that Apostol’s definition is useful and more general than requiring that v be a unit vector. It bothers me a little that we only use the word “directional” for it. That doesn’t convey the “magnitude” part of the “magnitude and direction” of a vector.

    Comment by Stephen Tashiro | September 3, 2011 | Reply

  6. I can see what you’re saying. Perhaps the best way to do it would be to define the directional derivative for arbitrary vectors and then discuss why it’s named what it is by considering the case when the vector is a unit vector.

    Comment by Alex Youcis | September 3, 2011 | Reply

  7. In a thread on physicsforums.com, the poster Hootenanny says:
    “Actually, upon re-reading Apostol’s definition as posted above, he does not mention the term “directional derivative”. It is Tomer that asserts that this is the directional derivative. Apostol only refers to the derivative with respect to a vector”

    I don’t have a copy of Apostol’s book to check this.

    Comment by Stephen Tashiro | September 6, 2011 | Reply

    • Stephen,

      Is there really a difference?

      Comment by Alex Youcis | September 7, 2011 | Reply

  8. It’s just a technicality about terminology, not “real math”. However, if Apostol is the reference and he doesn’t refer to his definition as the definition of “directional derivative”, then I wouldn’t put those words in his mouth. Perhaps you have another reference that does call it a “directional derivative”.

    Comment by Stephen Tashiro | September 7, 2011 | Reply

    • Stephen,

      Ah, I see what you mean now. I think the problem is what I mean by ‘reference’. There is a page on my blog that explains that by ‘references’ I really mean ‘place where you can find material similar to what I am doing’, so perhaps further reading. That said, I do on occasion write with a particular author in mind, and if so I try to make specific mention of this. Does that help clear things up?

      Best,
      Alex

      Comment by Alex Youcis | September 7, 2011 | Reply

  9. Pingback by Surfaces « Abstract Nonsense | October 7, 2011 | Reply

