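12. Chain rule

Theorem 12.1 (Chain Rule). Let $U \subset \mathbb{R}^n$ and let $V \subset \mathbb{R}^m$ be two open subsets. Let $f\colon U \to V$ and $g\colon V \to \mathbb{R}^p$ be two functions. If $f$ is differentiable at $P$ and $g$ is differentiable at $Q = f(P)$, then $g \circ f\colon U \to \mathbb{R}^p$ is differentiable at $P$, with derivative:
\[
D(g \circ f)(P) = (Dg(Q))(Df(P)).
\]

It is interesting to untwist this result in specific cases. Suppose we are given $f\colon \mathbb{R} \to \mathbb{R}^2$ and $g\colon \mathbb{R}^2 \to \mathbb{R}$. So $f(x) = (f_1(x), f_2(x))$ and $w = g(y, z)$. Then
\[
Df(P) = \begin{pmatrix} \dfrac{df_1}{dx}(P) \\[2mm] \dfrac{df_2}{dx}(P) \end{pmatrix}
\qquad\text{and}\qquad
Dg(Q) = \left( \frac{\partial g}{\partial y}(Q), \frac{\partial g}{\partial z}(Q) \right).
\]
So
\[
\frac{d(g \circ f)}{dx} = D(g \circ f)(P) = Dg(Q)\,Df(P) = \frac{\partial g}{\partial y}(Q)\frac{df_1}{dx}(P) + \frac{\partial g}{\partial z}(Q)\frac{df_2}{dx}(P).
\]

Example 12.2. Suppose that $f(x) = (x^2, x^3)$ and $g(y, z) = yz$. If we apply the chain rule, with $y = x^2$ and $z = x^3$, we get
\[
D(g \circ f)(x) = z(2x) + y(3x^2) = 5x^4.
\]
On the other hand $(g \circ f)(x) = x^5$, and of course
\[
\frac{d x^5}{dx} = 5x^4.
\]

Now suppose that $f\colon \mathbb{R}^2 \to \mathbb{R}^2$ and $g\colon \mathbb{R}^2 \to \mathbb{R}$. So $f(x, y) = (f_1(x, y), f_2(x, y))$ and $w = g(u, v)$. Then
\[
Df(P) = \begin{pmatrix} \dfrac{\partial f_1}{\partial x}(P) & \dfrac{\partial f_1}{\partial y}(P) \\[2mm] \dfrac{\partial f_2}{\partial x}(P) & \dfrac{\partial f_2}{\partial y}(P) \end{pmatrix}
\qquad\text{and}\qquad
Dg(Q) = \left( \frac{\partial g}{\partial u}(Q), \frac{\partial g}{\partial v}(Q) \right).
\]
In this case
\[
\begin{aligned}
D(g \circ f) &= \left( \frac{\partial (g \circ f)}{\partial x}, \frac{\partial (g \circ f)}{\partial y} \right) \\
&= \left( \frac{\partial g}{\partial u}(Q)\frac{\partial f_1}{\partial x}(P) + \frac{\partial g}{\partial v}(Q)\frac{\partial f_2}{\partial x}(P),\; \frac{\partial g}{\partial u}(Q)\frac{\partial f_1}{\partial y}(P) + \frac{\partial g}{\partial v}(Q)\frac{\partial f_2}{\partial y}(P) \right) \\
&= \left( \frac{\partial g}{\partial u}(Q)\frac{\partial u}{\partial x}(P) + \frac{\partial g}{\partial v}(Q)\frac{\partial v}{\partial x}(P),\; \frac{\partial g}{\partial u}(Q)\frac{\partial u}{\partial y}(P) + \frac{\partial g}{\partial v}(Q)\frac{\partial v}{\partial y}(P) \right) \\
&= \left( \frac{\partial g}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial g}{\partial v}\frac{\partial v}{\partial x},\; \frac{\partial g}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial g}{\partial v}\frac{\partial v}{\partial y} \right),
\end{aligned}
\]
since $u = f_1(x, y)$ and $v = f_2(x, y)$. Notice that in the last line we were a bit sloppy and dropped $P$ and $Q$.

If we split this vector equation into its components we get
\[
\begin{aligned}
\frac{\partial (g \circ f)}{\partial x} &= \frac{\partial g}{\partial u}(Q)\frac{\partial f_1}{\partial x}(P) + \frac{\partial g}{\partial v}(Q)\frac{\partial f_2}{\partial x}(P) \\
\frac{\partial (g \circ f)}{\partial y} &= \frac{\partial g}{\partial u}(Q)\frac{\partial f_1}{\partial y}(P) + \frac{\partial g}{\partial v}(Q)\frac{\partial f_2}{\partial y}(P).
\end{aligned}
\]
Again, we could replace $f_1$ by $u$ and $f_2$ by $v$ in these equations, and maybe even drop $P$ and $Q$.

Example 12.3. Suppose that $f(x, y) = (\cos(xy), e^{x-y})$ and $g(u, v) = u^2 \sin v$. If we apply the chain rule, we get
\[
\begin{aligned}
D(g \circ f)(x, y) &= \left( 2u \sin v\,(-y \sin xy) + u^2 \cos v\,(e^{x-y}),\; -2u \sin v\; x \sin xy - u^2 \cos v\; e^{x-y} \right) \\
&= \left( 2\cos(xy)\sin(e^{x-y})(-y \sin xy) + \cos^2(xy)\cos(e^{x-y})\,e^{x-y},\; \dots \right).
\end{aligned}
\]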
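The formula in Example 12.3 can be sanity-checked numerically. The following is only a sketch in plain Python: it evaluates the product $Dg(Q)\,Df(P)$ by hand and compares it with a central-difference approximation of the partial derivatives of $g \circ f$. The function names, the sample point and the step size `h` are illustrative choices, not part of the notes.

```python
import math


def f(x, y):
    """Inner map f(x, y) = (cos(xy), e^(x - y)) from Example 12.3."""
    return (math.cos(x * y), math.exp(x - y))


def g(u, v):
    """Outer map g(u, v) = u^2 sin v from Example 12.3."""
    return u * u * math.sin(v)


def chain_rule_gradient(x, y):
    """Row vector Dg(Q) Df(P), where P = (x, y) and Q = f(P)."""
    u, v = f(x, y)
    # Dg(Q) = (dg/du, dg/dv) evaluated at Q = (u, v).
    g_u = 2 * u * math.sin(v)
    g_v = u * u * math.cos(v)
    # Rows of Df(P): partial derivatives of f1 = cos(xy) and f2 = e^(x - y).
    f1_x, f1_y = -y * math.sin(x * y), -x * math.sin(x * y)
    f2_x, f2_y = math.exp(x - y), -math.exp(x - y)
    # Dot product of Dg(Q) with each column of Df(P).
    return (g_u * f1_x + g_v * f2_x, g_u * f1_y + g_v * f2_y)


def finite_difference_gradient(x, y, h=1e-6):
    """Central-difference approximation to D(g o f)(x, y); h is an assumed step."""
    def comp(s, t):
        return g(*f(s, t))

    d_x = (comp(x + h, y) - comp(x - h, y)) / (2 * h)
    d_y = (comp(x, y + h) - comp(x, y - h)) / (2 * h)
    return (d_x, d_y)


if __name__ == "__main__":
    point = (0.7, -0.3)  # arbitrary sample point, chosen only for illustration
    print("chain rule:        ", chain_rule_gradient(*point))
    print("finite differences:", finite_difference_gradient(*point))
```

The two printed pairs should agree to several decimal places. Agreement at one sample point is evidence that the partial derivatives were computed correctly, not a proof of the theorem.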
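In general, the $(i, k)$ entry of $D(g \circ f)(P)$, that is
\[
\frac{\partial (g \circ f)_i}{\partial x_k},
\]
is given by the dot product of the $i$th row of $Dg(Q)$ and the $k$th column of $Df(P)$,
\[
\frac{\partial (g \circ f)_i}{\partial x_k} = \sum_{j=1}^{m} \frac{\partial g_i}{\partial y_j}(Q)\,\frac{\partial f_j}{\partial x_k}(P).
\]
If we write $y = f(x)$ and $z = (g \circ f)(x)$, then we get
\[
\frac{\partial z_i}{\partial x_k} = \sum_{j=1}^{m} \frac{\partial z_i}{\partial y_j}(Q)\,\frac{\partial y_j}{\partial x_k}(P).
\]

We can use the chain rule to prove some of the simple rules for derivatives. Suppose that we have $f\colon \mathbb{R}^n \to \mathbb{R}^m$ and $g\colon \mathbb{R}^n \to \mathbb{R}^m$. Suppose that $f$ and $g$ are differentiable at $P$. What about $f + g$? Well, there is a function $a\colon \mathbb{R}^{2m} \to \mathbb{R}^m$, which sends $(\vec{u}, \vec{v}) \in \mathbb{R}^m \times \mathbb{R}^m$ to the sum $\vec{u} + \vec{v}$. In coordinates $(u_1, u_2, \dots, u_m, v_1, v_2, \dots, v_m)$,
\[
a(u_1, u_2, \dots, u_m, v_1, v_2, \dots, v_m) = (u_1 + v_1, u_2 + v_2, \dots, u_m + v_m).
\]
Now $a$ is differentiable (it is a polynomial, linear even). There is a function $h\colon \mathbb{R}^n \to \mathbb{R}^{2m}$, which sends $Q$ to $(f(Q), g(Q))$. The composition $a \circ h\colon \mathbb{R}^n \to \mathbb{R}^m$ is the function we want to differentiate; it sends $P$ to $f(P) + g(P)$. Here $Da$ is the $m \times 2m$ matrix $(I_m \;\; I_m)$ and $Dh(P)$ is the $2m \times n$ matrix with $Df(P)$ stacked on top of $Dg(P)$. The chain rule says that the function is differentiable at $P$ and
\[
D(f + g)(P) = Df(P) + Dg(P).
\]

Now suppose that $m = 1$. Instead of $a$, consider the function $m\colon \mathbb{R}^2 \to \mathbb{R}$, given by $m(x, y) = xy$. Then $m$ is differentiable, with derivative
\[
Dm(x, y) = (y, x).
\]
So the chain rule says the composition of $h$ and $m$, namely the function which sends $P$ to the product $f(P)g(P)$, is differentiable and the derivative satisfies the usual rule
\[
D(fg)(P) = g(P)\,Df(P) + f(P)\,Dg(P).
\]

Here is another example of the chain rule. Suppose
\[
x = r \cos\theta, \qquad y = r \sin\theta.
\]
Then
\[
\begin{aligned}
\frac{\partial f}{\partial r} &= \frac{\partial f}{\partial x}\frac{\partial x}{\partial r} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial r} \\
&= \frac{\partial f}{\partial x}\cos\theta + \frac{\partial f}{\partial y}\sin\theta.
\end{aligned}
\]
Similarly,
\[
\begin{aligned}
\frac{\partial f}{\partial \theta} &= \frac{\partial f}{\partial x}\frac{\partial x}{\partial \theta} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial \theta} \\
&= -\frac{\partial f}{\partial x}\, r \sin\theta + \frac{\partial f}{\partial y}\, r \cos\theta.
\end{aligned}
\]
We can rewrite this as
\[
\begin{pmatrix} \dfrac{\partial}{\partial r} \\[2mm] \dfrac{\partial}{\partial \theta} \end{pmatrix}
=
\begin{pmatrix} \cos\theta & \sin\theta \\ -r \sin\theta & r \cos\theta \end{pmatrix}
\begin{pmatrix} \dfrac{\partial}{\partial x} \\[2mm] \dfrac{\partial}{\partial y} \end{pmatrix}.
\]
Now the determinant of
\[
\begin{pmatrix} \cos\theta & \sin\theta \\ -r \sin\theta & r \cos\theta \end{pmatrix}
\]
is $r(\cos^2\theta + \sin^2\theta) = r$. So if $r \neq 0$, then we can invert the matrix above and we get
\[
\begin{pmatrix} \dfrac{\partial}{\partial x} \\[2mm] \dfrac{\partial}{\partial y} \end{pmatrix}
=
\frac{1}{r}
\begin{pmatrix} r \cos\theta & -\sin\theta \\ r \sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} \dfrac{\partial}{\partial r} \\[2mm] \dfrac{\partial}{\partial \theta} \end{pmatrix}.
\]

We now turn to a proof of the chain rule. We will need:

Lemma 12.4. Let $A \subset \mathbb{R}^n$ be an open subset and let $f\colon A \to \mathbb{R}^m$ be a function.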
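If $f$ is differentiable at $P$, then there are constants $M \geq 0$ and $\delta > 0$ such that if $\|\overrightarrow{PQ}\| < \delta$, then
\[
\|f(Q) - f(P)\| \leq M \|\overrightarrow{PQ}\|.
\]

Proof. As $f$ is differentiable at $P$, there is a constant $\delta > 0$ such that if $0 < \|\overrightarrow{PQ}\| < \delta$, then
\[
\frac{\|f(Q) - f(P) - Df(P)\overrightarrow{PQ}\|}{\|\overrightarrow{PQ}\|} < 1.
\]
Hence
\[
\|f(Q) - f(P) - Df(P)\overrightarrow{PQ}\| \leq \|\overrightarrow{PQ}\|
\]
whenever $\|\overrightarrow{PQ}\| < \delta$ (the case $Q = P$ being trivial). Let $K \geq 0$ be a constant such that $\|Df(P)\vec{v}\| \leq K\|\vec{v}\|$ for every $\vec{v}$; such a constant exists because $Df(P)$ is linear. But then
\[
\begin{aligned}
\|f(Q) - f(P)\| &= \|f(Q) - f(P) - Df(P)\overrightarrow{PQ} + Df(P)\overrightarrow{PQ}\| \\
&\leq \|f(Q) - f(P) - Df(P)\overrightarrow{PQ}\| + \|Df(P)\overrightarrow{PQ}\| \\
&\leq \|\overrightarrow{PQ}\| + K\|\overrightarrow{PQ}\| \\
&= M\|\overrightarrow{PQ}\|,
\end{aligned}
\]
where $M = 1 + K$. $\square$

Proof of (12.1). Let's fix some notation. We want the derivative at $P$. Let $Q = f(P)$. Let $P'$ be a point in $U$ (which we imagine is close to $P$). Finally, let $Q' = f(P')$ (so if $P'$ is close to $P$, then we expect $Q'$ to be close to $Q$).

The trick is to carefully define an auxiliary function $G\colon V \to \mathbb{R}^p$,
\[
G(Q') =
\begin{cases}
\dfrac{g(Q') - g(Q) - Dg(Q)(\overrightarrow{QQ'})}{\|\overrightarrow{QQ'}\|} & \text{if } Q' \neq Q \\[3mm]
0 & \text{if } Q' = Q.
\end{cases}
\]
Then $G$ is continuous at $Q = f(P)$, as $g$ is differentiable at $Q$. Now, for $P' \neq P$,
\[
\frac{(g \circ f)(P') - (g \circ f)(P) - Dg(Q)Df(P)(\overrightarrow{PP'})}{\|\overrightarrow{PP'}\|}
= Dg(Q)\,\frac{f(P') - f(P) - Df(P)(\overrightarrow{PP'})}{\|\overrightarrow{PP'}\|}
+ G(f(P'))\,\frac{\|f(P') - f(P)\|}{\|\overrightarrow{PP'}\|}.
\]
As $P'$ approaches $P$, note that
\[
\frac{f(P') - f(P) - Df(P)(\overrightarrow{PP'})}{\|\overrightarrow{PP'}\|}
\]
and $G(f(P'))$ both approach zero: the first because $f$ is differentiable at $P$, the second because $f(P')$ approaches $Q$ by Lemma 12.4 and $G$ is continuous at $Q$ with $G(Q) = 0$. Also by Lemma 12.4,
\[
\frac{\|f(P') - f(P)\|}{\|\overrightarrow{PP'}\|} \leq M.
\]
So then
\[
\frac{(g \circ f)(P') - (g \circ f)(P) - Dg(Q)Df(P)(\overrightarrow{PP'})}{\|\overrightarrow{PP'}\|}
\]
approaches zero as well, which is what we want. $\square$

MIT OpenCourseWare
http://ocw.mit.edu

18.022 Calculus of Several Variables
Fall 2010

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.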