A Primer in Dynamic Programming

Description

A Primer in Dynamic Programming Goals Practical guide to dynamic programming Intuition, not formal mathematical proofs Restrictions: (1) use available mathematical skills only, (2) easy examples Prepare for Walsh’s chapters 2 and 3!

Comments
Would you like to comment?

Sign In if already a member, or Join Now for a free account.

Presentation Transcript Presentation Transcript

A Primer in Dynamic Programming : A Primer in Dynamic Programming

Goals : Goals Practical guide to dynamic programming Intuition, not formal mathematical proofs Restrictions: (1) use available mathematical skills only, (2) easy examples Prepare for Walsh’s chapters 2 and 3!

Contents : Contents Intuition in a discrete network Example: Cake Eating problem Formal approach Example: Cass-Koopmans growth model Envelope theorem Cass-Koopmans continued Sidrauski: Money in the Utility Function

Intuition of Dynamic Programming : Intuition of Dynamic Programming Bellman’s principle: an optimal trajectory has the property that, whatever the initial conditions are, the remaining path must constitute an optimum We will illustrate this principle using the so-called Traveling Salesman problem: find the shortest route between cities

A graph of the Traveling Salesman problem: shortest route 1-7 : A graph of the Traveling Salesman problem: shortest route 1-7 1 1 2 3 4 5 6 7 78 89 102 38 54 45 37 15 12 22 80

Optimality : Optimality Suppose 1-3-6-7 is the optimal route, then 3-6-7 needs to be the optimal route from 3 to 7: 3-6-7 needs to be shorter than 3-5-7 So we can use backward induction Let Mn(i) be the minimal distance to the final point given that the salesman is in town i and has to make n more decisions Dij is the distance from town i to j J(i) is the set of towns the salesman can choose from when in town i (so J(1) = 2,3,4) Mn(i) = min jєJ(i) {Dij + Mn-1(j)}

Backward induction : Backward induction Step 1: M1(5) = 45, M1(6) = 37 Step 2: M2(2) = min j = 5,6 (D2j + M1(j)) = min(38 + 45, 54 + 37) = 83 and M2(3) = min(80 + 45, 22 + 37) = 59 and M2(4) = min(12 + 45, 15 + 37) = 52 Step 3: M3(1) = min j = 2,3,4 (D1j + M2(j)) = min{78 + 83, 89 + 59, 102 + 52} = 148 So the path is 1-3-6-7!

A Cake-Eating example : A Cake-Eating example Suppose we have a cake of size W1, which does not depreciate At each moment of time t = 1,2,3,…,T you can eat some cake but must save the rest Utility remains stationairy u(ct) Maximize t=1T t u(ct) with a discount factor 0    1 We have non-negativity constraints ct  0 and Wt  0 Law of motion for the cake holds at all periods: Wt+1 = Wt - ct

Cake-Eating: Direct Attack : Cake-Eating: Direct Attack This is called the sequence problem Rewrite all constraints into: t=1T ct + WT+1 = W1. This reduces the number of non-negativity constraints: only ct  0 for all t = 1,2,3,…,T, and WT+1  0. Let  be the multiplier on the resource constraint and  on the non-negativity condition on WT+1. We get the first-order conditions: t-1 u’(ct) = , for t = 1,2,…, T  =  [Note that we ignore the nonnegativity conditions on ct, because we assume that lim u’(c )   if c  0] Combining gives the necessary condition: u’(ct) =  u’(ct+1). This is the Euler condition that holds at all t

Cake-Eating (3) : Cake-Eating (3) The Euler condition is necessary, but not sufficient for an optimal trajectory Sufficient is that the terminal condition WT+1 = 0 holds (and so binds):  =  > 0 has to hold The agent wants WT+1 = - and we need to avoid that Define a value function VT (W1): the maximum utility flow from a T-period problem given a size W1 cake A slight increase in the size of the cake leads to an equal increase in marginal utility in each period: V’T (W1) =  = t-1 u’(ct) So it does not matter when the extra cake is eaten given that the consumer is acting optimally

Dynamic programming of the Cake Eating: a finite horizon problem : Dynamic programming of the Cake Eating: a finite horizon problem We add a period 0 and an initial cake size W0. We consider the problem: Max u(c0) + VT (W1) Where W1 = W0 - c0 and W0 given Principle of optimality: VT (W1) fully summarizes optimal behavior (and so maximum utility VT (W1) from t = 1) First-order condition: u’(c0) =  V’T (W1) We noted before that: V’T (W1) = u’(c1) = t u’(ct+1) Which gives finally again: u’(ct) =  u’(ct+1).

Cake Eating: Infinite Horizon : Cake Eating: Infinite Horizon Max t=1 t u(ct) subject to Wt+1 = Wt - ct V(W) = Max u(c) + V (W - c) for all W The size of the cake W is the state variable, consumption is the control variable, W ’ = W - c the transition equation (law of motion) So the functional equation is: V(W) = Max u(W - W’) + V (W’) We specify the problem so that instead of choosing today’s consumption we choose tomorrow’s state!

Cake Eating : Cake Eating First-order condition u’(c) =  V ’(W ’) [simply take the derivative of the goal function] We have shown above that V ’(W) = u’(c) (this holds for all W, so it will hold in the next period as well), so V ’(W ’) = u’(c’) in the following period, which gives again the Euler condition: u’(c) =  u’(c’) It is useful then to specify a policy function c = k(W) that prescribes how much to consume

Cake Eating: example (1) : Cake Eating: example (1) u(c) = log(c), so guess that the solution is V(W) = A + B log(W) for all W Are there parameters A and B that satisfy the functional equation? A + B log(W) = max log(W – W ’) + (A + B log(W’)) u’(c) = 1/c = B/(W - c), so c = W/(1+ B) and W ’ = W – c = BW/(1+ B)

Cake Eating: example (2) : Cake Eating: example (2) Using this in the functional form: A + B log (W) = max log (W / (1+ B)) + (A + B log (BW / (1 + B))) Collecting terms for the log(W)’s we get B = 1/(1- ) and we can do the same for A (which is less interesting!) So, our guess works in the model Policy: save a constant fraction of the cake and eat the rest: c = W (1 - ) And W ’ =  W

Dynamic Programming : Dynamic Programming Easy in discrete time-problems max  tf(x,u) subject to a difference equation dx = g(x,u) and x(0) = x0, x(T) = some terminal condition xT* Use the so-called Value Function: V(xt) = max {f(x,u) + V(xt+1)} with  as a discount factor (widely applicable in economics) Subject to: xt+1 = g(x,u) with slack parameter 

Solve the Bellman Equation : Solve the Bellman Equation Optimality: fu + guV’(xt+1) = 0 fu = : via the so-called envelope theorem: in order to get an interior solution The appropriate terminal conditions: e.g. for nonnegative state variables: T [xT* – x(T)] = 0

Example: the Cass-Koopmans optimal growth model : Example: the Cass-Koopmans optimal growth model Max U =  t u(ct) Subject to ct + kt+1= f(kt) and a given k0. Depreciation is complete kt+1 = (1 - )kt + it and  = 1 So we have to find the optimal time series of kt, kt+1,… to maximize U Rewrite U =  t u(f(kt) – kt+1)

Cass-Koopmans model (2) : Cass-Koopmans model (2) For one t (e.g. t+1) we get: t u(f(kt) – kt+1) + t+1 u(f(kt+1) – kt+2) Optimality leads to: u’(ct) = f’(kt+1)u’(ct+1) A typical Euler equation if we know that: R = f’(kt+1) Next an example and after that we do it in the dynamic programming format using the Bellman principle

Cass-Koopmans: Example : Cass-Koopmans: Example u(ct) = log(ct) and f(kt) = kta FOC: u’(ct) = f ’(kt+1) u’(ct+1). So: 1/(kta - kt+1) = [1/(kt+1a - kt+2)] a kt+1a-1 Define the savings rate zt = kt+1/kta and the FOC reads: zt /(1 - zt) = a (1 / (1 - zt+1)). This first-order difference equation is a bit easier to solve One can show that if the horizon gets to infinity zt goes to a (from a final savings rate zT = 0) So: zt = kt+1/kta = a or kt+1 = a kta So: ct = (1 - a) kta

Cass-Koopmans model (3) : Cass-Koopmans model (3) Bellman: V(k0) = max [u(c0)+ V(k1)]. Optimize this expression with respect to k1 We know that c0 + k1 = f(k0) So we get: u’(f(k0) - k1)= V’(k1) Again, this seems normal, but V’(k1) is giving trouble: how do we compute this term? We use the envelope theorem!

Intermezzo: Envelope property : Intermezzo: Envelope property The more restricted V(c)’s are more concave (a greater curvature at an optimum c0) but have the same tangent at the optimum c0. The problem is said to have the envelope property. This property has various handy consequences Interesting is the case where we maximize f(x,c) subject to g(x,c)  0 and keep some of the xi fixed

What do we learn from this? : What do we learn from this? Suppose we are in the optimum. We can maximize either V(c) or V’(c) as long as c = c0! This is the envelope theorem This can be handy in static and dynamic optimization So adding constraints that satisfy the optimum conditions do not change the optimum solution but does change the maximum value function

Return to the Cass-Koopmans model (4) : Return to the Cass-Koopmans model (4) V(k0) = u(f(k0) - k1) + V(k1) Use the solution: k1 = g(k0): envelope: an optimal k0 delivers an optimal k1 V(k0) = u(f(k0) - g(k0)) + V(g(k0)) Differentiate with respect to k0: V’(k0) = u’(.)(f’(k0) – g’(k0)) + V’(.)g’(k0) This is called the Benveniste-Scheinkman formula Use: u’(.) = u’(f(k0) - k1) = V’(k1) to get: V’(k0) = u’(.)f’(k0) and so: V’(k1) = u’(f(k1) - k2)f’(k1) and this was the mysterious term we had to describe: u’(f(k0) - k1)=  u’(f(k1) - k2)f’(k1)

Dynamic programming: solving the difference equation : Dynamic programming: solving the difference equation We use again the example in the Cass-Koopmans model: yt = kt and u(ct) = log(ct). Optimization (where we use the restriction yt = ct + kt+1 with Lagrange parameter t) leads to: 1/ct = t and V’(kt+1) = t = 1/ct Using the envelope theorem: write the variables as a function of the state variable: ct = c(kt), kt+1 = k(kt)

DP: using the envelope theorem : DP: using the envelope theorem Bellman: V(kt) = log(c(kt)) + V(k(kt)) + t(kt) (kt - c(kt) - k(kt)) Optimize with respect to kt gives: V ’(kt) = c’t /ct + V’(kt+1)k’t+1 + ’t(kt) (kt – c(kt) - k(kt))+ t(kt) (kt-1 – c’t – k’t+1) The third term vanishes (via the constraint): V ’(kt) = c’t /ct + V’(kt+1)k’t+1 + t(kt) (kt-1 – c’t – k’t+1) Regroup into: V ’(kt) = c’t [1/ct - t(kt)] + [V’(kt+1) - t(kt) ] k’t+1 + t(kt) kt-1 The first-order constraints hold, so the first two terms drop out! V ’(kt) = t(kt) kt-1

DP : DP V’(kt) = t kt  -1 = (1/ct) kt  -1 So the solution is: V’(kt+1) = t , 1/ct =  (1/ct+1) kt+1 -1 How to solve this expression? (1) By iterative substitution, (2) By conjecture

DP: iterative substitution : DP: iterative substitution 1/ct =   (1/ct+1) kt+1  -1 ct + kt+1 = kt , combine these two into: kt /ct = 1+   (kt+1 /ct+1) and repeat the substitution to get: kt /ct = 1 +   + ()2 +… = 1/(1 -  ) So ct = (1-  )kt

DP: conjecture : DP: conjecture Guess kt+1= jkt, or ct = (1 - j)kt  This implies kt+1/ct = j / (1 - j) In the FOC:  (kt+1 /ct+1) = j / (1 - j) or:  (kt+1  / (1 - j)kt+1 ) = j / (1 - j) So j =  and ct = (1 - )kt 

Example: Money in the utility function : Example: Money in the utility function We start from a utility function U = u(ct, Mt/PtNt) and a representative household that maximizes: W =  t u(ct,mt), where 0 <  < 1 is a subjective discount rate Ct is consumption, Mt is nominal money, Pt is the price level, Nt is population The budget constraint: Ct + Kt + Mt/Pt + Bt/Pt = Yt + tNt + (1 - )Kt-1 + Mt-1/Pt + (1 + it-1)Bt-1/Pt where  is the rate of transfers in the form of new money, Bt denotes bonds, and it the interest rate on bonds, Kt is capital, Yt is output

Sidrauski model (1) : Sidrauski model (1) Rewrite the budget constraint in per capita terms and use yt = f(kt-1/(1 + n)) and define initial resources at time t by: t= f(kt-1/(1 + n))+t+ [(1- )/(1 + n)] kt-1+ [mt-1+ (1+it-1)bt-1]/(1 + )(1 + n) = ct+ kt + mt + bt Write the problem as a dynamic programming problem: V(t) = max {u(ct,mt) + V(t+1)}

Sidrauski model (2) : Sidrauski model (2) Trick 1: t+1= f(kt /(1+n)) + t+1 + [(1 - )/(1 + n)] kt + [mt + (1+ it)bt] / (1 + )(1 + n) Trick 2: express kt = t - ct - bt - mt in: V(t) = max {u(ct,mt) + V(t+1)} and optimize with respect to consumption ct, bonds bt, money mt This is now an unconstrained problem

Sidrauski model (3) : Sidrauski model (3) Optimality conditions are: ct: uc(ct,mt) - /(1+n) [fk(kt)+1-]V(t+1) = 0 bt: {(1+it)/(1+t+1)(1+n)}- [(fk(kt)+1-)/(1+n)] = 0 mt: um(ct,mt)-  [(fk(kt)+1-)/(1+n)] V(t+1) +  V(t+1)/((1+t+1)(1+n)) = 0 and the transversality conditions: lim t t xt = 0 for x = k,b,m where t= uc(ct,mt)

Sidrauski model (4) : Sidrauski model (4) Use the envelope theorem: uc(c,m) = V() Condition 1: marginal utility of consumption is equal to the marginal benefit of one additional unit of money. What is the marginal benefit from money? First, it gives direct utility. Second, it increases the resources for the next period Condition 2: this is the Fisher relationship between the nominal rate on bonds, the real rate, and expected inflation Condition 3: the marginal return on holding capital must equal the marginal utility of consumption

Sidrauski model: steady state (1) : Sidrauski model: steady state (1) We can analyze the short-run dynamics (not in this course) and the steady state In the steady state we have n = 0, ss See the notes to see that the steady state can be described by three conditions

Sidrauski model: steady state (2) : Sidrauski model: steady state (2) fk(kss) = 1/ - (1 - ): this defines the steady state amount of capital: independent of the form of the utility function and monetary variables  = : Inflation is fully determined by the rate of money growth css = f(kss) - kss There is superneutrality: no effect of inflation on all real quantities

Sidrauski model: properties : Sidrauski model: properties uc(ct+1,mt+1)/uc(ct,mt) = 1//(1+rt), where rt = fk - . So for low k < kss, the marginal productivity is relatively high and the marginal utility of consumption will decline i = (1 + r)(1 + ) - 1. In the steady state iss = (1 + ss)/  - 1 We have: um(c,m)/uc(c,m) = i/(1+i)

Sidrauski um(c,m)/uc(c,m) = i / (1 + i) : Sidrauski um(c,m)/uc(c,m) = i / (1 + i) I = i / (1 + i) is the relative price of real money balances in terms of consumption goods Buy one unit of money less and invest in bonds: get a yield i, which real value is i / (1 + ), discounted: i / (1 + )(1 + r) = i / (1 + i) We can use this condition to derive a simple money demand equation: u(c,m) = ca mb gives m = b/a c I-1

Copyrights © 2009 authorGEN. All rights reserved.