A Primer in Dynamic Programming : A Primer in Dynamic Programming
Goals : Goals Practical guide to dynamic programming
Intuition, not formal mathematical proofs
Restrictions: (1) use available mathematical skills only, (2) easy examples
Prepare for Walsh’s chapters 2 and 3!
Contents : Contents Intuition in a discrete network
Example: Cake Eating problem
Formal approach
Example: Cass-Koopmans growth model
Envelope theorem
Cass-Koopmans continued
Sidrauski: Money in the Utility Function
Intuition of Dynamic Programming : Intuition of Dynamic Programming Bellman’s principle: an optimal trajectory has the property that, whatever the initial conditions are, the remaining path must constitute an optimum
We will illustrate this principle using the so-called Traveling Salesman problem: find the shortest route between cities
A graph of the Traveling Salesman problem: shortest route 1-7 : A graph of the Traveling Salesman problem: shortest route 1-7 1 1 2 3 4 5 6 7 78 89 102 38 54 45 37 15 12 22 80
Optimality : Optimality Suppose 1-3-6-7 is the optimal route, then 3-6-7 needs to be the optimal route from 3 to 7: 3-6-7 needs to be shorter than 3-5-7
So we can use backward induction
Let Mn(i) be the minimal distance to the final point given that the salesman is in town i and has to make n more decisions
Dij is the distance from town i to j
J(i) is the set of towns the salesman can choose from when in town i (so J(1) = 2,3,4)
Mn(i) = min jєJ(i) {Dij + Mn-1(j)}
Backward induction : Backward induction Step 1: M1(5) = 45, M1(6) = 37
Step 2: M2(2) = min j = 5,6 (D2j + M1(j)) = min(38 + 45, 54 + 37) = 83
and M2(3) = min(80 + 45, 22 + 37) = 59
and M2(4) = min(12 + 45, 15 + 37) = 52
Step 3: M3(1) = min j = 2,3,4 (D1j + M2(j)) = min{78 + 83, 89 + 59, 102 + 52} = 148
So the path is 1-3-6-7!
A Cake-Eating example : A Cake-Eating example Suppose we have a cake of size W1, which does not depreciate
At each moment of time t = 1,2,3,…,T you can eat some cake but must save the rest
Utility remains stationairy u(ct)
Maximize t=1T t u(ct) with a discount factor 0 1
We have non-negativity constraints ct 0 and Wt 0
Law of motion for the cake holds at all periods: Wt+1 = Wt - ct
Cake-Eating: Direct Attack : Cake-Eating: Direct Attack This is called the sequence problem
Rewrite all constraints into: t=1T ct + WT+1 = W1. This reduces the number of non-negativity constraints: only ct 0 for all t = 1,2,3,…,T, and WT+1 0. Let be the multiplier on the resource constraint and on the non-negativity condition on WT+1. We get the first-order conditions:
t-1 u’(ct) = , for t = 1,2,…, T
=
[Note that we ignore the nonnegativity conditions on ct, because we assume that lim u’(c ) if c 0]
Combining gives the necessary condition: u’(ct) = u’(ct+1). This is the Euler condition that holds at all t
Cake-Eating (3) : Cake-Eating (3) The Euler condition is necessary, but not sufficient for an optimal trajectory
Sufficient is that the terminal condition WT+1 = 0 holds (and so binds): = > 0 has to hold
The agent wants WT+1 = - and we need to avoid that
Define a value function VT (W1): the maximum utility flow from a T-period problem given a size W1 cake
A slight increase in the size of the cake leads to an equal increase in marginal utility in each period: V’T (W1) = = t-1 u’(ct)
So it does not matter when the extra cake is eaten given that the consumer is acting optimally
Dynamic programming of the Cake Eating: a finite horizon problem : Dynamic programming of the Cake Eating: a finite horizon problem We add a period 0 and an initial cake size W0. We consider the problem:
Max u(c0) + VT (W1)
Where W1 = W0 - c0 and W0 given
Principle of optimality: VT (W1) fully summarizes optimal behavior (and so maximum utility VT (W1) from t = 1)
First-order condition: u’(c0) = V’T (W1)
We noted before that: V’T (W1) = u’(c1) = t u’(ct+1)
Which gives finally again: u’(ct) = u’(ct+1).
Cake Eating: Infinite Horizon : Cake Eating: Infinite Horizon Max t=1 t u(ct) subject to Wt+1 = Wt - ct
V(W) = Max u(c) + V (W - c) for all W
The size of the cake W is the state variable, consumption is the control variable, W ’ = W - c the transition equation (law of motion)
So the functional equation is:
V(W) = Max u(W - W’) + V (W’)
We specify the problem so that instead of choosing today’s consumption we choose tomorrow’s state!
Cake Eating : Cake Eating First-order condition u’(c) = V ’(W ’) [simply take the derivative of the goal function]
We have shown above that V ’(W) = u’(c) (this holds for all W, so it will hold in the next period as well), so V ’(W ’) = u’(c’) in the following period, which gives again the Euler condition:
u’(c) = u’(c’)
It is useful then to specify a policy function c = k(W) that prescribes how much to consume
Cake Eating: example (1) : Cake Eating: example (1) u(c) = log(c), so guess that the solution is V(W) = A + B log(W) for all W
Are there parameters A and B that satisfy the functional equation?
A + B log(W) = max log(W – W ’) + (A + B log(W’))
u’(c) = 1/c = B/(W - c), so c = W/(1+ B) and W ’ = W – c = BW/(1+ B)
Cake Eating: example (2) : Cake Eating: example (2) Using this in the functional form:
A + B log (W) = max log (W / (1+ B)) + (A + B log (BW / (1 + B)))
Collecting terms for the log(W)’s we get B = 1/(1- ) and we can do the same for A (which is less interesting!)
So, our guess works in the model
Policy: save a constant fraction of the cake and eat the rest: c = W (1 - )
And W ’ = W
Dynamic Programming : Dynamic Programming Easy in discrete time-problems
max tf(x,u) subject to a difference equation dx = g(x,u) and x(0) = x0, x(T) = some terminal condition xT*
Use the so-called Value Function: V(xt) = max {f(x,u) + V(xt+1)} with as a discount factor (widely applicable in economics)
Subject to: xt+1 = g(x,u) with slack parameter
Solve the Bellman Equation : Solve the Bellman Equation Optimality:
fu + guV’(xt+1) = 0
fu = : via the so-called envelope theorem: in order to get an interior solution
The appropriate terminal conditions: e.g. for nonnegative state variables: T [xT* – x(T)] = 0
Example: the Cass-Koopmans optimal growth model : Example: the Cass-Koopmans optimal growth model Max U = t u(ct)
Subject to ct + kt+1= f(kt) and a given k0. Depreciation is complete kt+1 = (1 - )kt + it and = 1
So we have to find the optimal time series of kt, kt+1,… to maximize U
Rewrite U = t u(f(kt) – kt+1)
Cass-Koopmans model (2) : Cass-Koopmans model (2) For one t (e.g. t+1) we get: t u(f(kt) – kt+1) + t+1 u(f(kt+1) – kt+2)
Optimality leads to: u’(ct) = f’(kt+1)u’(ct+1)
A typical Euler equation if we know that: R = f’(kt+1)
Next an example and after that we do it in the dynamic programming format using the Bellman principle
Cass-Koopmans: Example : Cass-Koopmans: Example u(ct) = log(ct) and f(kt) = kta
FOC: u’(ct) = f ’(kt+1) u’(ct+1). So:
1/(kta - kt+1) = [1/(kt+1a - kt+2)] a kt+1a-1
Define the savings rate zt = kt+1/kta and the FOC reads: zt /(1 - zt) = a (1 / (1 - zt+1)). This first-order difference equation is a bit easier to solve
One can show that if the horizon gets to infinity zt goes to a (from a final savings rate zT = 0)
So: zt = kt+1/kta = a or kt+1 = a kta
So: ct = (1 - a) kta
Cass-Koopmans model (3) : Cass-Koopmans model (3) Bellman: V(k0) = max [u(c0)+ V(k1)]. Optimize this expression with respect to k1
We know that c0 + k1 = f(k0)
So we get: u’(f(k0) - k1)= V’(k1)
Again, this seems normal, but V’(k1) is giving trouble: how do we compute this term? We use the envelope theorem!
Intermezzo: Envelope property : Intermezzo: Envelope property The more restricted V(c)’s are more concave (a greater curvature at an optimum c0) but have the same tangent at the optimum c0.
The problem is said to have the envelope property. This property has various handy consequences
Interesting is the case where we maximize f(x,c) subject to g(x,c) 0 and keep some of the xi fixed
What do we learn from this? : What do we learn from this? Suppose we are in the optimum. We can maximize either V(c) or V’(c) as long as c = c0! This is the envelope theorem
This can be handy in static and dynamic optimization
So adding constraints that satisfy the optimum conditions do not change the optimum solution but does change the maximum value function
Return to the Cass-Koopmans model (4) : Return to the Cass-Koopmans model (4) V(k0) = u(f(k0) - k1) + V(k1)
Use the solution: k1 = g(k0): envelope: an optimal k0 delivers an optimal k1
V(k0) = u(f(k0) - g(k0)) + V(g(k0))
Differentiate with respect to k0: V’(k0) = u’(.)(f’(k0) – g’(k0)) + V’(.)g’(k0)
This is called the Benveniste-Scheinkman formula
Use: u’(.) = u’(f(k0) - k1) = V’(k1) to get:
V’(k0) = u’(.)f’(k0) and so: V’(k1) = u’(f(k1) - k2)f’(k1) and this was the mysterious term we had to describe: u’(f(k0) - k1)= u’(f(k1) - k2)f’(k1)
Dynamic programming: solving the difference equation : Dynamic programming: solving the difference equation We use again the example in the Cass-Koopmans model: yt = kt and u(ct) = log(ct). Optimization (where we use the restriction yt = ct + kt+1 with Lagrange parameter t) leads to:
1/ct = t and V’(kt+1) = t = 1/ct
Using the envelope theorem: write the variables as a function of the state variable: ct = c(kt), kt+1 = k(kt)
DP: using the envelope theorem : DP: using the envelope theorem Bellman: V(kt) = log(c(kt)) + V(k(kt)) + t(kt) (kt - c(kt) - k(kt))
Optimize with respect to kt gives:
V ’(kt) = c’t /ct + V’(kt+1)k’t+1 + ’t(kt) (kt – c(kt) - k(kt))+ t(kt) (kt-1 – c’t – k’t+1)
The third term vanishes (via the constraint):
V ’(kt) = c’t /ct + V’(kt+1)k’t+1 + t(kt) (kt-1 – c’t – k’t+1)
Regroup into:
V ’(kt) = c’t [1/ct - t(kt)] + [V’(kt+1) - t(kt) ] k’t+1 + t(kt) kt-1
The first-order constraints hold, so the first two terms drop out!
V ’(kt) = t(kt) kt-1
DP : DP V’(kt) = t kt -1 = (1/ct) kt -1
So the solution is: V’(kt+1) = t ,
1/ct = (1/ct+1) kt+1 -1
How to solve this expression? (1) By iterative substitution, (2) By conjecture
DP: iterative substitution : DP: iterative substitution 1/ct = (1/ct+1) kt+1 -1
ct + kt+1 = kt , combine these two into:
kt /ct = 1+ (kt+1 /ct+1)
and repeat the substitution to get:
kt /ct = 1 + + ()2 +… = 1/(1 - )
So ct = (1- )kt
DP: conjecture : DP: conjecture Guess kt+1= jkt, or ct = (1 - j)kt
This implies kt+1/ct = j / (1 - j)
In the FOC: (kt+1 /ct+1) = j / (1 - j) or:
(kt+1 / (1 - j)kt+1 ) = j / (1 - j)
So j = and ct = (1 - )kt
Example: Money in the utility function : Example: Money in the utility function We start from a utility function U = u(ct, Mt/PtNt) and a representative household that maximizes: W = t u(ct,mt), where 0 < < 1 is a subjective discount rate
Ct is consumption, Mt is nominal money, Pt is the price level, Nt is population
The budget constraint: Ct + Kt + Mt/Pt + Bt/Pt = Yt + tNt + (1 - )Kt-1 + Mt-1/Pt + (1 + it-1)Bt-1/Pt where is the rate of transfers in the form of new money, Bt denotes bonds, and it the interest rate on bonds, Kt is capital, Yt is output
Sidrauski model (1) : Sidrauski model (1) Rewrite the budget constraint in per capita terms and use yt = f(kt-1/(1 + n)) and define initial resources at time t by:
t= f(kt-1/(1 + n))+t+ [(1- )/(1 + n)] kt-1+ [mt-1+ (1+it-1)bt-1]/(1 + )(1 + n) = ct+ kt + mt + bt
Write the problem as a dynamic programming problem: V(t) = max {u(ct,mt) + V(t+1)}
Sidrauski model (2) : Sidrauski model (2) Trick 1: t+1= f(kt /(1+n)) + t+1 + [(1 - )/(1 + n)] kt + [mt + (1+ it)bt] / (1 + )(1 + n)
Trick 2: express kt = t - ct - bt - mt
in: V(t) = max {u(ct,mt) + V(t+1)}
and optimize with respect to consumption ct, bonds bt, money mt
This is now an unconstrained problem
Sidrauski model (3) : Sidrauski model (3) Optimality conditions are:
ct: uc(ct,mt) - /(1+n) [fk(kt)+1-]V(t+1) = 0
bt: {(1+it)/(1+t+1)(1+n)}- [(fk(kt)+1-)/(1+n)] = 0
mt: um(ct,mt)- [(fk(kt)+1-)/(1+n)] V(t+1) + V(t+1)/((1+t+1)(1+n)) = 0
and the transversality conditions:
lim t t xt = 0 for x = k,b,m
where t= uc(ct,mt)
Sidrauski model (4) : Sidrauski model (4) Use the envelope theorem: uc(c,m) = V()
Condition 1: marginal utility of consumption is equal to the marginal benefit of one additional unit of money. What is the marginal benefit from money? First, it gives direct utility. Second, it increases the resources for the next period
Condition 2: this is the Fisher relationship between the nominal rate on bonds, the real rate, and expected inflation
Condition 3: the marginal return on holding capital must equal the marginal utility of consumption
Sidrauski model: steady state (1) : Sidrauski model: steady state (1) We can analyze the short-run dynamics (not in this course) and the steady state
In the steady state we have n = 0, ss
See the notes to see that the steady state can be described by three conditions
Sidrauski model: steady state (2) : Sidrauski model: steady state (2) fk(kss) = 1/ - (1 - ): this defines the steady state amount of capital: independent of the form of the utility function and monetary variables
= : Inflation is fully determined by the rate of money growth
css = f(kss) - kss
There is superneutrality: no effect of inflation on all real quantities
Sidrauski model: properties : Sidrauski model: properties uc(ct+1,mt+1)/uc(ct,mt) = 1//(1+rt), where rt = fk - . So for low k < kss, the marginal productivity is relatively high and the marginal utility of consumption will decline
i = (1 + r)(1 + ) - 1. In the steady state iss = (1 + ss)/ - 1
We have: um(c,m)/uc(c,m) = i/(1+i)
Sidrauskium(c,m)/uc(c,m) = i / (1 + i) : Sidrauski um(c,m)/uc(c,m) = i / (1 + i) I = i / (1 + i) is the relative price of real money balances in terms of consumption goods
Buy one unit of money less and invest in bonds: get a yield i, which real value is i / (1 + ), discounted: i / (1 + )(1 + r) = i / (1 + i)
We can use this condition to derive a simple money demand equation: u(c,m) = ca mb gives m = b/a c I-1