Stochastic Comparative Statics in Markov Decision Processes
Bar Light

TL;DR
This paper introduces stochastic comparative statics in Markov decision processes, analyzing how the distribution of future optimal decisions responds to parameter changes in multi-period stochastic optimization.
Contribution
It develops a framework for stochastic comparative statics, providing new results on how optimal decisions vary with parameters in Markov decision processes.
Findings
Optimal decisions change predictably with payoff and transition parameters.
Results apply to economic models like investment and pricing.
Provides insights into stationary distribution comparisons.
Bar Light (Graduate School of Business, Stanford University, Stanford, CA 94305, USA. e-mail: [email protected])
Abstract:
In multi-period stochastic optimization problems, the future optimal decision is a random variable whose distribution depends on the parameters of the optimization problem. We analyze how the expected value of this random variable changes as a function of the dynamic optimization parameters in the context of Markov decision processes. We call this analysis stochastic comparative statics. We derive both comparative statics results and stochastic comparative statics results showing how the current and future optimal decisions change in response to changes in the single-period payoff function, the discount factor, the initial state of the system, and the transition probability function. We apply our results to various models from the economics and operations research literature, including investment theory, dynamic pricing models, controlled random walks, and comparisons of stationary distributions.
Keywords: Markov decision processes, comparative statics, stochastic comparative statics.
MSC2000 subject classification: 90C40
OR/MS subject classification: Primary: Dynamic programming/optimal control
1 Introduction
A question of interest in a wide range of problems in economics and operations research is whether the solution to an optimization problem is monotone with respect to its parameters. The analysis of this question is called comparative statics. [Footnote 2: See Topkis (2011) for a comprehensive treatment of comparative statics methods.] Following Topkis’ seminal work (Topkis, 1978), comparative statics methods have received significant attention in the economics and operations research literature. [Footnote 3: See, for example, LiCalzi and Veinott (1992), Milgrom and Shannon (1994), Athey (2002), Echenique (2002), Antoniadou (2007), Quah (2007), Quah and Strulovici (2009), Shirai (2013), Nocetti (2015), Wang and Li (2015), Barthel and Sabarwal (2018), and Koch (2019).] While comparative statics methods are usually applied to static optimization problems, they can also be applied to dynamic optimization problems. In particular, these methods can be used to study how the policy function changes with respect to the current state of the system or with respect to other parameters of the dynamic optimization problem. [Footnote 4: Müller (1997) and Smith and McCardle (2002) study how the optimal value function changes with respect to the parameters of the dynamic optimization problem, such as the single-period payoff function and the transition probability function. In contrast, in this paper, we analyze the optimal policy function.] [Footnote 5: For comparative statics results in dynamic optimization models see Serfozo (1976), Lovejoy (1987), Amir et al. (1991), Hopenhayn and Prescott (1992), Mirman et al. (2008), Topkis (2011), Krishnamurthy (2016), Smith and Ulu (2017), Lehrer and Light (2018), and Dziewulski and Quah (2019).] That is, for multi-period optimization models, comparative statics methods can be used to determine how the current period’s optimal decision changes with respect to the parameters of the optimization problem.
For example, in a Markov decision process, under suitable conditions on the payoff function and on the transition function, comparative statics methods can be applied to show that the optimal decision is increasing in the discount factor when the state of the system is fixed. But since the model is dynamic and includes uncertainty, the states’ evolution is different under different discount factors, and thus, it is not clear whether the future optimal decision is increasing in the discount factor even when the current optimal decision is increasing in the discount factor for a fixed state.
The state of the system in period $t > 1$ is a random variable from the point of view of period $1$, and thus, the optimal decision in period $t$, which depends on the state of the system in period $t$, is a random variable given the information available in period $1$. In this paper, we analyze how the expected value of the optimal decision in period $t$ changes as a function of the optimization problem parameters in the context of Markov decision processes (MDP). We call this analysis stochastic comparative statics. More precisely, let $E$ be a partially ordered set that contains some parameters of the MDP. For example, $E$ can be the set of all transition probability functions, the set of all discount factors, and/or a set of parameters that influence the payoff function. Suppose that under the parameters $e \in E$ a stationary policy function is given by $g(s,e)$ where $s$ is the state of the system. Given the policy function and the system’s initial state, the system’s states follow a stochastic process. Suppose that the states’ distribution in period $t$ is described by the probability measure $\mu^{t}(ds,e)$. We are interested in finding conditions that ensure that the expected decision in period $t$, $\mathbb{E}^{t}(g(e)) = \int g(s,e)\,\mu^{t}(ds,e)$, is increasing in the parameters $e$ on $E$.
The expected value $\mathbb{E}^{t}(g(e))$ can be interpreted in two different ways. From a probabilistic point of view, $\mathbb{E}^{t}(g(e))$ is the expected optimal decision in period $t$ as a function of the parameters $e$. For example, in investment theory, this expected value usually represents the expected capital accumulation in the system in period $t$ (Stokey and Lucas, 1989). In inventory management, it represents the expected inventory in period $t$ (Krishnan and Winter, 2010), and in income fluctuation problems it represents the expected wealth accumulation in period $t$ (see Huggett (2004) and Bommier and Grand (2018)). From a deterministic point of view, if we consider a population of ex-ante identical agents whose states evolve independently according to the stochastic process that governs the states’ dynamics, then $\mu^{t}$ represents the empirical distribution of states in period $t$. In this case, $\mathbb{E}^{t}(g(e))$ corresponds to the average decision in period $t$ of this population given the parameters $e$. This latter interpretation is common in the growing literature on stationary equilibrium models and mean field equilibrium models. In this literature, while the focus is on the analysis of equilibrium, some stochastic comparative statics results have been obtained (see Adlakha and Johari (2013) and Acemoglu and Jensen (2015)). These stochastic comparative statics results are useful in analyzing the equilibrium of these models, in particular for proving comparative statics results and for establishing the uniqueness of an equilibrium (see Hopenhayn (1992), Light (2018b), Acemoglu and Jensen (2018), and Light and Weintraub (2019)).
The goal of this paper is to provide general stochastic comparative statics results in the context of an MDP. In particular, we provide various sufficient conditions on the primitives of MDPs that guarantee stochastic comparative statics results with respect to important parameters of MDPs, such as the discount factor, the single-period payoff function, and the transition probability function. We also provide novel comparative statics results with respect to these parameters. For example, we show that under a standard set of conditions that implies that the policy function is increasing in the state, the policy function is also increasing in the discount factor (see Section 3.2). We apply our results in capital accumulation models with adjustment costs (Hopenhayn and Prescott, 1992), in dynamic pricing models with reference effects (Popescu and Wu, 2007), and in controlled random walks. As an example, consider the following controlled random walk $s(t+1) = s(t) + a(t) + \epsilon(t)$, where $s(t)$ is the state of the system in period $t$, $a(t)$ is the action chosen in period $t$, and $\{\epsilon(t)\}$ are random variables that are independent and identically distributed across time. In each period, a decision maker receives a reward that depends on the current state of the system and incurs a cost that depends on the action that the decision maker chooses in that period. The reward function is increasing in the state of the system and the cost function is increasing in the decision maker’s action. The decision maker’s goal is to maximize the expected sum of rewards. We provide sufficient conditions on the reward function and on the cost function that guarantee that the decision maker’s current action and the expected future actions increase when the distribution of the random noise is higher in the sense of stochastic dominance.
Since our results are intuitive and the sufficient conditions that we provide in order to derive stochastic comparative statics results are satisfied in some dynamic programs of interest, we believe that our results hold in other applications as well.
The rest of the paper is organized as follows. Section 2 presents the dynamic optimization model. Section 2.1 presents definitions and notations that are used throughout the paper. In Section 3.1 we present our main stochastic comparative statics results. In Section 3.2 we study changes in the discount factor and in the single-period payoff function. In Section 3.3 we study changes in the transition probability function. In Section 4 we apply our results to various models. In Section 5 we provide a summary, followed by an Appendix containing proofs.
2 The model
In this section we present the main components and assumptions of the model. For concreteness, we focus on a standard discounted dynamic programming model, sometimes called a Markov decision process. [Footnote 6: All our results can be applied to other dynamic programming models, such as positive dynamic programming and negative dynamic programming. For a comprehensive treatment of dynamic programming models, see Feinberg and Shwartz (2012) and Puterman (2014).]
We define a discounted dynamic programming model in terms of a tuple of elements $(S, A, \Gamma, p, r, \beta)$. $S \subseteq \mathbb{R}^{n}$ is a Borel set called the state space. $\mathcal{B}(S)$ is the Borel $\sigma$-algebra on $S$. $A \subseteq \mathbb{R}$ is the action space. $\Gamma$ is a measurable subset of $S \times A$. For all $s \in S$, the non-empty and measurable $s$-section $\Gamma(s)$ of $\Gamma$ is the set of feasible actions in state $s$. $p : S \times A \times \mathcal{B}(S) \to [0,1]$ is a transition probability function. That is, $p(s,a,\cdot)$ is a probability measure on $S$ for each $(s,a) \in S \times A$ and $p(\cdot,\cdot,B)$ is a measurable function for each $B \in \mathcal{B}(S)$. $r : S \times A \to \mathbb{R}$ is a measurable single-period payoff function. $0 < \beta < 1$ is the discount factor.
There is an infinite number of periods $t \in \mathbb{N} := \{1, 2, \ldots\}$. The process starts at some state $s(1) \in S$. Suppose that at time $t$ the state is $s(t)$. Based on $s(t)$, the decision maker (DM) chooses an action $a(t) \in \Gamma(s(t))$ and receives a payoff $r(s(t),a(t))$. The probability that the next period’s state $s(t+1)$ will lie in $B \in \mathcal{B}(S)$ is given by $p(s(t),a(t),B)$.
Let $H = S \times A$ and $H^{t} := H^{t-1} \times S$. A policy $\sigma$ is a sequence $(\sigma_{1}, \sigma_{2}, \ldots)$ of Borel measurable functions $\sigma_{t} : H^{t} \to A$ such that $\sigma_{t}(s(1),a(1),\ldots,s(t)) \in \Gamma(s(t))$ for all $t \in \mathbb{N}$ and all $(s(1),a(1),\ldots,s(t)) \in H^{t}$. For each initial state $s(1)$, a policy $\sigma$ and a transition probability function $p$ induce a probability measure over the space of all infinite histories $H^{\infty}$. [Footnote 7: The probability measure on the space of all infinite histories is uniquely defined by the Ionescu Tulcea theorem (for more details, see Bertsekas and Shreve (1978) and Feinberg (1996)).] We denote the expectation with respect to that probability measure by $\mathbb{E}_{\sigma}$, and the associated stochastic process by $\{s(t),a(t)\}_{t=1}^{\infty}$. The DM’s goal is to find a policy that maximizes his expected discounted payoff. When the DM follows a strategy $\sigma$ and the initial state is $s \in S$, his expected discounted payoff is given by
$$V_{\sigma}(s) = \mathbb{E}_{\sigma} \sum_{t=1}^{\infty} \beta^{t-1} r(s(t),a(t)).$$
Define
$$V(s) = \sup_{\sigma} V_{\sigma}(s).$$
We call $V : S \to \mathbb{R}$ the value function.
Define the operator $T : B(S) \to B(S)$, where $B(S)$ is the space of all functions $f : S \to \mathbb{R}$, by
$$Tf(s) = \max_{a \in \Gamma(s)} h(s,a,f),$$
where
$$h(s,a,f) = r(s,a) + \beta \int_{S} f(s')\,p(s,a,ds'). \tag{1}$$
Under standard assumptions on the primitives of the MDP, standard dynamic programming arguments show that the value function $V$ is the unique function that satisfies $TV = V$. [Footnote 8: The state and action spaces can be continuous or discrete. When we discuss convex functions on $S$ we assume that $S$ is a convex set.] In addition, there exists an optimal stationary policy and the optimal policies correspondence
$$G(s) = \{a \in \Gamma(s) : V(s) = h(s,a,V)\}$$
is nonempty, compact-valued and upper hemicontinuous. Define $g(s) = \max G(s)$. We call $g(s)$ the policy function. For the rest of the paper, we assume that the value function $V$ is the unique and continuous function that satisfies $TV = V$, that $T^{n}f$ converges uniformly to $V$ for every $f \in B(S)$, and that the policy function $g$ exists. [Footnote 9: These conditions are usually satisfied in applications. Conditions that ensure the existence and continuity of the value function and the existence of a stationary policy function are widely studied in the literature. See Hinderer et al. (2016) for a textbook treatment. For recent results, see Feinberg et al. (2016) and references therein.]
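To make the fixed-point characterization concrete, the following is a minimal sketch of value iteration for a finite MDP, assuming states and actions are indexed by integers. The function names (`bellman_operator`, `value_iteration`, `greatest_maximizer`) and the toy numbers are illustrative assumptions and do not come from the paper. The sketch applies the Bellman operator repeatedly and then selects the greatest maximizer, mirroring the definition of the policy function as the largest element of the optimal policies correspondence.

```python
def bellman_operator(V, r, p, feasible, beta):
    """Apply the Bellman operator once on a finite MDP.

    V[s]        : current guess of the value function
    r[s][a]     : single-period payoff
    p[s][a][j]  : probability of moving from state s to state j under action a
    feasible[s] : list of feasible action indices in state s
    """
    n = len(V)
    return [
        max(r[s][a] + beta * sum(p[s][a][j] * V[j] for j in range(n))
            for a in feasible[s])
        for s in range(n)
    ]


def value_iteration(r, p, feasible, beta, tol=1e-12, max_iter=100_000):
    """Iterate the Bellman operator from V = 0 until successive iterates are within tol."""
    V = [0.0] * len(r)
    for _ in range(max_iter):
        V_next = bellman_operator(V, r, p, feasible, beta)
        gap = max(abs(x - y) for x, y in zip(V_next, V))
        V = V_next
        if gap < tol:
            break
    return V


def greatest_maximizer(V, r, p, feasible, beta, tol=1e-9):
    """Return, for each state, the largest feasible action that attains the maximum."""
    n = len(V)
    g = []
    for s in range(n):
        q = {a: r[s][a] + beta * sum(p[s][a][j] * V[j] for j in range(n))
             for a in feasible[s]}
        best = max(q.values())
        g.append(max(a for a in feasible[s] if q[a] >= best - tol))
    return g


# Toy instance (hypothetical numbers): 2 states, 2 actions, every action feasible.
beta = 0.9
feasible = [[0, 1], [0, 1]]
r = [[s - 0.5 * a for a in range(2)] for s in range(2)]
# The probability of reaching the high state rises with both s and a.
p = [[[1 - (0.2 + 0.3 * s + 0.4 * a), 0.2 + 0.3 * s + 0.4 * a]
      for a in range(2)] for s in range(2)]

V = value_iteration(r, p, feasible, beta)
g = greatest_maximizer(V, r, p, feasible, beta)
```

The tolerance used when collecting maximizers is a numerical convenience only; with exact arithmetic the set of maximizers would be computed with equality.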
2.1 Notations and definitions
In this paper we consider a parameterized dynamic program. Let $(E, \succeq)$ be a partially ordered set that influences the DM’s decisions. We denote a generic element in $E$ by $e$. Throughout the paper, we slightly abuse the notations and allow an additional argument in the functions defined above. For instance, the value function of the parameterized dynamic program is denoted by
$$V(s,e) = \max_{a \in \Gamma(s)} h(s,a,e,V).$$
Likewise, the policy function is denoted by $g(s,e)$; $r(s,a,e)$ is the single-period payoff function; and $h(s,a,e,V)$ is the $h$ function associated with the dynamic program problem with parameters $e$, as defined above in Equation (1). For the rest of the paper, we let $E_{p}$ be the set of all transition functions $p : S \times A \times \mathcal{B}(S) \to [0,1]$.
When the DM follows the policy function $g$ and the initial state is $s(1)$, the stochastic process $(s(t))_{t}$ is a Markov process. The transition function of $(s(t))_{t}$ can be described by the policy function $g$ and by the transition function $p$ as follows: For all $B \in \mathcal{B}(S)$, define $\mu^{1}(B) = 1$ if $s(1) \in B$ and $0$ otherwise, and $\mu^{2}(B) = p(s(1),g(s(1)),B)$. $\mu^{2}(B)$ is the probability that the second period’s state will lie in $B$. For $t \geq 3$, define $\mu^{t}(B) = \int_{S} p(s,g(s),B)\,\mu^{t-1}(ds)$ for all $B \in \mathcal{B}(S)$. Then $\mu^{t}(B)$ is the probability that $s(t)$ will lie in $B$ in period $t$ when the initial state is $s(1)$ and the DM follows the policy function $g$. For notational convenience, we omit the reference to the initial state. All the results in this paper hold for every initial state $s(1) \in S$.
We write $\mu^{t}(B,e)$ to denote the probability that $s(t)$ will lie in $B$ in period $t$, when $e$ are the parameters that influence the DM’s decisions and the DM follows the policy function $g(s,e)$, $e \in E$. For $t \in \mathbb{N}$, define
$$\mathbb{E}^{t}(g(e)) = \int_{S} g(s,e)\,\mu^{t}(ds,e).$$
As we discussed in the introduction, $\mathbb{E}^{t}(g(e))$ can be interpreted in two ways. According to the first interpretation, the DM’s optimal decision in period $t$ is a random variable from the point of view of period $1$. The expected value $\mathbb{E}^{t}(g(e))$ is the DM’s expected decision in period $t$, given that the parameters that influence the DM’s decisions are $e$. Alternatively, the expected value $\mathbb{E}^{t}(g(e))$ can be interpreted as the aggregate of the decisions of a continuum of DMs facing idiosyncratic shocks. In the latter interpretation, each DM has an individual state and $\mu^{t}(\cdot,e)$ is the distribution of the DMs over the states in period $t$. This interpretation is often used in stationary equilibrium models and in mean field equilibrium models (see more details in Section 4.4). We are interested in the following stochastic comparative statics question: is it true that $e_{2} \succeq e_{1}$ implies $\mathbb{E}^{t}(g(e_{2})) \geq \mathbb{E}^{t}(g(e_{1}))$ for all $t \in \mathbb{N}$ (and for each initial state)? We note that for $t = 1$, the stochastic comparative statics question reduces to a comparative statics question: is it true that $e_{2} \succeq e_{1}$ implies $g(s,e_{2}) \geq g(s,e_{1})$?
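On a finite state space the recursion for the period-by-period state distributions and the resulting expected decisions can be computed directly. The sketch below is illustrative only: the names `state_distributions` and `expected_decisions` are hypothetical, and `P[s][j]` stands for the probability of moving from state `s` to state `j` when the DM follows a given stationary policy `g`.

```python
def state_distributions(P, s1, horizon):
    """Return the list of state distributions for periods 1, ..., horizon.

    P[s][j] : probability of moving from s to j under the fixed policy
    s1      : index of the initial state
    """
    n = len(P)
    mu = [0.0] * n
    mu[s1] = 1.0  # period 1: point mass on the initial state
    out = [mu]
    for _ in range(horizon - 1):
        # next-period distribution: integrate the kernel against the current one
        mu = [sum(mu[s] * P[s][j] for s in range(n)) for j in range(n)]
        out.append(mu)
    return out


def expected_decisions(g, P, s1, horizon):
    """Expected decision in each period: the policy averaged over the state distribution."""
    return [sum(m * a for m, a in zip(mu, g))
            for mu in state_distributions(P, s1, horizon)]


# Hypothetical two-state chain and policy values.
P = [[0.5, 0.5], [0.2, 0.8]]
g = [0.0, 1.0]
E = expected_decisions(g, P, s1=0, horizon=3)
# E is approximately [0.0, 0.5, 0.65]
```

In the population interpretation, each entry of `E` is the average decision across agents in that period.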
We now introduce some notations and definitions that will be used in the next sections.
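For two elements $x, y \in \mathbb{R}^{n}$ we write $x \geq y$ if $x_{i} \geq y_{i}$ for each $i = 1, \ldots, n$. We say that $f : \mathbb{R}^{n} \to \mathbb{R}$ is increasing if $x \geq y$ implies $f(x) \geq f(y)$.
Let $D \subseteq \mathbb{R}^{S}$ where $\mathbb{R}^{S}$ is the set of all functions from $S$ to $\mathbb{R}$. When $\mu_{1}$ and $\mu_{2}$ are probability measures on $(S, \mathcal{B}(S))$, we write $\mu_{2} \succeq_{D} \mu_{1}$ if
$$\int_{S} f(s)\,\mu_{2}(ds) \geq \int_{S} f(s)\,\mu_{1}(ds)$$
for all Borel measurable functions $f \in D$ such that the integrals exist.
In this paper we will focus on two important stochastic orders: the first order stochastic dominance and the convex stochastic order. When $D$ is the set of all increasing functions on $S$, we write $\mu_{2} \succeq_{st} \mu_{1}$ and say that $\mu_{2}$ first order stochastically dominates $\mu_{1}$. If $D$ is the set of all convex functions on $S$, we write $\mu_{2} \succeq_{CX} \mu_{1}$ and say that $\mu_{2}$ dominates $\mu_{1}$ in the convex stochastic order. If $D$ is the set of all increasing and convex functions on $S$, we write $\mu_{2} \succeq_{ICX} \mu_{1}$. Similarly, for $p_{1}, p_{2} \in E_{p}$, we write $p_{2} \succeq_{D} p_{1}$ if
$$\int_{S} f(s')\,p_{2}(s,a,ds') \geq \int_{S} f(s')\,p_{1}(s,a,ds')$$
for all Borel measurable functions $f \in D$ and all $(s,a) \in S \times A$ such that the integrals exist. [Footnote 10: In the rest of the paper, all functions are assumed to be integrable.] If $D$ is the set of all increasing functions, convex functions, and convex and increasing functions, we write $p_{2} \succeq_{st} p_{1}$, $p_{2} \succeq_{CX} p_{1}$, and $p_{2} \succeq_{ICX} p_{1}$, respectively. For comprehensive coverage of stochastic orders and their applications, see Müller and Stoyan (2002) and Shaked and Shanthikumar (2007).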
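For distributions on a finite, totally ordered set of real points, both orders can be checked with finitely many comparisons. The sketch below rests on two standard facts rather than on anything stated in the paper: first order stochastic dominance is equivalent to dominance of all upper-tail probabilities, and the increasing convex order is equivalent to dominance of the expectations of $(x - c)^{+}$, which on a common finite support only needs to be checked at the support points. The function names are hypothetical.

```python
def upper_tails(mu):
    """tails[k] = total mass that mu puts on the points with index k or higher."""
    tails, acc = [], 0.0
    for w in reversed(mu):
        acc += w
        tails.append(acc)
    return tails[::-1]


def dominates_st(mu2, mu1, tol=1e-12):
    """True if mu2 first order stochastically dominates mu1 (points sorted increasingly)."""
    return all(t2 >= t1 - tol
               for t2, t1 in zip(upper_tails(mu2), upper_tails(mu1)))


def dominates_icx(mu2, mu1, points, tol=1e-12):
    """True if mu2 dominates mu1 in the increasing convex order on the given points."""
    def stop_loss(mu, c):
        # expectation of max(x - c, 0) under mu
        return sum(w * max(x - c, 0.0) for w, x in zip(mu, points))
    return all(stop_loss(mu2, c) >= stop_loss(mu1, c) - tol for c in points)


# Hypothetical distributions on the points 0, 1, 2.
points = [0.0, 1.0, 2.0]
mu_low, mu_high = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]
point_mass, spread = [0.0, 1.0, 0.0], [0.5, 0.0, 0.5]  # same mean, spread is riskier

shift_is_st = dominates_st(mu_high, mu_low)               # True
spread_is_st = dominates_st(spread, point_mass)           # False
spread_is_icx = dominates_icx(spread, point_mass, points) # True
```

The second pair illustrates why the two orders differ: a mean-preserving spread is higher in the increasing convex order but not in the first order stochastic dominance order.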
Definition 1
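(i) We say that $p \in E_{p}$ is monotone if for every increasing function $f$ the function $\int_{S} f(s')\,p(s,a,ds')$ is increasing in $(s,a)$.
(ii) We say that $p \in E_{p}$ is convexity-preserving if for every convex function $f$ the function $\int_{S} f(s')\,p(s,a,ds')$ is convex in $(s,a)$.
(iii) Define $P_{i}(s,B) = p_{i}(s,g(s,e_{i}),B)$. Let $D \subseteq \mathbb{R}^{S}$. We say that $P_{i}$ is $D$-preserving if $f \in D$ implies that $\int_{S} f(s')\,P_{i}(s,ds') \in D$. If $D$ is the set of all increasing functions, convex functions, and convex and increasing functions, we say that $P_{i}$ is $I$-preserving, $CX$-preserving, and $ICX$-preserving, respectively.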
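When states and actions are one-dimensional and finite, monotonicity of a transition function in the sense of part (i) can be verified by comparing upper-tail probabilities of neighboring state-action pairs. The following sketch assumes that setting; `is_monotone` and the two toy kernels are hypothetical names and numbers introduced for illustration.

```python
def upper_tails(mu):
    """tails[k] = total mass that mu puts on the states with index k or higher."""
    tails, acc = [], 0.0
    for w in reversed(mu):
        acc += w
        tails.append(acc)
    return tails[::-1]


def is_monotone(p, tol=1e-12):
    """Check that p[s][a] (a distribution over next states) rises in the
    first order stochastic dominance sense when s or a is increased by one step."""
    n_s, n_a = len(p), len(p[0])
    tails = [[upper_tails(p[s][a]) for a in range(n_a)] for s in range(n_s)]
    for s in range(n_s):
        for a in range(n_a):
            for k, t in enumerate(tails[s][a]):
                if s + 1 < n_s and tails[s + 1][a][k] < t - tol:
                    return False
                if a + 1 < n_a and tails[s][a + 1][k] < t - tol:
                    return False
    return True


def two_state_kernel(prob_high):
    """Build p[s][a] = [1 - q, q] where q = prob_high(s, a) is the chance of the high state."""
    return [[[1 - prob_high(s, a), prob_high(s, a)] for a in range(2)]
            for s in range(2)]


p_up = two_state_kernel(lambda s, a: 0.2 + 0.3 * s + 0.4 * a)    # higher (s, a) helps
p_down = two_state_kernel(lambda s, a: 0.9 - 0.3 * s - 0.4 * a)  # higher (s, a) hurts

up_ok = is_monotone(p_up)      # True
down_ok = is_monotone(p_down)  # False
```

Checking one-step increases in each coordinate is enough here because the dominance relation is transitive.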
3 Main results
In this section we derive our main results. In Section 3.1 we provide stochastic comparative statics results. In Section 3.2 and in Section 3.3 we provide conditions on the primitives of the MDP that guarantee comparative statics and stochastic comparative statics results.
3.1 Stochastic comparative statics
In this section we provide conditions that ensure stochastic comparative statics. Our approach is to find conditions that imply that the states’ dynamics generated under $e_{2}$ stochastically dominate the states’ dynamics generated under $e_{1}$ whenever $e_{2} \succeq e_{1}$. Theorem 1 shows that if $P_{2}$ is $D$-preserving and $P_{2}(s,\cdot) \succeq_{D} P_{1}(s,\cdot)$ for all $s \in S$, then $\mu^{t}(\cdot,e_{2}) \succeq_{D} \mu^{t}(\cdot,e_{1})$ for all $t \in \mathbb{N}$. A proof of Theorem 1 can be found in Chapter 5 in Müller and Stoyan (2002) where the authors study stochastic comparisons of general Markov chains. Because our setting is slightly different, we provide the proof of Theorem 1 in the Appendix for completeness. [Footnote 11: Results similar to Theorem 1 for special cases can be found in Huggett (2004), Adlakha and Johari (2013), Balbus et al. (2014), and Acemoglu and Jensen (2015).]
The focus of the rest of the paper is on finding sufficient conditions on the primitives of the MDP in order to apply Theorem 1. Corollary 1 and Theorem 2 provide sufficient conditions for $P_{2}$ to be $D$-preserving and for $P_{2}(s,\cdot) \succeq_{D} P_{1}(s,\cdot)$ to hold when $D$ is the set of increasing functions or the set of increasing and convex functions. The results in this section require conditions on the policy function and on the primitives of the MDP. In Sections 3.2 and 3.3, we provide comparative statics and stochastic comparative statics results that depend only on the primitives of the model (e.g., the transition probabilities and the single-period payoff function).
Theorem 1
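Let $(E, \succeq)$ be a partially ordered set and let $D \subseteq \mathbb{R}^{S}$. Let $e_{1}, e_{2} \in E$ and suppose that $e_{2} \succeq e_{1}$. Assume that $P_{2}$ is $D$-preserving and that $P_{2}(s,\cdot) \succeq_{D} P_{1}(s,\cdot)$ for all $s \in S$. Then $\mu^{t}(\cdot,e_{2}) \succeq_{D} \mu^{t}(\cdot,e_{1})$ for all $t \in \mathbb{N}$.
In the case that $p_{1} = p_{2} = p$ and $E$ is a partially ordered set that influences the agent’s decisions, Theorem 1 yields a simple stochastic comparative statics result. Corollary 1 shows that if $g(s,e)$ is increasing in $e$, $g(s,e)$ is increasing in $s$, and $p$ is monotone, then $\mathbb{E}^{t}(g(e_{2})) \geq \mathbb{E}^{t}(g(e_{1}))$ whenever $e_{2} \succeq e_{1}$. This result is useful when $E$ is the set of all possible discount factors between $0$ and $1$, or $E$ is a set that includes parameters that influence the single-period payoff function (see Section 3.2).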
Corollary 1
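Let $e_{1}, e_{2} \in E$ and suppose that $e_{2} \succeq e_{1}$. Assume that $g(s,e)$ is increasing in $s$ for all $e \in E$, $g(s,e)$ is increasing in $e$, $p_{1} = p_{2} = p$, and $p$ is monotone. Then
$$\mathbb{E}^{t}(g(e_{2})) \geq \mathbb{E}^{t}(g(e_{1}))$$
for all $t \in \mathbb{N}$ and for each initial state $s(1) \in S$.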
In some dynamic programs we are interested in knowing how a change in the initial state will influence the DM’s decisions in future periods. Corollary 2 shows that a higher initial state leads to higher expected decisions if the policy function is increasing in the state of the system and the transition probability function is monotone. The proof follows from the same arguments as those in the proof of Corollary 1. Recall that we denote the initial state by $s(1)$.
Corollary 2
Consider two MDPs that are equivalent except for the initial states $s_{1}(1)$, $s_{2}(1)$. Assume that $s_{2}(1) \geq s_{1}(1)$, $g(s)$ is increasing in $s$, and $p$ is monotone. Then the expected decision in period $t$ under the initial state $s_{2}(1)$ is at least as high as under the initial state $s_{1}(1)$ for all $t \in \mathbb{N}$.
We now derive stochastic comparative statics results with respect to the transition probability function that governs the states’ dynamics. Part (i) of Theorem 2 provides conditions that ensure that $p_{2} \succeq_{st} p_{1}$ implies $\mathbb{E}^{t}(g(p_{2})) \geq \mathbb{E}^{t}(g(p_{1}))$ for all $t \in \mathbb{N}$. Part (ii) provides conditions that ensure that $p_{2} \succeq_{ICX} p_{1}$ implies $\mathbb{E}^{t}(g(p_{2})) \geq \mathbb{E}^{t}(g(p_{1}))$ for all $t \in \mathbb{N}$. In Section 4 we apply these results to various commonly studied dynamic optimization models.
Theorem 2
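Let $p_{1}, p_{2} \in E_{p}$.
(i) Assume that $p_{2}$ is monotone, $g(s,p_{2})$ is increasing in $s$, and $g(s,p_{2}) \geq g(s,p_{1})$ for all $s \in S$. Then $p_{2} \succeq_{st} p_{1}$ implies that $\mathbb{E}^{t}(g(p_{2})) \geq \mathbb{E}^{t}(g(p_{1}))$ for all $t \in \mathbb{N}$.
(ii) Assume that $p_{2}$ is monotone and convexity-preserving, $g(s,p_{2})$ is increasing and convex in $s$, and $g(s,p_{2}) \geq g(s,p_{1})$ for all $s \in S$. Then $p_{2} \succeq_{ICX} p_{1}$ implies that $\mathbb{E}^{t}(g(p_{2})) \geq \mathbb{E}^{t}(g(p_{1}))$ for all $t \in \mathbb{N}$.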
3.2 A change in the discount factor or in the payoff function
In this section we provide sufficient conditions for the monotonicity of the policy function in the state variable, and for the monotonicity of the policy function in other parameters of the MDP, including the discount factor and the parameters that influence the single-period payoff function. Our stochastic comparative statics results in Section 3.1 rely on these monotonicity properties. Thus, we provide conditions on the model’s primitives that ensure stochastic comparative statics results.
The monotonicity of the policy function in the state variable follows from the conditions on the model’s primitives provided in Topkis (2011). We note that these conditions are not necessary for deriving monotonicity results regarding the policy function, and in some specific applications one can still derive these monotonicity results using different techniques or under different assumptions. [Footnote 12: For example, see Lovejoy (1987) and Hopenhayn and Prescott (1992). See also Smith and McCardle (2002) for conditions that guarantee that the value function is monotone and has increasing differences.]
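Recall that a function $f : S \times E \to \mathbb{R}$ is said to have increasing differences in $(s,e)$ on $S \times E$ if for all $e_{1}, e_{2} \in E$ and $s_{1}, s_{2} \in S$ such that $e_{2} \succeq e_{1}$ and $s_{2} \geq s_{1}$, we have
$$f(s_{2},e_{2}) - f(s_{2},e_{1}) \geq f(s_{1},e_{2}) - f(s_{1},e_{1}).$$
A function $f$ has decreasing differences if $-f$ has increasing differences.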
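On a finite grid with totally ordered coordinates, the increasing differences inequality can be tested numerically. The sketch below is an illustration with a hypothetical helper name; it checks the inequality on adjacent grid points only, which suffices because the difference over any larger rectangle is a sum of differences over adjacent cells.

```python
def has_increasing_differences(f, xs, ys, tol=1e-12):
    """Check f(x2, y2) - f(x2, y1) >= f(x1, y2) - f(x1, y1) on adjacent
    points of the sorted grids xs and ys."""
    for x1, x2 in zip(xs, xs[1:]):
        for y1, y2 in zip(ys, ys[1:]):
            if f(x2, y2) - f(x2, y1) < f(x1, y2) - f(x1, y1) - tol:
                return False
    return True


grid = [0.0, 1.0, 2.0, 3.0]
product_ok = has_increasing_differences(lambda x, y: x * y, grid, grid)       # True
separable_ok = has_increasing_differences(lambda x, y: x + y, grid, grid)     # True
neg_product_ok = has_increasing_differences(lambda x, y: -x * y, grid, grid)  # False
```

The additively separable case holds with equality, so it has both increasing and decreasing differences.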
A set $B \in \mathcal{B}(S)$ is called an upper set if $s_{1} \in B$ and $s_{2} \geq s_{1}$ imply $s_{2} \in B$. The transition probability $p$ has stochastically increasing differences if $p(s,a,B)$ has increasing differences for every upper set $B$. See Topkis (2011) for examples of transition probabilities that have stochastically increasing differences. The optimal policy correspondence $G$ is said to be ascending if $s_{2} \geq s_{1}$, $b \in G(s_{1})$, and $b' \in G(s_{2})$ imply $\max\{b,b'\} \in G(s_{2})$ and $\min\{b,b'\} \in G(s_{1})$. In particular, if $G$ is ascending, then $\max G(s)$ and $\min G(s)$ are increasing functions. Topkis (2011) provides conditions under which the optimal policy correspondence $G$ is ascending. These conditions are summarized in the following assumption:
Assumption 1
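(i) $r(s,a)$ is increasing in $s$ and has increasing differences.
(ii) $p$ is monotone and has stochastically increasing differences.
(iii) For all $s_{1}, s_{2} \in S$, $s_{1} \leq s_{2}$ implies $\Gamma(s_{1}) \subseteq \Gamma(s_{2})$.
Theorem 3 shows that under Assumption 1, the policy function is increasing in the discount factor. Furthermore, if the single-period payoff function $r(s,a,c)$ depends on some parameter $c$ and has increasing differences, then the policy function is increasing in the parameter $c$.
Theorem 3
Suppose that Assumption 1 holds and that $\Gamma(s)$ is ascending.
(i) Let $0 < \beta_{1} \leq \beta_{2} < 1$. Then $g(s,\beta_{2}) \geq g(s,\beta_{1})$ for all $s \in S$ and $\mathbb{E}^{t}(g(\beta_{2})) \geq \mathbb{E}^{t}(g(\beta_{1}))$ for all $t \in \mathbb{N}$.
(ii) Let $c \in E$ be a parameter that influences the payoff function. If the payoff function $r(s,a,c)$ has increasing differences in $(a,c)$ and in $(s,c)$, then $g(s,c_{2}) \geq g(s,c_{1})$ for all $s \in S$, and $\mathbb{E}^{t}(g(c_{2})) \geq \mathbb{E}^{t}(g(c_{1}))$ for all $t \in \mathbb{N}$ whenever $c_{2} \succeq c_{1}$.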
3.3 A change in the transition probability function
In this section we study stochastic comparative statics results related to a change in the transition function. We provide conditions on the transition function and on the payoff function that ensure that implies comparative statics results and stochastic comparative statics results. We assume that the transition function is given by for all , where is a random variable with law and support . Theorem 4 provides conditions on the function that imply that the policy function is higher when is higher in the sense of stochastic dominance. In Section 4.3, we provide an example of a controlled random walk where the conditions on are satisfied.
Theorem 4
Suppose that where is convex, increasing, continuous, and has increasing differences in , and ; and has the law , . is convex and increasing in and has increasing differences. For all , we have .
If then
(i) for all and is increasing in .
(ii) for all .
4 Applications
In this section we apply our results to several dynamic optimization models from the economics and operations research literature.
4.1 Capital accumulation with adjustment costs
Capital accumulation models are widely studied in the investment theory literature (Stokey and Lucas, 1989). We consider a standard capital accumulation model with adjustment costs (Hopenhayn and Prescott, 1992). In this model, a firm maximizes its expected discounted profit over an infinite horizon. The single-period revenues depend on the demand and on the firm’s capital. The demand evolves exogenously in a Markovian fashion. In each period, the firm decides on the next period’s capital level and incurs an adjustment cost that depends on the current capital level and on the next period’s capital level. Using the stochastic comparative statics results developed in the previous section, we find conditions that ensure that higher future demand, in the sense of first order stochastic dominance, increases the expected long run capital accumulated. We provide the details below.
Consider a firm that maximizes its expected discounted profit. The firm’s single-period payoff function is given by
[TABLE]
where . The revenue function depends on an exogenous demand shock , and on the current firm’s capital stock . The state space is given by . The demand shocks follow a Markov process with a transition function . The firm chooses the next period’s capital stock and incurs an adjustment cost of . The transition probability function is given by
[TABLE]
where , is a measurable set in , is a measurable set in , and is a Markov kernel on .
It is easy to see that if is monotone then is monotone and that implies .
Assume that the revenue function is continuous and has increasing differences, that is continuous and has decreasing differences, and that is ascending. Under these conditions, Hopenhayn and Prescott (1992) show that the policy function is increasing in if is monotone. If, in addition, , then for all (see Corollary 7 in Hopenhayn and Prescott (1992)). Thus, part (i) in Theorem 2 implies that for all .
Proposition 1
Let and be two Markov kernels on . Assume that is continuous and has increasing differences, is continuous and has decreasing differences, is ascending, and whenever . Assume that is monotone and that . Then under the expected capital accumulation is higher than under , i.e., for all .
4.2 Dynamic pricing with a reference effect and an uncertain memory factor
In this section we consider a dynamic pricing model with a reference effect as in Popescu and Wu (2007). In this model the demand is sensitive to the firm’s pricing history. In particular, consumers form a reference price that influences their demand. As in Popescu and Wu (2007), we consider a profit-maximizing monopolist who faces a homogeneous stream of repeated customers over an infinite time horizon. In each period, the monopolist decides on a price to charge the consumers. Assume for simplicity that the marginal cost is zero. The resulting single-period payoff function is given by
[TABLE]
where is the current reference price and is the demand function that depends on the reference price and on the price that the monopoly charges . We assume that the function is continuous, non-negative, decreasing in , increasing in , has increasing differences, and is convex in . If the current reference price is and the firm sets a price of then the next period’s reference price is given by (see Popescu and Wu (2007) for details on the micro foundations of this structure). is called the memory factor. In contrast to the model of Popescu and Wu (2007), we assume that the memory factor is not deterministic. More precisely, we assume that the memory factor is a random variable on with law . So the transition probability function is given by
[TABLE]
for all . We show that even when the memory factor is a random variable, the result of Popescu and Wu (2007) holds in expectation, i.e., the long run expected prices are increasing in the current reference price. We also show that an increase in the discount factor increases the current optimal price and the long run expected prices.
Proposition 2
Suppose that the function is continuous, non-negative, decreasing in , increasing and convex in , and has increasing differences.
(i) The optimal pricing policy is increasing in the reference price .
(ii) The expected optimal prices in each period are higher when the initial reference price is higher.
(iii) implies that for all and for all .
4.3 Controlled random walks
Controlled random walks are used to study controlled queueing systems and other phenomena in applied probability (for example, see Serfozo (1981)). In this section we consider a simple controlled random walk on . At any period, the state of the system determines the current period’s reward . The next period’s state is given by where is a random variable with law and support , and is the action that the DM chooses. Thus, the process evolves as a random walk plus the DM’s action . When the DM chooses an action , a cost of is incurred. We assume that is a compact set, is an increasing and convex function, and is an increasing function. That is, the reward and the marginal reward are increasing in the state of the system and the costs are increasing in the action that the DM chooses.
The single-period payoff function is given by and the transition probability function is given by
[TABLE]
for all . In this setting, when choosing an action , the DM faces the following trade-off between the current payoff and future payoffs: while choosing a higher action has higher current costs, it increases the probability that the state of the system will be higher in the next period, and thus, a higher action increases the probability of higher future rewards.
We study how a change in the random variable affects the DM’s current and future optimal decisions. When is convex and increasing in , it is easy to see that the transition function and the single-period function satisfy the conditions of Theorem 4. Thus, the proof of the following proposition follows immediately from Theorem 4.
Proposition 3
Suppose that where has the law , . Suppose that is convex and increasing in . Assume that .
Then for all , is increasing in , and for all .
4.4 Comparisons of stationary distributions
Stationary equilibrium is the preferred solution concept for many models that describe large dynamic economies (see Acemoglu and Jensen (2015) for examples of such models). In these models, there is a continuum of agents. Each agent has an individual state and solves a discounted dynamic programming problem given some parameters (usually prices). The parameters are determined by the aggregate decisions of all agents. Informally, a stationary equilibrium of these models consists of a set of parameters $e$, a policy function $g$, and a probability measure $\lambda$ on $S$ such that (i) $g$ is an optimal stationary policy given the parameters $e$, (ii) $\lambda$ is a stationary distribution of the states’ dynamics given the parameters $e$, and (iii) the parameters $e$ are determined as a function of $\lambda$ and $g$. [Footnote 13: Stationary equilibrium models are used to study a wide range of economic phenomena. Examples include models of industry equilibrium (Hopenhayn, 1992), heterogeneous agent macro models (Huggett, 1993; Aiyagari, 1994), and many more.]
The existence and uniqueness of a stationary probability measure on in the sense that
[TABLE]
for all are widely studied. [Footnote 14: See, for example, Hopenhayn and Prescott (1992), Kamihigashi and Stachurski (2014), and Foss et al. (2018).] We now derive comparative statics results on how the stationary distribution changes when the transition function changes. We denote the least stationary distribution by and the greatest stationary distribution by .
Proposition 4
Suppose that is a compact set in .
(i) Let be the set of all monotone transition probability functions . Assume that is increasing in on where is endowed with the order . Then the greatest stationary distribution and the least stationary distribution are increasing in on with respect to . [Footnote 15: The existence of the greatest fixed point is guaranteed by the Tarski fixed-point theorem. For more details, see the Appendix and Topkis (2011).]
(ii) Let be the set of all monotone and convexity-preserving transition probability functions . Assume that is convex in and is increasing in on where is endowed with the order . Then the greatest stationary distribution and the least stationary distribution are increasing in on with respect to .
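As a purely numerical illustration of part (i) in a simplified setting, the following sketch compares the stationary distributions of two finite-state chains. The three-state matrices `P_1` and `P_2`, the helper names, and the use of power iteration are assumptions made for this example and are not taken from the paper. Both chains are irreducible, so the least and greatest stationary distributions coincide in this toy case.

```python
# Hypothetical three-state chains on {0, 1, 2}. Each matrix is monotone
# (higher rows dominate lower rows), and every row of P_2 dominates the
# corresponding row of P_1 in first-order stochastic dominance.
P_1 = [[0.6, 0.3, 0.1],
       [0.3, 0.4, 0.3],
       [0.1, 0.3, 0.6]]
P_2 = [[0.5, 0.3, 0.2],
       [0.2, 0.4, 0.4],
       [0.1, 0.2, 0.7]]


def step(mu, P):
    """One step of the state dynamics: the row vector mu P."""
    n = len(P)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]


def stationary(P, iters=2000):
    """Approximate the stationary distribution by power iteration."""
    mu = [1.0 / len(P)] * len(P)
    for _ in range(iters):
        mu = step(mu, P)
    return mu


def upper_tails(mu):
    """Tail probabilities mu({k, ..., n-1}) for k = 1, ..., n-1."""
    return [sum(mu[k:]) for k in range(1, len(mu))]


def dominates(mu_hi, mu_lo, tol=1e-12):
    """True if mu_hi first-order stochastically dominates mu_lo."""
    return all(h >= l - tol
               for h, l in zip(upper_tails(mu_hi), upper_tails(mu_lo)))


pi_1 = stationary(P_1)
pi_2 = stationary(P_2)
print("stationary distribution under P_1:", [round(x, 4) for x in pi_1])
print("stationary distribution under P_2:", [round(x, 4) for x in pi_2])
print("pi_2 dominates pi_1:", dominates(pi_2, pi_1))
```

The check uses upper-tail probabilities because, on a finite chain of states, first-order stochastic dominance is equivalent to every upper tail being weakly larger.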
We apply Proposition 4 to a standard stationary equilibrium model (Huggett, 1993).
There is a continuum of ex-ante identical agents with mass . The agents solve a consumption-savings problem when their income is fluctuating. Each agent’s payoff function is given by where denotes the agent’s current wealth, denotes the agent’s savings, is the agent’s current consumption, and is the agent’s utility function. Thus, when an agent consumes , his single-period payoff is given by . [Footnote 16: For simplicity we assume that all the agents are ex-ante identical, i.e., the agents have the same utility function and transition function. The model can be extended to the case of ex-ante heterogeneity.] Recall that a utility function is in the class of hyperbolic absolute risk aversion (HARA) utility functions if its absolute risk aversion is hyperbolic. That is, if for . We assume that is in the HARA class and that the utility function’s derivative is convex.
Savings are limited to a single risk-free bond. When the agents save an amount their next period’s wealth is given by where is the risk-free bond’s rate of return and is the agents’ labor income in the next period. The agents’ labor income is a random variable with law . Thus, the transition function is given by
[TABLE]
The set from which the agents can choose their savings level is given by where is a borrowing limit and is an upper bound on savings.
A stationary equilibrium is given by a probability measure on , a rate of return , and a stationary savings policy function such that (i) is optimal given , (ii) is a stationary distribution given , i.e., , and (iii) markets clear in the sense that the total supply of savings equals the total demand for savings, i.e., .
If the agents’ utility function is in the HARA class then the savings policy function is convex and increasing (see Jensen (2017)). It is easy to see that is convexity-preserving and monotone. Furthermore, when is convex then the policy function is increasing in with respect to the convex order, i.e., whenever (see Light (2018a)). Thus, part (ii) of Proposition 4 implies that when the labor income uncertainty increases (i.e., ), both the highest partial equilibrium (when is fixed) wealth inequality and the lowest partial equilibrium wealth inequality increase (i.e., ).
5 Summary
This paper studies how the current and future optimal decisions change as a function of the optimization problem’s parameters in the context of Markov decision processes. We provide simple sufficient conditions on the primitives of Markov decision processes that ensure comparative statics results and stochastic comparative statics results. We show that various models from different areas of operations research and economics satisfy our sufficient conditions.
6 Appendix
6.1 Proofs of the results in Section 3.1
Proof of Theorem 1. For the result is trivial since . Assume that for some . First note that for every measurable function and we have
[TABLE]
To see this, assume first that where and is the indicator function of the set . We have
[TABLE]
A standard argument shows that equality (2) holds for every measurable function .
Now assume that . We have
[TABLE]
The first inequality follows since , is -preserving and . The second inequality follows since . Thus, . We conclude that for all .
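The induction step can be checked numerically in a simple special case: a finite state space with the class of increasing functions, so that the order is first-order stochastic dominance. The four-state matrices `P_1` and `P_2`, the common initial state, and the helper names below are hypothetical choices for this sketch and do not appear in the paper.

```python
# Hypothetical chains on {0, 1, 2, 3}: a symmetric walk (P_1) and a walk
# with upward drift (P_2).
P_1 = [[0.7, 0.3, 0.0, 0.0],
       [0.3, 0.4, 0.3, 0.0],
       [0.0, 0.3, 0.4, 0.3],
       [0.0, 0.0, 0.3, 0.7]]
P_2 = [[0.5, 0.5, 0.0, 0.0],
       [0.2, 0.3, 0.5, 0.0],
       [0.0, 0.2, 0.3, 0.5],
       [0.0, 0.0, 0.2, 0.8]]


def step(mu, P):
    """Distribution of next period's state given today's distribution mu."""
    n = len(P)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]


def tails(mu):
    """Upper-tail probabilities mu({k, ..., n-1}) for k = 1, ..., n-1."""
    return [sum(mu[k:]) for k in range(1, len(mu))]


def sd_geq(hi, lo, tol=1e-12):
    """True if hi first-order stochastically dominates lo."""
    return all(h >= l - tol for h, l in zip(tails(hi), tails(lo)))


# Hypotheses used in the induction: P_2 maps increasing functions to
# increasing functions (monotone rows) and dominates P_1 state by state.
p2_is_monotone = all(sd_geq(P_2[s + 1], P_2[s]) for s in range(3))
p2_dominates_p1 = all(sd_geq(P_2[s], P_1[s]) for s in range(4))

# Both processes start from the same state, so the claim is trivial in the
# first period; the loop checks that the order survives every later period.
mu_1 = [1.0, 0.0, 0.0, 0.0]
mu_2 = [1.0, 0.0, 0.0, 0.0]
ordered_each_period = []
for t in range(25):
    ordered_each_period.append(sd_geq(mu_2, mu_1))
    mu_1, mu_2 = step(mu_1, P_1), step(mu_2, P_2)

print("hypotheses hold:", p2_is_monotone and p2_dominates_p1)
print("order preserved in every period:", all(ordered_each_period))
```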
Proof of Corollary 1. We show that is -preserving and that for all . Let be an increasing function and let .
Since is monotone and is increasing in , if then
[TABLE]
Thus, is -preserving.
Let . Since and is monotone, we have
[TABLE]
Thus, .
From Theorem 1 we conclude that for all . We have
[TABLE]
which proves the Corollary.
Proof of Theorem 2. (i) Assume that . We show that is -preserving and that for all . Let be an increasing function.
Assume that . Since and is monotone we have
[TABLE]
which proves that is -preserving.
Let . Since is monotone, for all , and we have
[TABLE]
which proves that for all .
From Theorem 1 we conclude that for all . Since is increasing, we have
[TABLE]
which proves part (i).
(ii) Assume that . We show that is -preserving and that for all .
Let be an increasing and convex function. Let and for . We have
[TABLE]
The first inequality follows since is convexity-preserving. The second inequality follows since is convex and is monotone. Thus, is convex. Part (i) shows that is increasing. We conclude that is -preserving.
Fix . We have
[TABLE]
The first inequality follows since and is monotone. The second inequality follows since . We conclude that .
From Theorem 1 we conclude that for all . Since is increasing and convex, we have
[TABLE]
which proves part (ii).
6.2 Proofs of the results in Section 3.2
In order to prove Theorem 3 we need the following two results:
Proposition 5
Suppose that Assumption 1 holds. Then
(i) has increasing differences whenever is an increasing function.
(ii) is ascending. In particular, is an increasing function.
(iii) is an increasing function whenever is an increasing function. is an increasing function.
Proof. See Theorem 3.9.2 in Topkis (2011).
Proposition 6
Let be a partially ordered set. Assume that is ascending. If has increasing differences in , , and , then
[TABLE]
has increasing differences in .
Proof. See Lemma 1 in Hopenhayn and Prescott (1992) or Lemma 2 in Lovejoy (1987).
Proof of Theorem 3. (i) Let be the set of all possible discount factors, endowed with the standard order: if is greater than or equal to . Assume that . Let and assume that has increasing differences in and is increasing in . Let . Since has increasing differences, the function is increasing in . Since is monotone we have
[TABLE]
Rearranging the last inequality yields
[TABLE]
Since is increasing in and is monotone, the right-hand side and the left-hand side of the last inequality are nonnegative. Thus, multiplying the left-hand side of the last inequality by and the right-hand side by preserves the inequality. Adding to each side of the last inequality yields
[TABLE]
That is, has increasing differences in . An analogous argument shows that has increasing differences in . Proposition 5 guarantees that has increasing differences in and that is increasing in .
Proposition 6 implies that has increasing differences. We conclude that for all , has increasing differences and is increasing in . By standard dynamic programming arguments, converges uniformly to . Since the set of functions that have increasing differences and are increasing in is closed under uniform convergence, has increasing differences and is increasing in . By the same argument as above, has increasing differences in . Theorem 6.1 in Topkis (1978) implies that is increasing in for all . Proposition 5 implies that is increasing in for all . We now apply Corollary 1 to conclude that for all .
(ii) The proof is similar to the proof of part (i) and is therefore omitted.
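A small numerical sketch of the discount-factor comparison in part (i) follows. Everything in it is a hypothetical toy model chosen so that the monotonicity conditions are easy to verify by hand: the payoff `reward(s, a)` is increasing in the state and has increasing differences in the state and the action, and the next-state distribution depends only on the action and is stochastically increasing in it. It is not one of the paper's applications.

```python
N_STATES = 5           # states 0, ..., 4
ACTIONS = [0, 1, 2, 3]


def reward(s, a):
    """Toy payoff: increasing in s, increasing differences in (s, a)."""
    return s ** 0.5 - a / (1.0 + s)


def next_state_dist(a):
    """Toy transition: next state is a or a + 1 with equal probability,
    so a higher action shifts the next state upward."""
    return {a: 0.5, a + 1: 0.5}


def q_value(s, a, v, beta):
    """Current payoff plus discounted expected continuation value."""
    return reward(s, a) + beta * sum(
        p * v[s2] for s2, p in next_state_dist(a).items())


def optimal_policy(beta, iters=500):
    """Value iteration from v = 0, then the least greedy action per state."""
    v = [0.0] * N_STATES
    for _ in range(iters):
        v = [max(q_value(s, a, v, beta) for a in ACTIONS)
             for s in range(N_STATES)]
    # max(..., key=...) returns the first maximizer, i.e. the least one here.
    return [max(ACTIONS, key=lambda a: q_value(s, a, v, beta))
            for s in range(N_STATES)]


policies = {beta: optimal_policy(beta) for beta in (0.05, 0.5, 0.95)}
for beta, policy in policies.items():
    print("beta =", beta, "policy by state:", policy)
```

In this toy model a more patient DM never chooses a lower action in any state, which is the pattern the theorem describes. Selecting the least maximizer matters when several actions are optimal, since the monotonicity result is stated for a specific selection from the optimal set.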
6.3 Proofs of the results in Section 3.3
Proof of Theorem 4. Suppose that the function is convex and increasing in , and has increasing differences where is endowed with the stochastic dominance order . Let .
Note that has increasing differences in , and if and only if is supermodular (see Theorem 3.2 in Topkis (1978)).
Since the composition of a convex and increasing function with a convex, increasing, and supermodular function is convex and supermodular (see Topkis (2011)), the function is convex and supermodular in for all . Because convexity and supermodularity are preserved under integration, the function is convex and supermodular in . Thus,
[TABLE]
is convex and supermodular in as the sum of convex and supermodular functions. This implies that is convex. Since is increasing in it follows that is increasing in .
Note that for any increasing function we have
[TABLE]
so .
Fix , and let . Since is supermodular in , the function is increasing in . We have
[TABLE]
The first inequality follows since . The second inequality follows from the facts that is increasing in and has increasing differences. Adding to each side of the last inequality implies that has increasing differences in . Similarly, we can show that has increasing differences in .
Proposition 6 implies that has increasing differences. We conclude that for all , is convex and increasing in and has increasing differences. From standard dynamic programming arguments, converges uniformly to . Since the set of functions that have increasing differences and are convex and increasing in is closed under uniform convergence, has increasing differences and is convex and increasing in . From the same argument as above, has increasing differences in and . An application of Theorem 6.1 in Topkis (1978) implies that for all and is increasing in . The fact that is increasing implies that is monotone. We now apply Corollary 1 to conclude that for all .
6.4 Proofs of the results in Sections 4.2 and 4.4
Proof of Proposition 2. (i) Let be a convex function. The facts that is convex in and that convexity is preserved under integration imply that the function is convex in . Thus, the function given by
[TABLE]
is convex in . A standard dynamic programming argument (see the proof of Proposition 3) shows that the value function is convex. The convexity of implies that for all , the function has increasing differences in . Since increasing differences are preserved under integration, has increasing differences in . Since is nonnegative and has increasing differences, the function has increasing differences. Thus, the function
[TABLE]
has increasing differences as the sum of functions with increasing differences. Now apply Theorem 6.1 in Topkis (1978) to conclude that is increasing.
(ii) Follows from Corollary 1.
(iii) Follows from a similar argument to the arguments in the proof of Theorem 3.
We now introduce some notation and a result needed to prove Proposition 4. Recall that a partially ordered set is said to be a lattice if for all , and exist in . is a complete lattice if for all non-empty subsets the elements and exist in . We need the following proposition regarding the comparison of fixed points. For a proof, see Corollary 2.5.2 in Topkis (2011).
Proposition 7
Suppose that is a nonempty complete lattice, is a partially ordered set, and is an increasing function from into . Then the greatest and least fixed points of exist and are increasing in on .
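The fixed-point comparison can be illustrated on the simplest complete lattice, a finite chain. In the sketch below the chain `{0, ..., 10}`, the map `phi`, and the parameter values are hypothetical and serve only as an example; in the paper the lattice is a set of probability measures rather than a set of integers.

```python
TOP = 10  # the lattice is the chain {0, 1, ..., TOP}


def phi(x, e):
    """A toy map on the chain that is increasing in both x and e."""
    return min(TOP, x * x // 12 + e)


def least_fixed_point(e):
    """Iterate upward from the bottom element until a fixed point is hit."""
    x = 0
    while phi(x, e) != x:
        x = phi(x, e)
    return x


def greatest_fixed_point(e):
    """Iterate downward from the top element until a fixed point is hit."""
    x = TOP
    while phi(x, e) != x:
        x = phi(x, e)
    return x


least = [least_fixed_point(e) for e in range(5)]
greatest = [greatest_fixed_point(e) for e in range(5)]
print("least fixed points:   ", least)     # [0, 1, 2, 3, 10]
print("greatest fixed points:", greatest)  # [0, 1, 10, 10, 10]
```

Both sequences are weakly increasing in the parameter. Iterating from the bottom (top) element produces a monotone sequence because `phi` is increasing in `x`, which is why it terminates at the least (greatest) fixed point on a finite chain.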
Proof of Proposition 4. Let be the set of all probability measures on . The partially ordered set and the partially ordered set are complete lattices when is compact (see Müller and Scarsini (2006)).
(i) Define the operator by
[TABLE]
The proof of Theorem 2 implies that is an increasing function on with respect to . That is, for and we have whenever and . Proposition 7 implies the result.
(ii) The proof is analogous to the proof of part (i) and is therefore omitted.
References
- Acemoglu, D. and M. K. Jensen (2015): “Robust comparative statics in large dynamic economies,” Journal of Political Economy, 123, 587–640.
- Acemoglu, D. and M. K. Jensen (2018): “Equilibrium Analysis in the Behavioral Neoclassical Growth Model,” Working Paper.
- Adlakha, S. and R. Johari (2013): “Mean field equilibrium in dynamic games with strategic complementarities,” Operations Research, 61, 971–989.
- Aiyagari, S. R. (1994): “Uninsured idiosyncratic risk and aggregate saving,” The Quarterly Journal of Economics, 109, 659–684.
- Amir, R., L. J. Mirman, and W. R. Perkins (1991): “One-sector nonclassical optimal growth: optimality conditions and comparative dynamics,” International Economic Review, 625–644.
- Antoniadou, E. (2007): “Comparative statics for the consumer problem,” Economic Theory, 31, 189–203.
- Athey, S. (2002): “Monotone comparative statics under uncertainty,” The Quarterly Journal of Economics, 117, 187–223.
- Balbus, Ł., K. Reffett, and Ł. Woźny (2014): “A constructive study of Markov equilibria in stochastic games with strategic complementarities,” Journal of Economic Theory, 150, 815–840.
