Conditional Optimal Stopping: A Time-Inconsistent Optimization
Marcel Nutz, Yuchong Zhang

TL;DR
This paper introduces a new framework for optimal stopping problems where the decision is conditioned on events like survival or avoiding bankruptcy, addressing time-inconsistency through equilibrium solutions.
Contribution
It develops a novel equilibrium approach for conditional optimal stopping, extending classical methods and analyzing uniqueness and non-uniqueness in finite and infinite horizons.
Findings
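Equilibria are essentially unique in the finite-horizon case.
In the infinite-horizon case, equilibria can be non-unique, and other phenomena arise, such as non-Markovian equilibria in a Markovian setting.
The classical Snell envelope approach is generalized to a pair of processes with Snell-type properties.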
Marcel Nutz
Depts. of Statistics and Mathematics, Columbia University, [email protected]. Research supported by an Alfred P. Sloan Fellowship and NSF Grant DMS-1812661. MN is grateful to Pierre-Louis Lions and Abdoulaye Ndiaye for helpful discussions.
Yuchong Zhang Dept. of Statistical Sciences, University of Toronto, [email protected].
Abstract
Inspired by recent work of P.-L. Lions on conditional optimal control, we introduce a problem of optimal stopping under bounded rationality: the objective is the expected payoff at the time of stopping, conditioned on another event. For instance, an agent may care only about states where she is still alive at the time of stopping, or a company may condition on not being bankrupt. We observe that conditional optimization is time-inconsistent due to the dynamic change of the conditioning probability and develop an equilibrium approach in the spirit of R. H. Strotz’ work for sophisticated agents in discrete time. Equilibria are found to be essentially unique in the case of a finite time horizon whereas an infinite horizon gives rise to non-uniqueness and other interesting phenomena. We also introduce a theory which generalizes the classical Snell envelope approach for optimal stopping by considering a pair of processes with Snell-type properties.
Keywords Conditional Optimal Stopping; Time-inconsistency; Equilibrium
AMS 2010 Subject Classification 60G40; 93E20; 91A13; 91A15
1 Introduction
The classical optimal stopping problem is to maximize the expected payoff $E[G_\tau]$ over all stopping times $\tau$, where $G$ is a given adapted process. In this paper, we propose to study a criterion that conditions on a given stopping time $\theta$ not being reached at the time $\tau$:
$$\sup_{\tau} E[G_\tau \mid \tau < \theta]. \tag{1.1}$$
When the model is based on a Markov chain $X$, a natural choice of $\theta$ is the first exit time from a given set $D$. If, for instance, the stopping decision is made by a company, one application is that being in $D$ indicates solvency so that $\theta$ is the time of bankruptcy. Indeed, the company may only care about states where the stopping payoff happens before $\theta$, as the company no longer exists in the other states. Or, for an individual making a financial decision, $\theta$ may be the time of death; then the model expresses that she only cares about states where the payoff happens while she is alive.
It is typically not possible to model such a conditional problem as a classical optimal stopping problem, except in the trivial case where the conditioning event does not depend on the stopping time . The classical framework would require us to model this as an exit time problem where a specific payoff is assigned to the exit event (that is, a value for ). E.g., for the individual facing possible death, we are unable to simply say, “I don’t care what happens after I die.” Instead, we have to assign a specific payoff at death. Even if the modeler were willing to fix some value in order to be “pragmatic,” it may be hard to make a justifiable choice and the solution of the optimization will typically depend on it.
This paper is inspired by recent work of P.-L. Lions which introduces the optimal control of conditioned processes [25]. There, the main example is controlling the drift of a Brownian motion and the payoff is conditioned on the process staying inside a given domain. The problem is cast as an optimal control problem of Fokker–Planck equations, a particular type of mean field game problem with coupling through the final condition. The limit towards the classical case, where the domain tends to , is given particular attention. While it is observed that optimal controls depend on the starting point, the question of time-consistency is not raised.
In the present paper, we introduce optimal stopping with conditioning, a novel problem to the best of our knowledge. One of our first observations is that the problem is time-inconsistent in the sense of Strotz [29]: if an agent determines an optimal strategy at time and reconsiders her decision at a later time taking into account her present state, she may contradict her previous decision and find that her strategy is no longer optimal. In this setting where the dynamic programming principle does not hold, there is more than one notion of optimization. The precommitted problem is to optimize the expected payoff at , assuming that the decision will not be challenged later on; i.e., the agent “commits” to the initial choice. (The theory of [25] corresponds to this notion.) In Strotz’ terminology, a sophisticated agent without a commitment device is aware of the fact that her “future selves” may overturn her current plan. Thus, she takes this as a constraint for a “strategy of consistent planning”: she chooses her behavior ignoring plans that she knows her future selves will not carry out; that is, she selects an action such that her future incarnations have no incentive to deviate. The resulting time-consistent strategy is called subgame perfect Nash equilibrium, and this is the notion that we will focus on. A different interpretation follows the literature on intergenerational models or overlapping-generations models (see [28] and the work thereafter) where future decisions are taken by subsequent generations rather than other selves. For instance, a government agency may want to take into account future presidential terms and opt for policies which will not be reversed after the next election.
Beyond being interesting in and of itself, conditional stopping may also help to shed more light on the conditioned control of processes, since optimal stopping is often more tractable than control.
1.1 Literature
Following the early work of [29], a rich literature involving time-inconsistency has emerged in economics. For instance, [27] reconsiders Strotz’ concept in a setting with non-exponential discounting when the number of decision points changes, and [26] studies preferences that change over time. Non-standard discounting (in particular hyperbolic) and time preferences (such as habit formation) are the most frequent reasons for time-inconsistency in this literature; see [14] for an overview. The models are mostly formulated in discrete time with finite or infinite time horizon. Time-inconsistency also arises when the optimization objective involves a nonlinear function of an expectation, such as the mean-variance criterion in [2], or a probability distortion as in [1, 15, 23]. (A probability distortion corresponds to an optimization objective that over- or underemphasizes events relative to their objective probability.)
The pioneering work of [10, 11] has initiated the study on how to define and obtain equilibrium strategies for the optimal control of continuous-time processes, using the example of Ramsey’s problem when the planner uses non-exponential discounting. In the continuous setting, varying a control at a single instance in time is meaningless since it does not affect the diffusion. The authors develop a first-order criterion which corresponds to variations of the control over a short time interval, meaning that agents can commit for a short period. This has led to a number of works, including portfolio optimization with non-exponential discounting [12, 13], mean-variance portfolio selection [5, 8] and general linear–quadratic control [16, 17]. Nevertheless, this concept of equilibrium is not the only one possible; in particular, first-order conditions are not sufficient for optimality in general. The recent study [21] introduces a stronger concept of optimality and highlights the differences. In [3, 4] the authors study time-inconsistent control in discrete and continuous time, respectively, and the relation between them, for a general class of objectives that are a sum of an expected utility and a nonlinear function of an expected utility with possible dependence on the initial condition. See also [31] for a continuous-time framework with dependence on the initial condition.
The closest reference for the present work is [22] where the authors study optimal stopping in discrete time under non-exponential discounting in a Markovian context. In the finite horizon case, a backward recursion yields the unique equilibrium. In the infinite horizon case, the authors focus on a time-homogeneous Markov chain. Under the assumption of decreasing impatience (including hyperbolic discounting), a time-homogeneous equilibrium is constructed by iterating the “strategic reasoning” or “fictitious play” map (cf. in Section 2.1); that is, every agent optimizes her decision between continuing and stopping while taking as given the decisions of all other agents. Remarkably, an equilibrium which is optimal for all agents can be obtained. We remark that [22] is predated by [18] where the iterative approach was first implemented in continuous time. In [18], time-homogeneous equilibria are obtained for time-homogeneous diffusions and inhomogeneous equilibria for time-inhomogeneous diffusions. See also [20] for a discussion of optimal equilibria in continuous time and [30] for a recent study of optimal stopping with non-exponential discounting where equilibria may not exist and this fact is related to a failure of smooth pasting. Optimal stopping under probability distortion is studied in [19] with a particular focus on equilibria that are obtained by iterating from naïve strategies.
The mentioned works on optimal stopping in continuous time use a direct analogy to the discrete-time case to define equilibria: each agent may stop or continue, without any commitment device. Indeed, for optimal stopping, the first-order approach of [10] is not a necessity: the decision to stop at a single instance in time immediately affects the process. On the other hand, as highlighted by [9] in the context of prospect theory, the definition in continuous time may include unreasonable equilibria based on the fact that continuation and stopping for a time- agent produce the same payoff if the subsequent agents stop and is continuous. In particular, “always stopping” is an equilibrium even if, say, is increasing. In a homogeneous diffusion model, [6, 7] use a first-order condition to define equilibria for two problems with time-inconsistency, and then “always stopping” is not necessarily an equilibrium. The relation between the two definitions has not been clarified so far.
To the best of our knowledge, the present paper is the first investigation of conditional optimal stopping. Regarding the control of conditioned processes, we would like to mention ongoing work of R. Carmona and M. Laurière where the problem of [25] is studied as a mean field control problem for open and closed loop controls as well as ongoing work of Y. Achdou and M. Laurière on the numerical resolution.
1.2 Synopsis
We study the conditional optimal stopping problem in (1.1) in a discrete-time setting with finite or infinite time horizon. While a continuous-time setting may certainly be of interest, our choice avoids some of the difficulties mentioned in the preceding section and leads to an uncontroversial definition of an equilibrium: at every time and state , an agent makes a binary choice—stopping or continuing—without committing future agents. We analyze such equilibria in a general stochastic framework while paying particular attention to the Markovian setting.
In the case of a finite time horizon $T$, there is a natural terminal condition (stopping is mandatory at $T$) and we shall see that there is an equilibrium which can be constructed by a backward recursion. This recursion computes two processes, a value process as in the classical case and an additional “survival process” that keeps track of the conditioning probability induced by the future selves’ decisions. The equilibrium is essentially unique, and if the stochastic framework is Markovian, then so is the equilibrium. These findings are in line with the results for other time-inconsistent problems as described in Section 1.1.
In the case of an infinite horizon, we provide a fairly general existence result by passing to the limit of finite horizon problems. (Note that for non-exponential discounting, existence may fail if the discounting does not satisfy decreasing impatience; cf. [22, Example 3.1].) On the other hand, we also provide examples showing that this case is more subtle than the previous one. We shall see that there can exist non-Markovian equilibria in addition to Markovian ones in a Markovian setting, which disproves a conjecture of [4] for our problem. Moreover, equilibria need not be unique even within the class of Markovian equilibria. Even more surprisingly, we detail a time-homogeneous Markovian example which does not admit a time-homogeneous equilibrium while time-inhomogeneous equilibria do exist. This is in sharp contrast to the results of [18, 22] and illustrates that for our problem, in general, iterating the “strategic reasoning” map of [22] does not converge. At a technical level, one reason is that non-exponential discounting with decreasing impatience as in [22] preserves one inequality of the dynamic programming principle whereas in our problem, the rescaling due to the conditioning probability can cause deviations in both directions.
It seems natural to ask for analogues of the classical Snell envelope theory in our setting. Indeed, the two processes described in the recursion for the finite time horizon can be characterized in more abstract terms by supermartingale properties. This leads to a notion that we call Snell pair and extends to the infinite-horizon setting. Snell pairs are (essentially) in one-to-one relation with equilibria. Similarly as in the classical case, the equilibrium policy is retrieved from the Snell pair by stopping where the value process meets the obstacle , but the survival process is needed to adjust the classical supermartingale properties in the context of conditioning. The survival process, in turn, also enjoys a supermartingale property. We are not aware of similar notions in the prior literature.
The remainder of this paper is organized as follows. In Section 2 we detail the observation of time-inconsistency and the equilibrium concept. Section 3 presents the results on the finite-horizon case. Existence of equilibria in the infinite-horizon case is covered in Section 4 and the corresponding examples are described in Section 5. The concluding Section 6 discusses Snell pairs and their relation to equilibria.
2 Setting
Let be the time horizon. If , set ; if , set . We will work on a probability space equipped with a filtration such that is trivial. Let be a stopping time with ; we think of events that happen after as irrelevant and call the domain of relevance at time . In the case , it is convenient to set . We may note that ; indeed, specifying is equivalent to specifying a decreasing adapted sequence with . Here and in what follows, the convention is used. Finally, let be an adapted process describing the payoff for stopping at time . The value of outside will not matter; we set on for notational purposes, where is an auxiliary state with the convention that . We assume throughout that . Since we are interested in events that happen strictly before , including the case where never happens, it will be useful to introduce the notation
[TABLE]
We can then consider the precommitted optimal stopping problem at the initial time,
[TABLE]
Note that the supremum only runs over stopping times which avoid conditioning on a nullset and that the set of such times always includes .
Example 2.1 (Markovian Setting).
Let be a Markov chain with values in a separable metric space starting at , let be a measurable subset containing and let be the first exit time from . Then, our model entails that we only evaluate states of the world where the trajectory of lies in up to the stopping time . A possible specification of the payoff is for a deterministic function and a discount factor . More generally, the set can be time-dependent.
The conditional optimal stopping problem (2.1) reduces to a classical optimal stopping problem when . But in general, the conditioning in the definition of the expected payoff for depends on itself, so that it cannot be reduced to a classical stopping problem.
2.1 Equilibria
The following example illustrates that the optimization problem (2.1) is time-inconsistent in the sense that an optimal stopping strategy for an agent today may not be optimal in the future; that is, if she reconsiders her strategy at a future time using a conditional criterion, she may contradict her previous decision.
Example 2.2.
Consider a two-period binomial tree with as illustrated in Figure 1, where stands for up and for down. The conditional probabilities are on every edge and the numbers at each node represent the payoff . The domain of relevance includes all states except ; i.e., the dashed line indicates the exit from the domain.
Since there are only five distinct stopping times in this model, one can easily compute all possible payoffs and observe that the unique optimizer of (2.1) is the stopping time with and . To wit, it is optimal to stop at if we have moved up in the first step and at otherwise. The obtained payoffs are illustrated by the solid dots and the associated value is .
Next, consider an analogous optimization problem for an agent who solves the problem conditionally on starting in the down state at . This agent has only two options, either to stop immediately with payoff 3 or to wait until the horizon and receive an expected reward of (since the expectation is conditioned on remaining inside the domain). Thus, this agent prefers to stop, and that is not consistent with . In summary, if the first agent solves (2.1) and reconsiders her own strategy at in the down state using the natural conditional criterion, she will overturn her previous decision.
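To make the comparison in Example 2.2 concrete, the following Python sketch enumerates the five stopping times on a two-period binomial tree and evaluates the conditional criterion for each. Only the time-1 payoffs (10 and 3) are taken from the text. Everything else is an assumption made for illustration, since Figure 1 is not reproduced here: edge probabilities of 1/2, a time-0 payoff of 5, terminal payoffs 4, 2, 2, and `dd` as the excluded state. These numbers are chosen so that the conditional continuation values at time 1 are 3 in the up state and 2 in the down state.

```python
from fractions import Fraction
from itertools import product

# Payoffs by node; "" is the root, "u"/"d" are the time-1 states.
# 10 and 3 are quoted in the text; all other numbers are hypothetical.
G = {"": 5, "u": 10, "d": 3, "uu": 4, "ud": 2, "du": 2}
OUTSIDE = {"dd"}                  # assumed exit state (outside the domain)
PATHS = ["uu", "ud", "du", "dd"]  # assumed probability 1/4 for each path


def stopped_node(path, stop_at_0, stop_u, stop_d):
    """Node at which the stopping rule stops along the given path."""
    if stop_at_0:
        return ""
    first = path[0]
    if (first == "u" and stop_u) or (first == "d" and stop_d):
        return first
    return path  # wait until the horizon T = 2


def conditional_value(stop_at_0, stop_u, stop_d):
    """E[G_tau | the path is still inside the domain at tau]."""
    num = den = Fraction(0)
    for path in PATHS:
        node = stopped_node(path, stop_at_0, stop_u, stop_d)
        if node in OUTSIDE:
            continue  # exited before stopping: excluded by the conditioning
        num += Fraction(1, 4) * G[node]
        den += Fraction(1, 4)
    return num / den


# The five distinct stopping times: stop at 0, or one of four time-1 rules.
rules = [(True, False, False)]
rules += [(False, su, sd) for su, sd in product([True, False], repeat=2)]
values = {rule: conditional_value(*rule) for rule in rules}
best_rule = max(values, key=values.get)

# Time-1 agent in the down state: stop now, or wait and condition on survival.
down_stop = Fraction(G["d"])
down_wait = Fraction(G["du"])  # only "du" survives, so the conditional mean is G["du"]

print(best_rule, values[best_rule])  # precommitted optimizer and its value
print(down_stop > down_wait)         # whether the down agent prefers to stop
```

Under these assumed numbers, the precommitted optimizer stops at time 1 after an up move and waits otherwise, whereas the agent who finds herself in the down state at time 1 compares 3 with 2 and prefers to stop.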
For the remainder of the paper we focus on an uncommitted sophisticated agent in the sense of [29] (see [24] for a recent paper surveying other approaches). She thinks of her “future selves” at various times and states as other agents that will optimize their choices when subsequent decisions are considered as given. Thus, we look for a policy which future selves will not override. A policy is a collection of binary decisions (stop or continue), one for each time and state, and an equilibrium is a policy such that no agent is incentivized to deviate.
Before formalizing this, let us observe that each agent faces the constraint of not conditioning on a null event. That is, any agent is forced to stop if continuing would lead to exiting the domain with probability one in the next step. Thus, the problem has the (random) effective time horizon
[TABLE]
The following adapts the basic notions of [18, 22] to our problem of conditional stopping (instead of non-exponential discounting) and extends them to a non-Markovian setting.
Definition 2.3.
A stopping policy is a -valued adapted process . We interpret as the agent at choosing to stop and as continuing. We also introduce the continuation stopping time
[TABLE]
this is the stopping time induced by for a time- agent who decides to continue. A stopping policy is called admissible if
[TABLE]
We denote by the set of all admissible stopping policies.
Admissibility implies that every time- agent with has a well-defined continuation value
[TABLE]
Naturally, she compares with her stopping value and prefers the larger one, or she is invariant if they are equal. (Agents with are forced to stop, so there is no decision to be taken. The value of for is unimportant and set to only for specificity.) If we start with some and all agents simultaneously update their choice according to this preference while using the convention that invariant agents stick to their preexisting decision, we are led to the updated stopping policy
[TABLE]
Definition 2.4.
An admissible stopping policy is an equilibrium (stopping policy) if .
This notion corresponds to a subgame perfect Nash equilibrium: each agent is behaving optimally if the future agents’ choices are seen as given.
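The fixed-point condition of Definition 2.4 can be checked mechanically on a finite tree. The Python sketch below computes each agent's continuation value under a given policy and applies the simultaneous update in which an agent switches only when strictly better off. It uses a two-period binomial tree in the spirit of Example 2.2: the time-1 payoffs 10 and 3 are from the text, while the edge probabilities of 1/2, the remaining payoffs, and the excluded state `dd` are hypothetical. The function names and the dictionary encoding of a policy are illustrative choices, not notation from the paper.

```python
# Payoffs by node ("" = root). 10 and 3 are from the text; the rest is hypothetical.
G = {"": 5.0, "u": 10.0, "d": 3.0, "uu": 4.0, "ud": 2.0, "du": 2.0}
T = 2  # nodes missing from G (here "dd") are assumed to lie outside the domain


def totals(node, policy):
    """Return (E[G_tau; no exit before tau], P(no exit before tau)) from `node`,
    where tau is the first time at or after `node` at which `policy` says stop."""
    if node not in G:
        return 0.0, 0.0  # already exited: contributes nothing
    if len(node) == T or policy[node] == 1:
        return G[node], 1.0
    return continuation_totals(node, policy)


def continuation_totals(node, policy):
    """Same quantities for an agent at `node` who continues for one step
    (assumed edge probability 1/2)."""
    num_u, prob_u = totals(node + "u", policy)
    num_d, prob_d = totals(node + "d", policy)
    return 0.5 * (num_u + num_d), 0.5 * (prob_u + prob_d)


def update(policy):
    """One simultaneous update: every agent best-responds to the given policy."""
    new_policy = {}
    for node in G:
        if len(node) == T:
            new_policy[node] = 1  # stopping is mandatory at the horizon
            continue
        num, prob = continuation_totals(node, policy)
        if prob == 0.0:
            new_policy[node] = 1  # continuing would condition on a null event
            continue
        continuation_value = num / prob
        if G[node] > continuation_value:
            new_policy[node] = 1
        elif G[node] < continuation_value:
            new_policy[node] = 0
        else:
            new_policy[node] = policy[node]  # ties keep the preexisting decision
    return new_policy


def is_equilibrium(policy):
    """A policy is an equilibrium if the update leaves it unchanged."""
    return update(policy) == policy


stop_at_time_1 = {"": 0, "u": 1, "d": 1, "uu": 1, "ud": 1, "du": 1}
precommitted = {"": 0, "u": 1, "d": 0, "uu": 1, "ud": 1, "du": 1}
print(is_equilibrium(stop_at_time_1), is_equilibrium(precommitted))
```

With these assumed numbers, the policy that continues at time 0 and stops at time 1 is a fixed point, whereas the policy induced by the precommitted optimizer is not: the down agent at time 1 deviates to stopping.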
Example 2.5.
Consider the setting of Example 2.2. In any admissible stopping policy, the time-2 agents have to stop because of the time horizon. Both time-1 agents then prefer to stop as their stopping values (10 and 3) exceed the expected continuation values (3 and 2). Given those decisions, the expected continuation value for the time-0 agent is which exceeds the stopping value of . It easily follows that the unique equilibrium stopping policy is given by , and . The induced stopping time for the time-0 agent is . This differs from the precommitted-optimal stopping time of Example 2.2, and the associated expected reward of is smaller than the precommitted value function .
In a Markov chain setting, a natural subset of stopping policies is also of a Markovian form. Denoting by $\sigma(Y)$ the $\sigma$-field generated by a random variable $Y$, this can be formalized as follows.
Definition 2.6.
Consider the Markovian setting of Example 2.1. A stopping policy is called Markovian if is -measurable for all .
If is admissible, this is equivalent to the existence of measurable subsets such that
[TABLE]
Note that such equilibria are actually path-dependent through , but this is the least amount of path-dependence compatible with our general definition of admissibility. In the Markovian setting, one could assume without loss of generality that all exit states (states outside ) are absorbing. Then, we have a.s. and one can require that is (a.s.) -measurable.
3 Finite-Horizon Equilibria
In this section we discuss existence, uniqueness and construction of equilibria for the case .
In the classical optimal stopping problem, the value function and the optimal decision of a time- agent are completely determined by the value functions of the agents at time . This fact lies at the heart of the backward recursion of dynamic programming and the Snell envelope theory. In the problem at hand, however, the conditioning event in the computation of the continuation value depends on the decisions of many future selves, not only the ones at time . This suggests introducing an additional process to keep track of the probability of the conditioning event given the stopping policy of all future selves; we call the survival process since it is related to survival probabilities. In Theorem 3.1 below we provide a backward recursion to construct an equilibrium; its recursive formula for resembles the classical case where it would be the conditional expectation of the value process at time , but now this expectation is calculated under a new measure obtained by using the normalized survival process as a density.
Just like in classical optimal stopping, one type of non-uniqueness arises when an agent is invariant; that is, when the stopping and continuation values happen to be equal: . Thus, an algorithm for the construction of an equilibrium necessarily comes with a specific choice. The theorem stated below uses early stopping preference, meaning that invariant agents choose to stop, and it yields the unique equilibrium with that preference. In the classical setting, this corresponds to the first time that the Snell envelope hits the obstacle. In general, a stopping preference is an adapted process with binary values, defining for each the choice in the case of invariance. For each such preference, one can write an algorithm similar to Theorem 3.1 and it delivers the unique equilibrium with that preference. Conversely, every finite-horizon equilibrium arises in that way.
Theorem 3.1.
Let and recall that on . Define the value process and the survival process as follows. Set and . For , set
[TABLE]
[TABLE]
Then is the unique equilibrium with preference for early stopping.
In Section 6 we will call a Snell pair and discuss its connection to Snell envelopes. A generalization including the infinite-horizon case will also be provided. We nevertheless opt to provide an elementary and self-contained treatment of the finite-horizon case in the present section.
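As an illustration of the backward recursion with early stopping preference, the following Python sketch treats a time-homogeneous Markov chain on finitely many states with a fixed domain. It is a reading of the prose description rather than a transcription of the theorem's displays: the continuation value at time t is taken to be the conditional expectation of the next value weighted by the next survival value and normalized by the expected survival. The function name `solve_equilibrium`, the array layout, and the test chain (a two-period binomial tree with edge probabilities 1/2 and partly hypothetical payoffs) are assumptions made for this example.

```python
def solve_equilibrium(P, in_domain, payoff, T):
    """Backward recursion for a value process V, a survival process S and a
    stopping policy (1 = stop, 0 = continue) on a finite-state Markov chain.

    P[x][y]      transition probability from state x to state y
    in_domain[x] True if state x lies in the domain of relevance
    payoff(t, x) reward for stopping at time t in state x
    """
    n = len(P)
    V = [[0.0] * n for _ in range(T + 1)]
    S = [[0.0] * n for _ in range(T + 1)]
    stop = [[1] * n for _ in range(T + 1)]  # exited states count as stopped
    for x in range(n):
        if in_domain[x]:
            V[T][x], S[T][x] = payoff(T, x), 1.0  # stopping is mandatory at T
    for t in range(T - 1, -1, -1):
        for x in range(n):
            if not in_domain[x]:
                continue  # outside the domain: V = S = 0
            g = payoff(t, x)
            # Probability of staying in the domain until the future selves stop.
            surv = sum(P[x][y] * S[t + 1][y] for y in range(n))
            if surv > 0:
                cont = sum(P[x][y] * S[t + 1][y] * V[t + 1][y] for y in range(n)) / surv
            # Forced stop if continuing conditions on a null event; ties stop early.
            if surv == 0 or g >= cont:
                V[t][x], S[t][x] = g, 1.0
            else:
                V[t][x], S[t][x], stop[t][x] = cont, surv, 0
    return V, S, stop


# Test chain: states 0=root, 1=u, 2=d, 3=uu, 4=ud, 5=du, 6=dd (assumed exit state).
P = [
    [0, 0.5, 0.5, 0, 0, 0, 0],
    [0, 0, 0, 0.5, 0.5, 0, 0],
    [0, 0, 0, 0, 0, 0.5, 0.5],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1],
]
in_domain = [True, True, True, True, True, True, False]
g = [5.0, 10.0, 3.0, 4.0, 2.0, 2.0, 0.0]  # 10 and 3 from the text, rest hypothetical

V, S, stop = solve_equilibrium(P, in_domain, lambda t, x: g[x], T=2)
print(stop[0][0], stop[1][1], stop[1][2], V[0][0])
```

The sketch does not handle a time-dependent domain or a general (non-Markovian) filtration; both would require indexing the arrays by histories rather than by states.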
Proof of Theorem 3.1.
We show in Lemma 3.2 below that is admissible and that coincides with the continuation value of . Once that is established, the very definition of shows that
[TABLE]
and hence is an equilibrium stopping policy with early stopping preference. On the other hand, the boundary condition at and a backward induction allow us to see that there is at most one such equilibrium. ∎
Lemma 3.2.
In the setting of Theorem 3.1, is admissible and
[TABLE]
and for we have
[TABLE]
Proof.
We first check that is admissible. Indeed, we have for , and if , backward induction shows that .
Next, we prove the formula for . The last two cases are clear from the definition. Thus, we focus on showing on . For we have so nothing needs to be proved. For we argue by induction. Indeed, using the induction hypothesis to obtain below,
[TABLE]
where holds due to
[TABLE]
In the last identity, the first case holds since implies that and agree. The second case holds because entails that and on . Finally, on we have . This completes the proof for and we note that (3.2) was obtained as part of the first display above. It remains to show that
[TABLE]
Since the denominators are non-zero and agree by (3.2), it suffices to show
[TABLE]
Indeed, (3.3) is clear for since that implies . It is also clear for . For we argue by backward induction. We first observe that, by similar arguments as below (3.2),
[TABLE]
On the set occurring in the first case of (3.4) we have
[TABLE]
where the three equalities follow from the induction hypothesis, the definitions of and , and on , respectively. As a result, we can take conditional expectations in (3.4) and obtain that the identity holds everywhere. The tower property then yields the claim (3.3) and the proof is complete. ∎
Corollary 3.3.
In the Markovian setting of Example 2.1 with , there exists a unique equilibrium with preference for early stopping and that equilibrium is Markovian.
Proof.
We observe that and in Theorem 3.1 are -measurable for all , and then so is . ∎
One can note that the stopping preference is important in the above result: it is easy to construct examples of non-Markovian equilibria by specifying a path-dependent stopping preference and taking the reward function to be constant.
4 Infinite-Horizon Equilibria: Existence
The following result establishes the existence of infinite-horizon equilibria in a setting that includes Markov chains with a countable state space.
Theorem 4.1.
Suppose that is a.s. discrete (we call a $\sigma$-field discrete if it is generated by a countable partition of ; in the case of a Markov chain with countable state space one can define as the $\sigma$-field generated by the sample paths up to time ) for all and that a.s. Moreover, assume that
[TABLE]
Then an equilibrium exists.
Let us comment on the assumptions before stating the proof.
Remark 4.2.
(a) Condition (4.1) covers in particular problems with discounting for a payoff function with sub-exponential growth. Consider for instance the Markov chain setting of Example 2.1 with a bounded and nonnegative payoff function and a discount factor . Then setting for (and ), we see that (4.1) is satisfied for any .
(b) The proof of Theorem 4.1 below has three steps. The construction of a limiting stopping policy and the verification of its optimality condition do not require (4.1) at all. The latter is used to ensure that is admissible. There are many other situations where admissibility holds, including without discounting, that can be established on a case-by-case basis, for instance the case of a Markov chain with a finite state space and a homogeneous reward . Condition (4.1) is merely one way to write a simple and fairly general result. Of course, a.s. is always a sufficient condition for , for any stopping time .
(c) Similarly, there are many cases where one can see directly from additional structure of that a.s. for all . In that case, is irrelevant.
(d) On the other hand, existence is not guaranteed without some assumption. For instance, if inside the domain but (cf. Example 5.1 below with ), a strictly increasing reward leads to non-existence since stopping is undesirable for any agent but is not admissible.
Proof of Theorem 4.1.
For , let be the (countable) collection of atoms generating . Given , consider a modified problem with time horizon and let be the equilibrium stopping policy obtained by applying Theorem 3.1 with the payoff . We also set for . Note that each is a binary sequence . By a diagonal procedure we can thus find a subsequence (again denoted ) which converges to a stopping policy in the following sense: given and , we have for all sufficiently large . If and denote the effective horizons, then and thus the admissibility of for implies that for .
To complete the proof that is admissible and an equilibrium, we fix arbitrary and and check the admissibility and optimality conditions at that state. For simplicity of notation, we assume that and (the general case differs only by writing conditional expectations and probabilities). To further simplify the notation, we set and . The convergence of to implies that a.s. More precisely, this convergence is stationary on , yielding that a.s. Moreover, , where the union is disjoint, and similarly for . It follows that
[TABLE]
Admissibility. We must ensure that . In view of (4.2) it suffices to exhibit a reachable state where stopping happens for all large , as that will imply that . Indeed, by (4.1) we can find and with such that and
[TABLE]
and hence
[TABLE]
This shows that for the agent at , stopping is optimal no matter what future selves do. In particular, for all and thus on . As a result, .
Optimality. It suffices to show that the continuation values converge at the fixed initial state; i.e., . Once that is established, if , then for large and hence shows that is optimal, and similarly for . To see that
[TABLE]
note that the denominators are non-zero by admissibility and by (4.2). In view of a.s. we have a.s. on . As we have assumed that a.s., this convergence holds everywhere. Using also the standing assumption that and (4.2), the convergence of the numerators follows by dominated convergence. ∎
Corollary 4.3.
Consider the Markovian setting (Example 2.1) under the conditions of Theorem 4.1. Then there exists a Markovian equilibrium.
Proof.
We revisit the proof of Theorem 4.1. Each of the finite-horizon problems is Markovian, so Corollary 3.3 shows that is Markovian. Since was constructed as a pointwise limit of , it is again -measurable. ∎
We shall see in Example 5.3 that this corollary cannot be improved in a time-homogeneous setting: the equilibria may nevertheless be time-dependent.
5 Infinite-Horizon Equilibria: Examples
5.1 Non-Uniqueness and Non-Markovian Equilibria
The following example shows that in the infinite-horizon case, multiple equilibria may exist. In these equilibria, all agents’ choices are uniquely determined; i.e., the non-uniqueness is not merely due to different choices of agents that are indifferent between stopping and continuing. Moreover, the multiplicity arises even within the class of time-homogeneous Markov equilibria. The example also shows that non-Markovian equilibria may exist in a Markovian setting.
Example 5.1.
Consider a homogeneous Markov chain on the states with initial value and transition probabilities in its natural filtration. Only the states in are relevant for the agents, meaning that and . The payoff is given by a function of the current state and a discount factor . Specifically,
[TABLE]
where is a constant satisfying
[TABLE]
We also assume that ; the other transition probabilities are arbitrary. Then, there are exactly two Markovian equilibria:
- (i) stop everywhere; i.e., ;
- (ii) stop if the chain is at State 2 or has exited; i.e., .
If , there are further, non-Markovian equilibria. In these equilibria, the induced stopping time for a given agent at some state coincides with the stopping time induced by (i) or (ii), conditionally on .
Proof.
We first note that as and , the only optimal choice for a time- agent on is to stop, no matter what future agents choose.
(a) To see that is an equilibrium, consider an agent at State 1, without loss of generality at . Then
[TABLE]
showing that stopping is indeed optimal and is an equilibrium.
(b) The policy defined by is admissible. To see that it defines an equilibrium, consider again the time-[math] agent at State 1. Let be the first hitting time of state , so that and . We have a.s. since . As , the symmetry between and yields that and thus . Moreover,
[TABLE]
since . It follows that
[TABLE]
showing that continuation is optimal. Thus is an equilibrium.
(c) Let be a Markovian equilibrium; we show that must be one of the two above policies. We have already observed that any agent at State 2 must stop. The same holds for any agent at State 0, by admissibility. That is, for all . If no other agent stops, is the policy of (ii). Otherwise there exists a time- agent stopping at State 1: on . But then the same calculation as in (5.1) shows that any agent at time and State 1 must also stop, etc., so that for all . As a result, the set of all agents at State 1 that stop can be thought of as a half-line starting at . If this half-line is infinite, is the equilibrium from (i). If not, there is some maximal where the time- agent stops, meaning that for and for . But now the calculation in (5.2) shows that stopping is not optimal for any time- agent on , a contradiction.
(d) Next, we give an example of a non-Markovian equilibrium. Indeed, set . For , we define
[TABLE]
Simple calculations analogous to (5.1) and (5.2) show that is an equilibrium. If , both cases in the definition of happen with positive probability so that is indeed non-Markovian.
(e) Let be any equilibrium, possibly non-Markovian. The first argument from (c) still shows that for such that and , it follows that . However the second argument from (c) merely shows that for such that and , it follows that . (But this need not hold if , in contrast to the Markovian case where the policy cannot depend directly on ). This implies that given the past up to time , the stopping time induced by is either immediate stopping as in (i) or the first exit time of as in (ii). Note that, as in (d), the choice between these two may depend on . ∎
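The one-step optimality conditions used in this proof can be checked numerically on a toy chain. The following is a minimal sketch, not the construction of Example 5.1: it assumes that an agent inside the domain compares the immediate payoff with the ratio of the expected discounted payoff collected on survival to the survival probability, and that State 0 is an absorbing exit state. The transition matrix `P`, the payoffs `f`, the discount factor `beta` and all function names are hypothetical choices made for illustration only.

```python
# Toy chain (hypothetical numbers): state 0 is an absorbing exit state,
# states 1 and 2 form the domain.
P = [[1.0, 0.0, 0.0],
     [0.25, 0.5, 0.25],
     [0.25, 0.25, 0.5]]
f = [0.0, 1.0, 1.3]      # payoff by state; f[0] is never collected
beta = 0.9               # discount factor (assumed)
EXIT = {0}


def policy_values(stop, n_iter=500):
    """Numerator N and survival probability S of a time-homogeneous policy.

    N[x] approximates E_x[beta^tau f(X_tau); no exit before tau] and S[x]
    approximates P_x(no exit before tau), where tau is the first time the
    chain is in `stop`. Computed by fixed-point iteration started at zero.
    """
    n = len(P)
    N, S = [0.0] * n, [0.0] * n
    for _ in range(n_iter):
        new_N, new_S = [], []
        for x in range(n):
            if x in EXIT:            # exited: excluded by the conditioning
                new_N.append(0.0)
                new_S.append(0.0)
            elif x in stop:          # stop now, inside the domain
                new_N.append(f[x])
                new_S.append(1.0)
            else:                    # continue one step and discount
                new_N.append(beta * sum(P[x][y] * N[y] for y in range(n)))
                new_S.append(sum(P[x][y] * S[y] for y in range(n)))
        N, S = new_N, new_S
    return N, S


def continuation_value(stop, x):
    """Conditional value of continuing once at x, then following `stop`.

    Returns None if continuing leaves no chance of stopping inside the
    domain, so that the conditioning event has probability zero.
    """
    N, S = policy_values(stop)
    num = beta * sum(P[x][y] * N[y] for y in range(len(P)))
    den = sum(P[x][y] * S[y] for y in range(len(P)))
    return num / den if den > 0 else None


def is_equilibrium(stop):
    """Check the one-step optimality condition at every state in the domain."""
    for x in range(len(P)):
        if x in EXIT:
            continue
        c = continuation_value(stop, x)
        if x in stop:
            if c is not None and f[x] < c:
                return False     # continuing would be strictly better
        elif c is None or f[x] > c:
            return False         # continuing is infeasible or strictly worse
    return True


print(is_equilibrium({1, 2}), is_equilibrium({2}), is_equilibrium({1}))
```

With these hypothetical numbers, both "stop everywhere" and "stop only at State 2" pass the check while "stop only at State 1" fails, which mirrors qualitatively the coexistence of the equilibria (i) and (ii). The sketch only covers time-homogeneous Markov policies and does not resolve ties.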
Remark 5.2.
(a) The finite-horizon version of Example 5.1 has a unique equilibrium, given by stopping everywhere. This follows by a backward recursion and the same calculation as in (5.1), since the time- agents have to stop. The limit of this equilibrium as is the infinite-horizon equilibrium (i). On the other hand, the equilibrium (ii) does not arise as a limit of finite-horizon equilibria.
(b) In this particular example the two Markovian equilibria are ordered: equilibrium (ii) has a larger value function for all agents. It is worth noting that the limit equilibrium is the inferior one.
(c) Example 5.3 shows that in general, no dominating equilibrium exists. One can also construct simple examples where the equilibrium value processes and stopping policies corresponding to different preferences are not ordered.
5.2 Non-Existence of Time-Homogeneous Equilibria
In this section we construct an example of a time-homogeneous Markov chain which admits Markovian equilibria but no time-homogeneous equilibria. In that sense, Theorem 4.1 and Corollary 4.3 cannot be improved, and a restriction to time-homogeneous notions is not possible (or will lead to non-existence). Importantly, the example also shows that the remarkable iterative approach of [22] does not apply in our setting. Indeed, in the problem of non-exponential discounting with decreasing impatience, an iterated application of (from a suitable starting point) produces a monotone sequence which converges to a time-homogeneous equilibrium. In our case however, the iteration can fail to be monotone. This can be related to a failure of both inequalities of the dynamic programming principle, whereas decreasing impatience preserves one.
Example 5.3.
Consider the homogeneous Markov chain on with transition probabilities as labeled next to the edges in Figure 2. In particular, States 0, 3 and 4 are absorbing. We set so that 0 is the only exit state. The payoff process is given by where is the discount factor and , , , as labeled in the boxes in Figure 2. To avoid trivialities, we assume that the initial position is one of the non-absorbing states, i.e., either or , and we also restrict our attention to equilibria that stop at State 3. (Footnote 2: Since State 3 is absorbing and , all policies have zero reward for an agent at State 3, who is therefore indifferent. This leads to an infinity of (uninteresting) equilibria. If early stopping preference is assumed, stopping at State 3 is a consequence rather than a condition.) A Markovian equilibrium is called time-homogeneous if does not depend on .
We fix such that the following inequalities are satisfied:
[TABLE]
[TABLE]
[TABLE]
One possible choice is , , . Then, up to a.s. equivalence:
- (i) All equilibria are Markovian.
- (ii) There exists no time-homogeneous equilibrium.
- (iii) There are exactly two equilibria and they are given by shifts of one another. Indeed, let
[TABLE]
where
[TABLE]
for
[TABLE]
Then , are Markovian equilibria, and exactly two of them are distinct up to a.s. equivalence: if , then and , whereas if , then and , a.s. (Footnote 3: Recall that the initial condition is deterministic in our basic setup. If , then State 1 can only be visited at odd and State 2 only at even ; the reverse is true if . This leads to the a.s. equivalence of two pairs of . By contrast, if we treated the initial state as not being fixed (as may be considered natural in a Markovian framework) or if we assumed that has a distribution with support including both states, then all four equilibria would be distinct.)
That all equilibria are Markovian is related to the filtration being relatively small (a.s.) due to various states being absorbing; this fact should not be given too much weight. The proofs for the other items are rather lengthy, so let us try to summarize the key mechanics heuristically. First, the dynamics are engineered such that in any equilibrium, the decision of a time- agent depends only on the agents at . Moreover, as highlighted in Lemma 5.4 below, it embeds two types of agents that cannot agree (and cannot even agree to disagree): Call Minnie_t the agent at State 2 and time and Donald_t the agent at State 1 and time . Minnie prefers to live in harmony and always wants to agree, whereas Donald is only happy if he contradicts Minnie. Suppose that at some time , Donald_t says “1” (stop). Then Minnie_{t-1} also opts for 1, but the combative Donald_{t-2} immediately replies with 0, thus implying time-inhomogeneity as he is contradicting Donald_t. The situation is similar if Donald_t starts with 0.
Conversely, there are exactly two equilibria because the above backward recursion also implies a unique forward recursion once the initial Donald_0 (or Minnie_0, depending on what the initial state is) fixes one of the two possible choices 0 or 1.
Proof of (i)–(iii).
Let us first observe that any equilibrium stopping policy (possibly non-Markovian) must stop on , by admissibility. Furthermore, it must stop on : State 4 is absorbing and , so that continuing is never optimal due to the discount factor . Since we have also convened that stops on , we may henceforth restrict our attention to equilibria satisfying on for all .
(i) Let be any equilibrium; we show that is Markovian (or rather, a.s. equivalent to a Markovian equilibrium). Indeed, suppose first that the initial condition is , and fix . We have that on . But since are absorbing states, . Suppose that is odd. Then is a nullset, so that up to a.s. equivalence, only the value of on has not been determined yet. But due to the absorption on and the fact that exactly one of the sets and has positive probability for every , we have which implies that is an atom in . In particular, is a.s. constant on , and since a.s. on , it follows that is of Markovian form. The situation is analogous if is even, and hence is Markovian. The other initial condition is dealt with similarly.
The proof of (ii) and (iii) necessitates the following lemma which describes the Minnie–Donald relationship sketched above.
Lemma 5.4.
Let satisfy (5.3)–(5.5) and let be an admissible stopping policy such that on for . Then for all ,
- (P1) if on , then on ;
- (P2) if on , then on ;
- (P3) if on , then on ;
- (P4) if on , then on .
The proof of the lemma is reported after the proof of (ii) and (iii).
(ii) Define the 4-periodic sequence by where are as in (iii) above. Note that exhaust all combinations of and the remaining states. Thus, a time-homogeneous equilibrium must (a.s.) be of the form , , for some . On the other hand, for any , (P1)–(P4) imply that , thus ruling out the existence of a time-homogeneous equilibrium. (We iterate twice to ensure that the policies differ also modulo a.s. equivalence).
(iii) Admissibility of () holds since a.s. (due to being absorbing) and since from any non-absorbing state there is a positive probability of reaching before reaching . Moreover, follows by direct verification using (P1)–(P4). Hence, are equilibria.
To see that there are exactly two equilibria, suppose first that the initial condition is and let be a (necessarily Markovian) equilibrium. Modulo a.s. equivalence, is completely determined by its values on , , , etc., since State 1 can only be visited at even times and State 2 only at odd times. Next, we use (P1)–(P4): Suppose that on . This implies on , which implies on , which implies on , etc. Therefore, we have a.s.
Alternately, on . This implies on , thus on , thus on , etc. In particular, we have a.s.
The case of the initial condition is similar. (Footnote 4: If the initial state is not considered fixed or if it is random with and , then (P1)–(P4) imply that is a.s. equal to exactly one of the four , uniquely determined by the values of on and .)
∎
Proof of Lemma 5.4.
Let and set
[TABLE]
so that is the continuation value at time . Note that comparing with is equivalent to comparing with . We first show (P1) and (P3).
(P1): Suppose that . This implies and thus the assumption of (P1) yields that , and . By the second part of (5.3), we have and thus as claimed.
(P3): If , then and the assumption of (P3) imply , and . By the first part of (5.3), we have and thus .
Next, we analyze (P2) and (P4). Denote by and the numerator and denominator of . It is clear that for all , since is the maximum possible payoff. By iterated conditioning, we have that on the set ,
[TABLE]
where we have used that if and if . Similarly, we deduce that on ,
[TABLE]
and on ,
[TABLE]
Equations (5.6)-(5.9) yield the following bounds: on ,
[TABLE]
and on ,
[TABLE]
(P2): Suppose that on . Throughout the proof of (P2), we assume that we are on the set ; i.e., all statements are conditional on . Then, and . To establish that , it suffices to show that . To that end, we derive a lower bound for and an upper bound for (conditionally on ). Let
[TABLE]
and note that . Starting from the fact that on by (5.14), we use (5.11) to see that on ,
[TABLE]
and then (5.14) and to deduce that on ,
[TABLE]
where
[TABLE]
By (5.8), the assumption of (P2), (5.6), and iterated conditioning, we have
[TABLE]
Substituting the lower bound (5.16) for into the above equation,
[TABLE]
Similarly, using (5.9), the assumption of (P2), (5.7), iterated conditioning and (5.15), we obtain that
[TABLE]
These two bounds yield that
[TABLE]
As a consequence, a sufficient condition for is that
[TABLE]
Since is linear in , this is equivalent to and , which is precisely (5.4).
(P4): Suppose that on , so that and . We assume throughout the proof of (P4) that we are on the set , and we shall establish that by showing the inequality .
Proceeding similarly as in the proof of (P2), we start from the fact that and and apply (5.10), (5.13), (5.12) and (5.15) repeatedly to derive the following bounds on :
[TABLE]
Then, we use (5.6), (5.7), the assumption of (P4) and (5.5) to deduce that
[TABLE]
The proof is complete. ∎
Remark 5.5.
The results in Example 5.3 extend to the undiscounted case if we focus on equilibria that stop in the absorbing State 4 (or focus on equilibria with early stopping preference). The situation is the same as for State 3: without discounting, any agent at State 4 is indifferent between stopping and continuing, which leads to an infinity of equilibria.
6 Snell Pairs and Equilibria
In this section we provide a theory which extends both the Snell envelope of classical optimal stopping and the recursion from the finite-horizon case in Theorem 3.1. As mentioned in Section 3, the value process (which is the Snell envelope of in the classical case) needs to be complemented with the survival process to provide a sufficient statistic for an agent’s optimality criterion. We introduce the Snell pair pragmatically in Definition 6.1 by stating the properties that will be used most often in the proofs. Alternately, both processes can be described through a more elegant Snell envelope property (Lemma 6.3), whence the terminology. The main result of this section will be a correspondence between Snell pairs and equilibria; see Theorem 6.5 and its corollary.
We focus on equilibria with early stopping preference throughout this section. Other preferences could be accommodated but lead to (even) heavier notation. For the infinite-horizon case , we assume throughout that
[TABLE]
We also recall that if , so that will be used when the horizon is included in the index set.
Definition 6.1.
A pair consisting of adapted processes and is said to be a Snell pair (with early stopping preference) if the following hold:
- (i) on and on for all , and for all . (Footnote 5: The property that for is in fact redundant with (iii).)
- (ii) Given , is the smallest adapted process which dominates and renders a supermartingale. (Footnote 6: We follow the usual convention that supermartingale properties, Snell envelopes, etc., are understood on unless explicitly mentioned; that is, is not included.)
- (iii) Given , is the smallest nonnegative supermartingale on satisfying on for all as well as if .
- (iv) For all , the process is a supermartingale, where
[TABLE]
Some comments on the definition are in order before we connect Snell pairs with equilibrium stopping policies.
Lemma 6.2.
Properties (i)–(iii) imply the following “martingale properties away from the obstacle,”
- (v) if and , then and .
Proof.
If the first identity fails for some , replacing by yields a smaller supermartingale with the required properties, contradicting (iii). If the second identity fails, replacing by yields a smaller process with the required properties, contradicting (ii). ∎
Lemma 6.3.
Properties (i)–(iii) are jointly equivalent to the following:
- (i’) on for all and for all .
- (ii’) is the Snell envelope of .
- (iii’) is the Snell envelope of on .
Proof.
Clearly (i) implies (i’). To see the reverse, suppose that on . Then , is a nonnegative supermartingale. Thus, (iii’) yields that and (i) follows. Given (i’), the equivalence of (ii) and (ii’) is immediate. For the equivalence of (iii) and (iii’), note that is a supermartingale whenever is a supermartingale. ∎
Lemma 6.4.
(a) The processes and occurring in (ii’) and (iv) are uniformly integrable.
(b) Let and let be a Snell pair. Then
[TABLE]
[TABLE]
Proof.
(a) Recall that and that the Snell envelope of any process with an -majorant is uniformly integrable. In view of (ii’), it follows that is uniformly integrable, and then so is .
(b) We have from (i) and (iii) that is a bounded supermartingale with . In particular, . Passing to the limit, martingale convergence yields that . Conversely, (i) clearly implies that and that on . Hence, (6.2) is proved.
Part (a), (ii’) and the classical limit property of the Snell envelope yield that . Moreover, using (6.2) and (6.1),
[TABLE]
and thus (6.3) follows. ∎
We can now state the main result which relates Snell pairs to equilibria, thus extending the classical Snell envelope theory to conditional optimal stopping.
Theorem 6.5.
(a) Let be a Snell pair. Then, defines an equilibrium stopping policy with early stopping preference. Moreover,
[TABLE]
and if .
(b) If is an equilibrium stopping policy with early stopping preference, then there exists a unique Snell pair such that . This Snell pair is given by (6.4)–(6.5).
As mentioned above, Snell pairs reduce to the usual Snell envelope in the classical case.
Corollary 6.6.
Suppose that for all .
(i) Any equilibrium corresponds to optimal stopping in the classical sense: for .
(ii) Any Snell pair consists of and the classical Snell envelope .
Proof.
Note that . Let be an equilibrium and the associated Snell pair. Then
[TABLE]
We have by (6.4)–(6.5). Since is a supermartingale dominated by , we must have for all . It follows that is the Snell envelope of ; that is, . ∎
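In the classical case covered by this corollary, the relevant object is the ordinary Snell envelope, which on a finite horizon is computed by the familiar backward recursion. The following is a minimal sketch for a finite-state Markov chain with a discounted payoff; the finite horizon is a simplification, and the chain `P`, the payoff `g`, the discount factor and the function name are hypothetical and not taken from the paper.

```python
def snell_envelope(P, g, beta, horizon):
    """Backward recursion v_T = g, v_t = max(g, beta * P v_{t+1}).

    Returns `v` with v[t][x] the value at time t in state x (in time-t
    units) and `stop` with stop[t][x] True where stopping is optimal.
    """
    n = len(P)
    v = [None] * (horizon + 1)
    stop = [None] * (horizon + 1)
    v[horizon] = list(g)                 # at the horizon, stopping is forced
    stop[horizon] = [True] * n
    for t in range(horizon - 1, -1, -1):
        # Discounted expected value of continuing for one more step.
        cont = [beta * sum(P[x][y] * v[t + 1][y] for y in range(n))
                for x in range(n)]
        stop[t] = [g[x] >= cont[x] for x in range(n)]
        v[t] = [max(g[x], cont[x]) for x in range(n)]
    return v, stop


# Hypothetical two-state chain with i.i.d. transitions.
P = [[0.5, 0.5], [0.5, 0.5]]
g = [0.0, 1.0]
v, stop = snell_envelope(P, g, beta=0.5, horizon=2)
print(v[0], stop[0])
```

By construction, the resulting value dominates the payoff and also dominates its own discounted one-step expectation, which is the supermartingale property in this Markovian form.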
In the finite-horizon case, Snell pairs correspond to the processes constructed in Section 3.
Corollary 6.7.
Let . Then there exists a unique Snell pair and it is determined by the backward recursion of Theorem 3.1.
Proof.
Let and be as in Theorem 3.1. By Theorem 6.5, there exists a unique Snell pair with , and it is completely determined by (6.4)–(6.5). In view of Lemma 3.2 and the definition in Theorem 3.1, also satisfies (6.4)–(6.5), thus . ∎
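The recursion of Theorem 3.1 is not reproduced in this section. As a rough illustration of how a value process and a survival process can be computed jointly by backward induction, the sketch below treats a Markov chain with an absorbing exit state. It assumes that an agent compares the immediate payoff with the ratio of the expected discounted payoff on survival to the survival probability, that agents at the horizon must stop, and that ties are resolved by stopping (early stopping preference). All names and numbers are hypothetical, and the precise form of the paper's recursion may differ.

```python
def conditional_backward_recursion(P, f, beta, exit_states, horizon):
    """Backward induction for sophisticated agents (illustrative form).

    N[t][x]: expected discounted payoff collected on survival (time-t units).
    S[t][x]: probability of stopping before exit.
    Both are computed under the decisions of the agents at times t..horizon.
    """
    n = len(P)
    inside = [x not in exit_states for x in range(n)]
    N = [[0.0] * n for _ in range(horizon + 1)]
    S = [[0.0] * n for _ in range(horizon + 1)]
    stop = [[True] * n for _ in range(horizon + 1)]
    for x in range(n):
        if inside[x]:                     # agents at the horizon must stop
            N[horizon][x], S[horizon][x] = f[x], 1.0
    for t in range(horizon - 1, -1, -1):
        for x in range(n):
            if not inside[x]:
                continue                  # exited: N = S = 0, policy stops
            num = beta * sum(P[x][y] * N[t + 1][y] for y in range(n))
            den = sum(P[x][y] * S[t + 1][y] for y in range(n))
            if den == 0 or f[x] >= num / den:
                N[t][x], S[t][x] = f[x], 1.0      # stop (ties included)
            else:
                stop[t][x] = False                # continue
                N[t][x], S[t][x] = num, den
    return stop, N, S


# Hypothetical chain: state 0 is the absorbing exit state.
P = [[1.0, 0.0, 0.0],
     [0.25, 0.5, 0.25],
     [0.25, 0.25, 0.5]]
stop_a, N_a, S_a = conditional_backward_recursion(P, [0.0, 1.0, 1.3], 0.9, {0}, 5)
stop_b, N_b, S_b = conditional_backward_recursion(P, [0.0, 1.0, 2.0], 0.9, {0}, 3)
print(stop_a[0], stop_b[0])
```

For the first hypothetical payoff the recursion stops everywhere at every time, while for the second the agents at State 1 continue before the horizon. The pair `(N, S)` rather than the ratio alone is carried backward, because the ratio at time t cannot be recovered from the ratios at time t+1.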
We note that in the infinite-horizon case, the examples in Section 5 show that Snell pairs are not unique in general.
Proof of Theorem 6.5..
We focus on the case ; the finite-horizon case is similar but simpler.
(a) Let be a Snell pair and ; we show that and . If , then (i) implies and hence and ; see (iii). Let . Note that we are in and . If , we have , whereas if , we have . In summary,
[TABLE]
As a consequence, recalling Lemma 6.4 for the case ,
[TABLE]
Next, we consider separately two cases.
Case : Using (v), the Optional Sampling Theorem (with the boundedness of and the uniform integrability from Lemma 6.4) as well as (6.6) and (6.7), we see that
[TABLE]
and
[TABLE]
In view of (i), Equation (6.8) yields in particular that since we are in , as required for the admissibility of . Moreover, as , (6.8) and (6.9) together imply that
[TABLE]
Case : In this case, is a martingale from time to time and hence, similarly to the previous case,
[TABLE]
Taking conditional expectations on both sides, we deduce that
[TABLE]
In view of and (i), we have which then implies and finishes the proof of admissibility. Moreover, by the supermartingale property of , the Optional Sampling Theorem with the uniform integrability from Lemma 6.4, and (6.7), we have
[TABLE]
As and , we conclude that .
Putting the two cases together and noting , we conclude that (6.4) and (6.5) hold. We also recall that the condition on was already established in (6.2). Finally, (6.4) shows that
[TABLE]
and the proof of (a) is complete.
(b) Let be an equilibrium stopping policy with early stopping preference; we show that the pair defined by (6.4) and (6.5) is a Snell pair.
First, we check that (6.5) implies . Indeed, we have . Thus, (6.5) implies as desired.
We readily see that (i’) holds, so it suffices to show (ii’), (iii’) and (iv). Note that
[TABLE]
Let (which implies that we are in ), then
[TABLE]
and
[TABLE]
This shows that , and are supermartingales on . In particular, (iv) holds. In fact, is a supermartingale up to : for any finite , we have and and consequently
[TABLE]
As is bounded and , the supermartingale property up to follows.
Next, let be the Snell envelope of . On the one hand, since is the smallest supermartingale dominating . On the other hand, let and define as the stopping time induced by at , then the stopping representation of the Snell envelope yields
[TABLE]
Thus, we have shown and (iii’) is proved.
Similarly, let be the Snell envelope of . We have since is the smallest supermartingale dominating . Let . On the set , we trivially have by the definition of . Whereas on the set ,
[TABLE]
where we have used the Dominated Convergence Theorem, (6.5), (6.10) and the definitions of , and . We conclude that ; that is, (ii’) holds.
It remains to observe the uniqueness. Indeed, if is another Snell pair such that , then satisfies (6.4) and (6.5) by (a). But (6.4) and (6.5) uniquely define the two processes, so we must have . ∎
References
- [1] N. Barberis. A model of casino gambling. Manag. Sci., 58(1):35–51, 2012.
- [2] S. Basak and G. Chabakauri. Dynamic mean-variance asset allocation. Rev. Financ. Stud., 23(8):2970–3016, 2010.
- [3] T. Björk, M. Khapko, and A. Murgoci. On time-inconsistent stochastic control in continuous time. Finance Stoch., 21(2):331–360, 2017.
- [4] T. Björk and A. Murgoci. A theory of Markovian time-inconsistent stochastic control in discrete time. Finance Stoch., 18(3):545–592, 2014.
- [5] T. Björk, A. Murgoci, and X. Y. Zhou. Mean-variance portfolio optimization with state-dependent risk aversion. Math. Finance, 24(1):1–24, 2014.
- [6] S. Christensen and K. Lindensjö. On finding equilibrium stopping times for time-inconsistent Markovian problems. SIAM J. Control Optim., 56(6):4228–4255, 2018.
- [7] S. Christensen and K. Lindensjö. On time-inconsistent stopping problems and mixed strategy stopping times. To appear in Stochastic Process. Appl., 2018.
- [8] C. Czichowsky. Time-consistent mean-variance portfolio selection in discrete and continuous time. Finance Stoch., 17(2):227–271, 2013.
