Risk aware minimum principle for optimal control of stochastic differential equations

Jukka Isohätälä; William B. Haskell

arXiv:1812.09179·math.OC·October 22, 2019·IEEE Trans. Autom. Control.

TL;DR

This paper develops a risk aware optimal control framework for stochastic differential equations, extending the classical stochastic Pontryagin's minimum principle by incorporating a risk adjustment process based on a nonlinear risk function.

Contribution

It introduces a risk aware minimum principle that generalizes the risk neutral case with a new risk adjustment process, applicable to control problems with risk preferences.

Findings

1. The risk aware minimum principle involves an additional stochastic process as a risk adjustment.

2. Necessary and sufficient conditions for optimality are established for controls on probability measures.

3. Application to portfolio allocation demonstrates how risk awareness introduces a risk premium term.

Equations

$$x_{t}=\xi+\int_{0}^{t}b(s,x_{s},a_{s})\,\mathrm{d}s+\int_{0}^{t}\sigma(s,x_{s},a_{s})\,\mathrm{d}w_{s},$$

$$\inf_{a=(a_{t})_{t\in\mathbb{T}}}\cdots$$

$$\left\|x\right\|_{\mathcal{S}_{\mathcal{F}}^{p}}\coloneqq\mathbb{E}\biggl[\sup_{t\in\mathbb{T}}|x_{t}|^{p}\biggr]^{1/p}<\infty\quad\forall x\in\mathcal{S}_{\mathcal{F}}^{p}(\Omega;\mathbb{V}).$$

$$\left\|z\right\|_{\mathcal{H}_{\mathcal{F}}^{p}}\coloneqq\mathbb{E}\biggl[\biggl(\int_{0}^{T}|z_{t}|^{2}\,\mathrm{d}t\biggr)^{p/2}\biggr]^{1/p}<\infty\quad\forall z\in\mathcal{H}_{\mathcal{F}}^{p}(\Omega;\mathbb{V}).$$

$$x_{t}=x_{0}+\int_{0}^{t}\int_{\mathbb{A}}b(s,x_{s},a)\,\pi_{s}(\mathrm{d}a)\,\mathrm{d}s+\int_{0}^{t}\int_{\mathbb{A}}\sigma(s,x_{s},a)\,\pi_{s}(\mathrm{d}a)\,\mathrm{d}w_{s}\quad\forall t\in\mathbb{T}.$$

$$\begin{gathered}\mathbb{E}\biggl[\sup_{t\in\mathbb{T}}\int_{\mathbb{A}}|a|^{r}\,\pi_{t}(\mathrm{d}a)\biggr]<\infty,\quad r<\infty,\\ \text{or}\ \mathop{\mathrm{supp}}\pi_{t}\ \text{is compact}\quad\forall t\in\mathbb{T},\quad r=\infty.\end{gathered}$$

$$v_{t}^{f}\coloneqq f(x_{t}^{\pi})-f(x_{0}^{\pi})-\int_{0}^{t}\biggl\{\cdots+\frac{1}{2}\mathop{\mathrm{Tr}}\left[\nabla^{\top}\nabla f(x_{s})\left(\int_{\mathbb{A}}\sigma(s,x_{s}^{\pi},a)\,\pi_{s}(\mathrm{d}a)\right)\left(\int_{\mathbb{A}}\sigma(s,x_{s}^{\pi},a)\,\pi_{s}(\mathrm{d}a)\right)^{\top}\right]\biggr\}\,\mathrm{d}s\qquad\forall t\in\mathbb{T}$$

$$m_{t}^{f}\coloneqq f(\xi_{t}^{\eta})-f(\xi_{0}^{\eta})-\int_{0}^{t}\int_{\mathbb{A}}\biggl\{\cdots+\frac{1}{2}\mathop{\mathrm{Tr}}\left[\nabla^{\top}\nabla f(\xi_{s}^{\eta})\sigma(s,\xi_{s}^{\eta},a)\sigma(s,\xi_{s}^{\eta},a)^{\top}\right]\biggr\}\,\eta_{s}(\mathrm{d}a)\,\mathrm{d}s$$

$$\sup_{t\in\mathbb{T}}\mathbb{E}\biggl[\bigl|x_{t}^{\epsilon}-x_{t}\bigr|^{2}\biggr]$$

$$C^{\pi}\coloneqq\int_{0}^{T}\int_{\mathbb{A}}c(t,x_{t}^{\pi},a)\,\pi_{t}(\mathrm{d}a)\,\mathrm{d}t+g(x_{T}^{\pi}),$$

$$|b(t,x,a)|\leq L(1+|x|^{\bar{p}_{1}}+|a|^{\bar{p}_{2}}),$$

$$|\sigma(t,x,a)|\leq L(1+|x|^{\bar{p}_{1}}+|a|^{\bar{p}_{2}});$$

$$|c(t,x,a)|\leq L(1+|x|^{p_{1}}+|a|^{p_{2}}),$$

$$|g(x)|\leq L(1+|x|^{p_{1}});$$

$$|\nabla_{\mathbb{X}}c(t,x,a)|\leq L(1+|x|^{p_{1}^{\prime}}+|a|^{p_{2}^{\prime}}),$$

$$|\nabla_{\mathbb{X}}g(x)|\leq L(1+|x|^{p_{1}^{\prime}});$$

$$p<\bar{p}\leq\bar{p}_{3},$$

$$\bar{p}_{2}\leq\frac{\bar{p}_{3}}{\bar{p}},$$

$$p_{1}^{\prime}\leq p_{1},\quad p_{2}^{\prime}\leq p_{2},$$

$$p_{1},p_{2}<\frac{\bar{p}}{p}-1.$$

$$x^{\pi}\in\mathcal{S}_{\mathcal{F}}^{\bar{p}}(\Omega;\mathbb{X}),\quad C^{\pi}\in\mathcal{L}^{p}(\Omega;\mathbb{R}),$$

$$P_{0}:\quad\inf_{\pi\in\mathfrak{V}^{1}(b,\sigma,\nu)}\mathbb{E}_{\pi}[C^{\pi}],$$

$$P_{1}:\quad\inf_{\pi\in\mathfrak{V}^{p}(b,\sigma,\nu)}\rho(C^{\pi}).$$

$$\mathrm{D}_{Y}F(X)\coloneqq\frac{\mathrm{d}}{\mathrm{d}\epsilon}F(X+\epsilon Y)\biggr|_{\epsilon=0}=\lim_{\epsilon\to 0}\frac{F(X+\epsilon Y)-F(X)}{\epsilon}$$

$$F(X+Y)-F(X)-\langle\mathrm{D}F(X),Y\rangle\in o(\|Y\|_{\mathbb{V}}),$$

$$\phi(\mu)=\phi(\mu_{0})+\mathbb{E}[\mathrm{D}\phi(\mu_{0})(U_{0})(U-U_{0})]+o(\|U-U_{0}\|_{p}),$$

$$\phi(\mu^{\prime})-\phi(\mu)-\mathbb{E}[\mathrm{D}\phi(\mu)(U)(U^{\prime}-U)]\geq 0\quad\forall\mu,\mu^{\prime}\in\mathcal{P}^{p}(\mathbb{R}^{n})$$

$$H(t,x,y,y^{\prime},z,a)$$

$$\forall(t,x,y,y^{\prime},z,a)\in\mathbb{T}\times\mathbb{X}\times\mathbb{Y}\times\mathbb{Y}^{\prime}\times\mathbb{Z}\times\mathbb{A}.$$

$$(x,\pi)\to\int_{\mathbb{A}}H(t,x,y,y^{\prime},z,a)\,\pi(\mathrm{d}a)$$

$$\mathrm{d}y_{t}^{\pi}=-\nabla_{\mathbb{X}}H(t,x_{t}^{\pi},y_{t}^{\pi},y_{t}^{\prime\pi},z_{t}^{\pi},\pi_{t})\,\mathrm{d}t+z_{t}^{\pi}\cdot\mathrm{d}w_{t},$$

$$y_{T}^{\pi}=y_{T}^{\prime\pi}\nabla_{\mathbb{X}}g(x_{T}^{\pi}),$$
Full text

Risk aware minimum principle for optimal control of stochastic differential equations

Jukka Isohätälä and William B. Haskell

Abstract

We present a probabilistic formulation of risk aware optimal control problems for stochastic differential equations. Risk awareness is in our framework captured by objective functions in which the risk neutral expectation is replaced by a risk function, a nonlinear functional of random variables that accounts for the controller’s risk preferences. We state and prove a risk aware minimum principle that is a parsimonious generalization of the well-known risk neutral, stochastic Pontryagin’s minimum principle. As our main results we give necessary and also sufficient conditions for optimality of control processes taking values on probability measures defined on a given action space. We show that, remarkably, going from the risk neutral to the risk aware case, the minimum principle is simply modified by the introduction of one additional real-valued stochastic process that acts as a risk adjustment factor for given cost rate and terminal cost functions. This adjustment process is explicitly given as the expectation, conditional on the filtration at the given time, of an appropriately defined functional derivative of the risk function evaluated at the random total cost. For our results we rely on the Fréchet differentiability of the risk function, and for completeness, we prove under mild assumptions the existence of Fréchet derivatives of some common risk functions. We give a simple application of the results for a portfolio allocation problem and show that the risk awareness of the objective function gives rise to a risk premium term that is characterized by the risk adjustment process described above. This suggests uses of our results in e.g. pricing of risk modeled by generic risk functions in financial applications.

1 Introduction

We consider the problem of optimal control of stochastic differential equations of the form

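$$x_{t}=\xi+\int_{0}^{t}b(s,x_{s},a_{s})\,\mathrm{d}s+\int_{0}^{t}\sigma(s,x_{s},a_{s})\,\mathrm{d}w_{s},$$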

over a finite time horizon, $t\in[0,T]\eqqcolon\mathbb{T}$, $0<T<\infty$, and where $\xi$ is a random initial value, $x=(x_{t})_{t\in\mathbb{T}}$ and $a=(a_{t})_{t\in\mathbb{T}}$ are the state and control processes, respectively, taking values on spaces $\mathbb{X}\coloneqq\mathbb{R}^{d_{x}}$ and $\mathbb{A}\subset\mathbb{R}^{d_{a}}$, $d_{x},d_{a}\in\mathbb{N}\coloneqq\{1,2,\ldots\}$. The process $w=(w_{t})_{t\in\mathbb{T}}$ is a standard $d_{w}$-dimensional Brownian motion, $d_{w}\in\mathbb{N}$, and $b$ and $\sigma$ are deterministic functions $b:\mathbb{T}\times\mathbb{X}\times\mathbb{A}\to\mathbb{R}^{d_{x}}$, $\sigma:\mathbb{T}\times\mathbb{X}\times\mathbb{A}\to\mathbb{R}^{d_{x}\times d_{w}}$.

Our focus here is on the problem of risk aware control of the diffusion process. The conventional optimal control theory of stochastic processes considers risk neutral problems, understood here as the minimization of expected costs accrued over the solution time interval,

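$$\inf_{a=(a_{t})_{t\in\mathbb{T}}}\mathbb{E}\biggl[\int_{0}^{T}c(t,x_{t},a_{t})\,\mathrm{d}t+g(x_{T})\biggr],$$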

where $c:\mathbb{T}\times\mathbb{X}\times\mathbb{A}\to\mathbb{R}$ is a cost rate function, and $g:\mathbb{X}\to\mathbb{R}$ is a terminal cost function. In the risk aware control problems we consider here, the expectation in the objective is supplanted by a risk function $\rho$ that describes the controller’s preferences that are not sufficiently modeled by the expected value. Formally, the risk aware problem is stated as

$$\inf_{a=(a_{t})_{t\in\mathbb{T}}}\rho\biggl(\int_{0}^{T}c(t,x_{t},a_{t})\,\mathrm{d}t+g(x_{T})\biggr),$$

where we suppose that the risk function $\rho$ is some generic mapping from random variables, representing total costs, to real values quantifying the magnitude of the risk associated with a given random variable. Convex or coherent risk measures form an important subset of the functions $\rho$ that our results attempt to cover [5, 31, 32].
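
To make the contrast between the risk neutral expectation and a risk function $\rho$ concrete, the following sketch evaluates three functionals on a finite sample of total costs. The choice of the entropic risk and the conditional value-at-risk (CVaR) is an illustrative assumption, as are the function names, the parameters `theta` and `alpha`, and the sample values; none of them are prescribed by the text above.

```python
import math

def expectation(costs):
    """Risk neutral objective: the sample mean of the total costs."""
    return sum(costs) / len(costs)

def entropic_risk(costs, theta=1.0):
    """Entropic risk (1/theta) * log E[exp(theta * C)], a convex risk function."""
    shift = max(costs)  # shift by the maximum for numerical stability
    mean_exp = sum(math.exp(theta * (c - shift)) for c in costs) / len(costs)
    return shift + math.log(mean_exp) / theta

def cvar(costs, alpha=0.9):
    """CVaR via min_s { s + E[(C - s)^+] / (1 - alpha) }.

    For an empirical distribution the objective is piecewise linear and
    convex in s, so the minimum is attained at one of the sample points.
    """
    n = len(costs)
    tail = 1.0 - alpha
    return min(s + sum(max(c - s, 0.0) for c in costs) / (tail * n) for s in costs)

costs = [float(k) for k in range(10)]   # hypothetical samples of the total cost
print(expectation(costs))               # 4.5
print(cvar(costs, alpha=0.8))           # mean of the two largest costs, about 8.5
print(entropic_risk(costs, theta=1.0))  # lies between the mean and the maximum
```

Both risk functions depend on the sample only through its empirical distribution, and both weight high-cost outcomes more heavily than the mean does.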

In the discrete time case, dynamic programming formulations of the risk aware problem have proved elusive. This is intuitively unsurprising, as the construction of the Bellman equation hinges on the linearity of the expectation. Naturally this issue persists also in the continuous time context. The continuous time setting, however, affords an alternative to dynamic programming in the form of probabilistic formulations of the control problem.¹ Whereas in the dynamic programming world the control problem is stated in terms of partial differential equations [51, 52, 30], probabilistic formulations characterize the optimal controls in terms of solutions to stochastic differential equations [73]. In this work, we specifically focus on the stochastic Pontryagin’s minimum principle and its generalization to the risk aware case.²

¹ To be clear, there appears to be a non-zero, though very small, number of works relating to probabilistic methods for discrete time stochastic optimal control; the only example that the present authors are aware of is [15].

² Throughout we use the term minimum principle, since we phrase our control problem as the minimization of costs. The term maximum principle, commonly used in the literature, should be seen as an essentially synonymous term that is more appropriate when the problem is stated as a maximization of rewards.

The risk neutral stochastic minimum principle, simply stated, asserts that an optimal control minimizes, almost surely and at almost every point in time, an appropriately defined Hamiltonian function that in turn depends on adjoint processes satisfying a backward stochastic differential equation. These necessary conditions for optimality derive from variational equations describing the response of the cost functional at the optimal control to an infinitesimal change in control. This local nature of the minimum principle also provides a heuristic, a priori justification for preferring it over dynamic programming in risk aware problems: Bellman’s principle of optimality underlying the dynamic programming method is a statement about the structure of the objective of the control problem that relies on the linearity and the tower property of the expectation. The minimum principle on the other hand relates the optimal controls to the local behavior of state space trajectories and the cost functions. As such, the minimum principle does not impose requirements, here linearity, on the structure of the risk function in the same way as dynamic programming does. Instead, central to deriving a risk aware minimum principle is being able to evaluate the response of a risk function to changes in its input random variables.

Literature review

The stochastic minimum principle has a long history. Its early derivations can be found in the works [47, 13, 14, 11], with the modern version often being attributed to [59]. These results have spawned numerous refinements. Here we mention extensions to probability measure valued controls given in [56, 9, 8], as this type of control framework is used in this paper. Generalizations of the minimum principle to optimal control of continuous time partially observed processes, a topic closely related to risk aware optimization, have also been constructed [70, 6, 3]. The minimum principle has proven to be a viable alternative to dynamic programming e.g. when the controls might not be Markov, in the sense that they cannot be expressed as functions of the state variables at any given point in time. This is the case for McKean-Vlasov problems, for which probabilistic methods appear particularly well adapted [18, 19, 20]. The minimum principle is extensively covered in [73], with numerous additional references. Dynamic programming and the minimum principle are considered in parallel in [75], and a comprehensive review of the two methods can be found in [73, Chapter 5]. For applications of the minimum principle, and backward stochastic differential equations, we refer the reader to [27, 64].

Much of the recent work on the topic of control under uncertainty, broadly understood as random variability not accounted for by an expectation under full observations, has been done using dynamic risk measures [1] or nonlinear expectations such as Peng’s $g$-expectation [60, 61] and its generalization, the $G$-framework [62, 63]. Compared to static risk functions, these approaches impose additional structure, most notably time-consistency that allows for the use of e.g. the dynamic programming principle. While it is well-known that $g$-expectations give rise to convex risk functions, the converse is generally true only for risk functions that are time-consistent [66]. In our approach, we consider objectives that are given in terms of static, law invariant risk measures, and in particular we do not impose time-consistency on the risk function. Moreover, since the risk function is not expressed as a $g$-expectation, we do not need to consider forward-backward stochastic differential equations as the starting point, as was done in e.g. [57] where a minimum principle was derived for stochastic differential equations driven by Lévy processes. Optimality conditions using a similar variational approach were given for forward-backward differential equations in [72]. Dynamic risk measures were used in [10], where specifically the problem of optimal derivatives design was considered. A dynamic programming formulation for the $G$-framework has been developed in [40, 39].

Finally, we note that in addition to the probabilistic and dynamic programming approaches, convex analytic and linear programming techniques form a third, loose set of methods for both risk neutral and risk aware control. For the risk neutral case, we refer the reader to [69, 68] for early development and [12, 45, 46] for refinements. A risk aware version has been developed in [42], where a state space augmentation scheme, inherited from earlier discrete time results [37], was used to construct a formulation of the risk aware problem. While this approach leads to a tractable computational method for solving the control problem, it does not provide a useful characterization of the optimal control in the way that the minimum principle does.

Contributions and organization of the paper

The contributions of this paper can be summed up as follows: (i) We generalize the control problem informally stated in Eqs. (1.1, 1.2) to feature measure valued control processes. Although the control model and the notion of a solution we utilize have been considered by some authors under the name of relaxed controls, we opt for the new term vague controls. We justify the nomenclature by demonstrating key differences between relaxed and vague controls, and further show why the latter notion of a solution can be particularly useful. (ii) We introduce law invariant risk functions into a framework that allows a natural notion of functional differentiability that can subsequently be applied in deriving variational conditions for optimality of controls. (iii) Using these results, we formulate and prove a risk aware generalization of the stochastic Pontryagin’s minimum principle, and in doing so, we give a characterization of the optimal control of a risk aware problem. We find that in comparison to the risk neutral problem, the minimum principle is modified by a risk adjustment process that is related to the functional derivative of the risk function, evaluated at the terminal cost. Finally, (iv) we demonstrate by means of solving a simple example that in financial applications, risk awareness creates non-trivial but intuitive risk pricing effects.

In the next section, we will describe the notations used in the paper, and state the control problem we consider. Section 3 describes the risk functions that model the risk aware objectives. We outline some necessary differentiability properties of the functions that will subsequently be needed for the probabilistic formulation of the problem that is given in the following Section 4. This section derives necessary and sufficient conditions for the optimality of a control process. We present an application of the theory in Section 5 where we characterize the optimal controls of a simple portfolio allocation problem. Section 6 concludes with discussion and some remarks. Technical proofs are deferred to Appendix A.

2 Model

Throughout the paper we will use the following notations and definitions: For any probability space $(\Omega,\Sigma,\mathbb{P})$ , a Banach space $(\mathbb{V},|\cdot|)$ and $p\geq 0$ , we denote $\mathcal{L}^{p}(\Omega,\Sigma,\mathbb{P};\mathbb{V})$ , or $\mathcal{L}^{p}(\Omega;\mathbb{V})$ for short, as the set of random variables $q:\Omega\to\mathbb{V}$ such that $\mathbb{E}_{\mathbb{P}}[|q|{}^{p}]<\infty$ , where $\mathbb{E}_{\mathbb{P}}$ stands for the expectation with respect to the measure $\mathbb{P}$ . If $\mathbb{P}$ is clear from the context, we simply use the symbol $\mathbb{E}$ . In addition, $\mathcal{L}^{\infty}(\Omega;\mathbb{V})$ denotes the space of $\mathbb{P}$ -essentially bounded random variables. We shall use $\left\|\cdot\right\|_{p}$ to denote the norm on $\mathcal{L}^{p}(\Omega;\mathbb{V})$ , $p\in[1,\infty]$ . For a real Banach space $\mathbb{V}$ , we use $\mathbb{V}^{\ast}$ to denote its continuous dual, and $\langle\cdot,\cdot\rangle:\mathbb{V}^{\ast}\times\mathbb{V}\to\mathbb{R}$ for the duality pairing.

Borel probability measures on a topological space $\mathbb{V}$ are denoted by $\mathcal{P}(\mathbb{V})$ , and the Borel $\sigma$ -algebra on $\mathbb{V}$ is denoted by $\mathscr{B}(\mathbb{V})$ . The Dirac measure centered at $x\in\mathbb{V}$ is denoted by $\delta_{x}$ . By $\mathcal{P}^{p}(\mathbb{V})$ , $p\in[1,\infty)$ , we mean probability measures $\mu\in\mathcal{P}(\mathbb{V})$ such that $\int d(v,v_{0}){}^{p}\mu(\mathrm{d}v)<\infty$ for all $v_{0}\in\mathbb{V}$ ; $\mathcal{P}^{\infty}(\mathbb{V})$ denotes probability measures with bounded support. The law or distribution of a random variable $V\in\mathcal{L}^{p}(\Omega;\mathbb{V})$ , $p\in[1,\infty]$ , is denoted by $\mathscr{L}_{\mathbb{P}}(V)$ , that is, $\mathscr{L}_{\mathbb{P}}(V)(\Gamma)\coloneqq\mathbb{P}\circ V^{-1}(\Gamma)$ for all $\Gamma\in\mathscr{B}(\mathbb{R})$ ; if the probability measure is clear from the context, we use the symbol $\mathscr{L}$ instead. The extended reals will be denoted $\mathbb{R}_{\infty}\coloneqq\mathbb{R}\cup\{\infty\}$ and elements of $\mathbb{R}^{n}$ , $n\in\mathbb{N}$ , are by default interpreted as column vectors, i.e. $\mathbb{R}^{n}\coloneqq\mathbb{R}^{n\times 1}$ .

For a given filtered probability space $(\Omega,\Sigma,\mathcal{F},\mathbb{P})$ , a normed space $(\mathbb{V},|\cdot|)$ and a $p\in[1,\infty)$ , we shall use $\mathcal{S}_{\mathcal{F}}^{p}(\Omega,\Sigma,\mathbb{P};\mathbb{V})$ or $\mathcal{S}_{\mathcal{F}}^{p}(\Omega;\mathbb{V})$ for short to denote $\mathbb{V}$ -valued $\mathcal{F}$ -predictable continuous processes on $\mathbb{T}$ such that

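$$\left\|x\right\|_{\mathcal{S}_{\mathcal{F}}^{p}}\coloneqq\mathbb{E}\biggl[\sup_{t\in\mathbb{T}}|x_{t}|^{p}\biggr]^{1/p}<\infty\quad\forall x\in\mathcal{S}_{\mathcal{F}}^{p}(\Omega;\mathbb{V}).$$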

In addition, $\mathcal{H}_{\mathcal{F}}^{p}(\Omega,\Sigma,\mathbb{P};\mathbb{V})=\mathcal{H}_{\mathcal{F}}^{p}(\Omega;\mathbb{V})$ denotes the space of $\mathcal{F}$ -predictable processes on $(0,T)$ such that

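$$\left\|z\right\|_{\mathcal{H}_{\mathcal{F}}^{p}}\coloneqq\mathbb{E}\biggl[\biggl(\int_{0}^{T}|z_{t}|^{2}\,\mathrm{d}t\biggr)^{p/2}\biggr]^{1/p}<\infty\quad\forall z\in\mathcal{H}_{\mathcal{F}}^{p}(\Omega;\mathbb{V}).$$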

Two processes $z,z^{\prime}\in\mathcal{H}_{\mathcal{F}}^{p}(\Omega;\mathbb{V})$ are considered equivalent if $\left\|z-z^{\prime}\right\|_{\mathcal{H}_{\mathcal{F}}^{p}}=0$ . Finally, we set $\mathcal{S}_{\mathcal{F}}^{\infty}(\Omega;\mathbb{V})\coloneqq\cap_{p\in[1,\infty)}\mathcal{S}_{\mathcal{F}}^{p}(\Omega;\mathbb{V})$ and $\mathcal{H}_{\mathcal{F}}^{\infty}(\Omega;\mathbb{V})\coloneqq\cap_{p\in[1,\infty)}\mathcal{H}_{\mathcal{F}}^{p}(\Omega;\mathbb{V})$ .
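
As a numerical illustration of the two process norms, $\mathbb{E}[\sup_{t}|x_{t}|^{p}]^{1/p}$ on $\mathcal{S}_{\mathcal{F}}^{p}$ and $\mathbb{E}[(\int_{0}^{T}|z_{t}|^{2}\,\mathrm{d}t)^{p/2}]^{1/p}$ on $\mathcal{H}_{\mathcal{F}}^{p}$, the sketch below estimates them from scalar sample paths stored on a uniform time grid. The function names, the grid representation and the sample paths are assumptions made for this example; the supremum over a grid only approximates the supremum over $\mathbb{T}$ from below, and the time integral is replaced by a left Riemann sum.

```python
def s_norm(paths, p):
    """Estimate E[sup_t |x_t|^p]^(1/p) from sampled paths on a time grid."""
    mean = sum(max(abs(x) for x in path) ** p for path in paths) / len(paths)
    return mean ** (1.0 / p)

def h_norm(paths, p, dt):
    """Estimate E[(int_0^T |z_t|^2 dt)^(p/2)]^(1/p) with a left Riemann sum."""
    mean = sum((sum(z * z for z in path) * dt) ** (p / 2.0) for path in paths) / len(paths)
    return mean ** (1.0 / p)

# Two hypothetical sample paths on a grid with step dt = 0.25 (so T = 1).
paths = [[0.0, 1.0, -3.0, 2.0], [0.0, -1.0, 1.0, 0.5]]
print(s_norm(paths, p=2))           # sqrt((9 + 1) / 2)
print(h_norm(paths, p=2, dt=0.25))  # sqrt((3.5 + 0.5625) / 2)
```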

Continuous functions from a topological space $\mathbb{V}$ to a normed space $\mathbb{U}$ are denoted $\mathcal{C}(\mathbb{V},\mathbb{U})$, and we equip this space with the usual supremum norm. If $\mathbb{U}=\mathbb{R}$, we abbreviate this by $\mathcal{C}(\mathbb{V})$. The subspaces of bounded and compactly supported functions are denoted $\mathcal{C}_{b}(\mathbb{V})$ and $\mathcal{C}_{c}(\mathbb{V})$, respectively. Superscripted function spaces $\mathcal{C}^{(k)}(\mathbb{R}^{n})$, $\mathcal{C}_{b}^{(k)}(\mathbb{R}^{n})$, etc., $n\in\mathbb{N}$, denote spaces of $k\in\mathbb{N}$ times continuously differentiable functions with derivatives respectively in $\mathcal{C}$, $\mathcal{C}_{b}$, etc. For every differentiable function $f:\mathbb{R}^{n}\to\mathbb{R}^{k}$, $n,k\in\mathbb{N}$, the Jacobian of $f$ is denoted $\nabla f$, so that $\nabla f:\mathbb{R}^{n}\to\mathbb{R}^{k\times n}$ and $(\nabla f(x))_{ij}\coloneqq\partial f_{i}(x)/\partial x_{j}$ for all $i\in\{1,\ldots,k\}$, $j\in\{1,\ldots,n\}$; in particular, the gradient of a real-valued function is a row vector. For multivariate functions, we use $\nabla_{\mathbb{U}}$ to indicate that the derivative is taken with respect to the argument taking values in the space $\mathbb{U}$. For convenience, for all $A\in\mathbb{R}^{n\times m}$ and $B\in\mathbb{R}^{n\times\ell}$, $n,m,\ell\in\mathbb{N}$, we denote $A\cdot B\coloneqq(A^{\top}B)^{\top}=B^{\top}A\in\mathbb{R}^{\ell\times m}$, where $(\cdot)^{\top}$ stands for the transpose.
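
The product notation $A\cdot B\coloneqq(A^{\top}B)^{\top}=B^{\top}A$ can be checked on a small example. The sketch below uses plain nested lists; the helper names `transpose`, `matmul` and `dot` and the matrices are assumptions made for illustration. With $A$ having $n$ rows and $m$ columns and $B$ having $n$ rows and $\ell$ columns, the factors $B^{\top}$ and $A$ give a result with $\ell$ rows and $m$ columns.

```python
def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    """Ordinary matrix product of P (r x s) and Q (s x c)."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def dot(A, B):
    """A . B := (A^T B)^T = B^T A for A with n rows and B with n rows."""
    return matmul(transpose(B), A)

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]   # n = 2 rows, m = 3 columns
B = [[1.0],
     [-1.0]]            # n = 2 rows, l = 1 column
print(dot(A, B))        # [[-3.0, -3.0, -3.0]]
```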

We generalize Eq. (1.1) to feature measure valued controls. Instead of an adapted stochastic process $(a_{t})_{t\in\mathbb{T}}$ taking values on an action space $\mathbb{A}\subset\mathbb{R}^{d_{a}}$, the controls shall here in general be probability measure valued processes $(\pi_{t})_{t\in\mathbb{T}}$, $\pi_{t}\in\mathcal{P}(\mathbb{A})$. We introduce the notion of vague controls. Throughout, $\mathbb{X}=\mathbb{R}^{d_{x}}$ and $\mathbb{A}\subset\mathbb{R}^{d_{a}}$ shall be our given state and action spaces; however, we will also consider solutions on extended state spaces, and hence the definitions below should be understood to hold for any analogously defined finite dimensional state and action spaces.

Definition 2.1.

(Vague controlled solution) Let $\mathbb{X}\coloneqq\mathbb{R}^{d_{x}}$, $\mathbb{W}\coloneqq\mathbb{R}^{d_{w}}$, and $\mathbb{A}\subset\mathbb{R}^{d_{a}}$, and let $b:\mathbb{T}\times\mathbb{X}\times\mathbb{A}\to\mathbb{X}$, $\sigma:\mathbb{T}\times\mathbb{X}\times\mathbb{A}\to\mathbb{X}\times\mathbb{W}=\mathbb{R}^{d_{x}\times d_{w}}$ be given drift and diffusion functions that are continuous on $\mathbb{T}\times\mathbb{X}$ and measurable on $\mathbb{A}$. A *vague controlled solution to the problem $(b,\sigma,\nu)$* comprises a filtered probability space $(\Omega,\Sigma,\mathcal{F}=(\mathcal{F}_{t})_{t\in\mathbb{T}},\mathbb{P})$ and a process $(x_{t},w_{t},\pi_{t})_{t\in\mathbb{T}}$, $x_{t}\in\mathbb{X}$, $w_{t}\in\mathbb{W}$, $\pi_{t}\in\mathcal{P}(\mathbb{A})$ for all $t\in\mathbb{T}$, such that: (i) the filtration $\mathcal{F}$ is complete and right-continuous, (ii) $(x_{t})_{t\in\mathbb{T}}$ is $\mathcal{F}$-adapted with continuous sample paths, $(w_{t})_{t\in\mathbb{T}}$ is an $\mathcal{F}$-Brownian motion, and $(\pi_{t})_{t\in\mathbb{T}}$ is $\mathcal{F}$-progressively measurable, (iii) the distribution of $x_{0}$ is $\nu$, and (iv) the processes satisfy, $\mathbb{P}$-almost surely,

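$$x_{t}=x_{0}+\int_{0}^{t}\int_{\mathbb{A}}b(s,x_{s},a)\,\pi_{s}(\mathrm{d}a)\,\mathrm{d}s+\int_{0}^{t}\int_{\mathbb{A}}\sigma(s,x_{s},a)\,\pi_{s}(\mathrm{d}a)\,\mathrm{d}w_{s}\quad\forall t\in\mathbb{T}.$$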

Moreover, we call the solution $r$ -admissible, $r\in[1,\infty]$ , if we additionally have that (v)

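$$\begin{gathered}\mathbb{E}\biggl[\sup_{t\in\mathbb{T}}\int_{\mathbb{A}}|a|^{r}\,\pi_{t}(\mathrm{d}a)\biggr]<\infty,\quad r<\infty,\\ \text{or}\ \mathop{\mathrm{supp}}\pi_{t}\ \text{is compact}\quad\forall t\in\mathbb{T},\quad r=\infty.\end{gathered}$$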

We shall use $\mathfrak{V}(b,\sigma,\nu)$ to denote vague controlled solutions of the problem $(b,\sigma,\nu)$ .
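
The dynamics in item (iv) of Definition 2.1 can be explored numerically. The sketch below is a minimal Euler-Maruyama discretization, which is a standard approximation scheme and not part of the definition, for a scalar state and a finitely supported control measure represented as a dictionary from actions to probabilities. The function names, the feedback form of `policy`, and the coefficients are all hypothetical. The key point is that both the drift and the diffusion coefficient are integrated against $\pi_{t}$ before they enter the dynamics.

```python
import math
import random

def average(f, t, x, pi):
    """Integrate a -> f(t, x, a) against a finitely supported measure pi."""
    return sum(weight * f(t, x, a) for a, weight in pi.items())

def simulate_vague(b, sigma, policy, x0, T=1.0, n_steps=200, seed=0):
    """One Euler-Maruyama path of the vague controlled dynamics (scalar state)."""
    rng = random.Random(seed)
    dt = T / n_steps
    x, path = x0, [x0]
    for k in range(n_steps):
        t = k * dt
        pi = policy(t, x)                   # dict: action -> probability
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x = x + average(b, t, x, pi) * dt + average(sigma, t, x, pi) * dw
        path.append(x)
    return path

# Hypothetical coefficients and controls, chosen only for illustration.
b = lambda t, x, a: a - x
sigma = lambda t, x, a: 0.3 * a
strict = lambda t, x: {1.0: 1.0}           # Dirac measure: a strict control
vague = lambda t, x: {0.0: 0.5, 2.0: 0.5}  # genuinely measure valued control

path = simulate_vague(b, sigma, vague, x0=0.0)
print(len(path), path[-1])
```

A strict control in the sense of Definition 2.2 corresponds to a dictionary with a single action of weight one, as in `strict`.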

Definition 2.2.

A strict controlled solution is a vague controlled solution $\pi\in\mathfrak{V}(b,\sigma,\nu)$ such that $\pi_{t}$ is a Dirac measure for all $t\in\mathbb{T}$ .

For brevity, we write $\pi\in\mathfrak{V}(b,\sigma,\nu)$ to refer to a vague controlled solution, but it is important to bear in mind that the solutions are in fact $(\Omega,\Sigma,\mathcal{F},\mathbb{P},(x_{t})_{t\in\mathbb{T}},(w_{t})_{t\in\mathbb{T}},\pi=(\pi_{t})_{t\in\mathbb{T}})$ -tuples. If necessary, we label the state process by the control, i.e. write $(x_{t}^{\pi})_{t\in\mathbb{T}}$ . Clearly, strict controlled solutions can be identified with controlled solutions where the control process takes values on $\mathbb{A}$ rather than $\mathcal{P}(\mathbb{A})$ .

Vague controlled solutions are frequently referred to in the current literature as relaxed controlled solutions; however, these two concepts differ in some key aspects. In fact, to the best of the authors’ knowledge, vague controlled solutions have never been called anything but relaxed controls, and the differences between the definitions are not always explicitly noted. Examples of works where vague controls are used include [53, 8, 4, 3]. We note, following [4], that vague controlled stochastic differential equations can be related to controlled stochastic processes driven by non-orthogonal martingale measures, whereas the more canonical relaxed controlled model can be identified with equations driven by orthogonal martingale measures [24]. See e.g. [71] for more discussion on martingale measures. It also bears pointing out that the topology conventionally assigned to relaxed controls, see e.g. [29], may be too coarse for vague controlled problems to guarantee the continuity of the mapping from controls to the stochastic trajectories, which has implications for e.g. applying the chattering lemma [26, Theorem 2.2] to vague controls. Indeed, as Example 2.3 below demonstrates, it may not always be possible to find strict controls and associated solutions of Eq. (2.1) that approximate a given vague controlled solution.

To elucidate the difference between these notions of solutions, consider $\pi\in\mathfrak{V}(b,\sigma,\nu)$ , where $b,\sigma$ are for simplicity taken to be bounded. Applying Itô’s lemma to $f(x_{t}^{\pi})$ , $f\in\mathcal{C}_{c}^{(2)}(\mathbb{X})$ , we have that the process $(v_{t}^{f})_{t\in\mathbb{T}}$ , defined

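$$v_{t}^{f}\coloneqq f(x_{t}^{\pi})-f(x_{0}^{\pi})-\int_{0}^{t}\biggl\{\nabla f(x_{s}^{\pi})\int_{\mathbb{A}}b(s,x_{s}^{\pi},a)\,\pi_{s}(\mathrm{d}a)+\frac{1}{2}\mathop{\mathrm{Tr}}\left[\nabla^{\top}\nabla f(x_{s}^{\pi})\left(\int_{\mathbb{A}}\sigma(s,x_{s}^{\pi},a)\,\pi_{s}(\mathrm{d}a)\right)\left(\int_{\mathbb{A}}\sigma(s,x_{s}^{\pi},a)\,\pi_{s}(\mathrm{d}a)\right)^{\top}\right]\biggr\}\,\mathrm{d}s\qquad\forall t\in\mathbb{T},$$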

is a martingale for any $f\in\mathcal{C}_{c}^{(2)}(\mathbb{X})$ . A relaxed controlled solution corresponding to the drift and diffusion functions $b,\sigma$ is conventionally defined as a filtered probability space together with a stochastic process $(\xi_{t}^{\eta},\eta_{t})_{t\in\mathbb{T}}$ , $\xi_{t}^{\eta}\in\mathbb{X}$ , $\eta_{t}\in\mathcal{P}(\mathbb{A})$ for all $t\in\mathbb{T}$ , satisfying items (i-iii) of Definition 2.1, but characterized by the condition that the processes $(m_{t}^{f})_{t\in\mathbb{T}}$ ,

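$$m_{t}^{f}\coloneqq f(\xi_{t}^{\eta})-f(\xi_{0}^{\eta})-\int_{0}^{t}\int_{\mathbb{A}}\biggl\{\nabla f(\xi_{s}^{\eta})\,b(s,\xi_{s}^{\eta},a)+\frac{1}{2}\mathop{\mathrm{Tr}}\left[\nabla^{\top}\nabla f(\xi_{s}^{\eta})\sigma(s,\xi_{s}^{\eta},a)\sigma(s,\xi_{s}^{\eta},a)^{\top}\right]\biggr\}\,\eta_{s}(\mathrm{d}a)\,\mathrm{d}s\qquad\forall t\in\mathbb{T},$$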

are martingales for all $f\in\mathcal{C}_{c}^{(2)}(\mathbb{X})$ . Comparing Eqs. (2.6) and (2.5) we see that the order of integration against the control and squaring the diffusion coefficient are interchanged. Thus, informally, a vague controlled solution corresponds to processes where for any $t\in\mathbb{T}$ , the amplitude of the noise is the $\pi_{t}$ -average of $a\to\sigma(t,x_{t}^{\pi},a)$ , whereas in the relaxed controlled case, the noise is the $\eta_{t}$ -root mean square of the diffusion function.
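
The difference between the two noise amplitudes can be seen in a scalar example with a finitely supported control measure. In the sketch below, the function names and the dictionary representation of the measure are assumptions made for illustration; the first function averages the diffusion coefficient and then takes its magnitude, the second averages its square and then takes the root.

```python
import math

def vague_amplitude(sigma, t, x, pi):
    """|sum_a pi(a) sigma(t, x, a)|: magnitude of the pi-average of sigma."""
    return abs(sum(w * sigma(t, x, a) for a, w in pi.items()))

def relaxed_amplitude(sigma, t, x, eta):
    """sqrt(sum_a eta(a) sigma(t, x, a)^2): root mean square of sigma under eta."""
    return math.sqrt(sum(w * sigma(t, x, a) ** 2 for a, w in eta.items()))

sigma = lambda t, x, a: a       # diffusion coefficient equal to the action
mix = {-1.0: 0.5, +1.0: 0.5}    # equal weights on the actions -1 and +1

print(vague_amplitude(sigma, 0.0, 0.0, mix))    # 0.0
print(relaxed_amplitude(sigma, 0.0, 0.0, mix))  # 1.0
```

By Jensen's inequality the vague amplitude never exceeds the relaxed one, and the two coincide when the measure is a Dirac measure.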

It is well-known that a strict optimal control may fail to exist while one can always be found within the set of relaxed controls. This is due to the convexity of the space of probability measures, a property that has in the past often been exploited in optimal control of stochastic differential equations [25, 17, 38]. An additional motivation for considering generalizations of strict controls comes from the fact that little is known about the nature of the optimal control in the risk aware case. In discrete time, risk aware formulations featuring generic risk functions in the objective have been successfully described³ using the convex analytic formulation [37], later expanded to the continuous-time case as well [42]. Such problems often, though certainly not exclusively, feature relaxed controls as the optimal solution [36, 74, 23] and it is therefore not unreasonable to expect that a generalization of strict controls may be appropriate here as well.

³ These works used a somewhat inelegant state space augmentation scheme that can here be avoided; see also Remark 4.3.

We note that relaxed controlled solutions are more widely represented in the literature than vague controlled solutions. This is in part due to the fact that relaxed controlled solutions can be viewed as the closure of strict controls, under a suitably defined topology [25]. This is not the case for vague controls. The following example demonstrates that there are vague controlled solutions whose finite dimensional distributions cannot be approximated by those of strict or relaxed controls. A similar example has been featured earlier in [7].

Example 2.3.

Consider $\mathbb{X}=\mathbb{R}$ , $\mathbb{A}=\{-1,+1\}$ and $b(t,x,a)=0$ , $\sigma(t,x,a)=a$ for all $(t,x,a)\in\mathbb{T}\times\mathbb{X}\times\mathbb{A}$ , and $\nu=\delta_{0}$ . Then for all strict controls $\pi\in\mathfrak{V}(b,\sigma,\nu)$ (in fact, for all relaxed controls as well), $x^{\pi}$ is an $\mathcal{F}$ -Brownian motion, but there exists a vague controlled solution $\pi^{\prime}\in\mathfrak{V}(b,\sigma,\nu)$ such that $\pi_{t}^{\prime}=(\delta_{-1}+\delta_{+1})/2$ and $x_{t}^{\pi^{\prime}}=0$ for all $t\in\mathbb{T}$ . Consequently, considering e.g. a control problem of $\inf_{\pi}\mathbb{E}[(x_{T}^{\pi})^{2}]$ , it is clear that a vague controlled solution may attain a strictly lower optimum value than can be found using strict controls.
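The gap in Example 2.3 can be read off from the two noise amplitudes described above (the $\pi_{t}$ -average for vague controls versus the $\eta_{t}$ -root mean square for relaxed controls). The following worked computation is only a sketch; it assumes $\mathbb{T}=[0,T]$ and writes $\eta$ for a generic relaxed control on $\mathbb{A}=\{-1,+1\}$ .

```latex
\begin{align*}
\text{vague control } \pi'_t=\tfrac12(\delta_{-1}+\delta_{+1}):\quad
  & \int_{\mathbb{A}} a\,\pi'_t(\mathrm{d}a) = \tfrac12(-1)+\tfrac12(+1) = 0
  \;\Longrightarrow\; \mathbb{E}\big[(x_T^{\pi'})^2\big] = 0,\\
\text{any relaxed control } \eta_t:\quad
  & \Big(\int_{\mathbb{A}} a^2\,\eta_t(\mathrm{d}a)\Big)^{1/2} = 1
  \;\Longrightarrow\; \mathbb{E}\big[(x_T^{\eta})^2\big] = T .
\end{align*}
```

Since $a^{2}=1$ on $\mathbb{A}$ , no choice of $\eta_{t}$ can reduce the root mean square amplitude below one, which is why the relaxed (and hence strict) optimum stays at $T$ .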

Our main reason for considering vague controls is that the optimality conditions obtained from a stochastic minimum principle are considerably simpler and thus easier to use in practice. In the classical risk neutral case, and when the control set $\mathbb{A}$ is non-convex and the diffusion coefficient depends on the control, first and second order adjoint equations are needed to characterize the optimal control, see e.g. the classic work by Peng [59] and more recent results for relaxed controls in [9, 49]. The issues resulting from the need for second order expansions are exacerbated in the risk aware setting, where the second order expansions will also require us to compute second order functional derivatives of the risk function $\rho$ . For vague controls, first order expansions turn out to be sufficient, which is also the case in risk neutral problems considered in [8, 3]. We note that the sufficiency of first order expansions is not entirely surprising, since this is also the case for strict controls when the control set $\mathbb{A}$ is convex, see e.g. [13]. We demonstrate this below in Example 2.4. Indeed, vague controls could be viewed as measure valued strict controls, in which case the control set $\mathcal{P}(\mathbb{A})$ is naturally convex. Finally, as strict controls are a subset of vague controls, optimality of a strict control is readily shown by demonstrating that a vague control process necessarily takes values on Dirac measures.

Example 2.4.

Consider the problem of [73, Example 4.1]. We set $\mathbb{X}=\mathbb{R}$ , $\mathbb{A}=\{0,1\}$ and $b(t,x,a)=0$ , $\sigma(t,x,a)=a$ for all $(t,x,a)\in\mathbb{T}\times\mathbb{X}\times\mathbb{A}$ , $\nu=\delta_{0}$ , and consider minimizing $\mathbb{E}[x_{T}^{2}]$ over strict controls. Clearly, $(x_{t}=0,\pi_{t}=\delta_{0})_{t\in\mathbb{T}}$ is optimal. In the context of strict controls, one considers spike perturbations (cf. Eq. (4.3) of [73]) to an optimal control to establish conditions for optimality. In [73, Example 4.1], it is shown that if $x^{\epsilon}=(x_{t}^{\epsilon})_{t\in\mathbb{T}}$ is a state space process corresponding to such a perturbation, and $x=(x_{t})_{t\in\mathbb{T}}$ is optimal (here, zero), then $\sup_{t\in\mathbb{T}}\mathbb{E}[|x_{t}^{\epsilon}-x_{t}|^{2}]=\epsilon/2$ . However, if we consider a vague control formed by a convex combination of the optimal control $\pi=(\pi_{t}=\delta_{0})_{t\in\mathbb{T}}$ and an arbitrary progressively measurable $q=(q_{t})_{t\in\mathbb{T}}$ so that $\pi_{t}^{\epsilon}=(1-\epsilon)\pi_{t}+\epsilon q_{t}$ for all $t\in\mathbb{T}$ , we find that

[TABLE]

Therefore, perturbations to vague controls result in an $\mathcal{O}(\epsilon^{2})$ response in the state space paths (in the above sense), whereas for strict controls, we only have $\mathcal{O}(\epsilon)$ . This suggests that computing the first order response may indeed be sufficient for establishing necessary conditions for optimality of vague controlled solutions.
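A sketch of the computation behind the $\mathcal{O}(\epsilon^{2})$ claim, using only the setting of Example 2.4, is given below. The shorthand $q_{s}(\{1\})$ for the mass that $q_{s}$ places on the action $a=1$ is notation introduced here for illustration, and the display is not necessarily identical to the equation in the original text.

```latex
\begin{align*}
\int_{\mathbb{A}} a\,\pi^{\epsilon}_{t}(\mathrm{d}a)
  &= (1-\epsilon)\cdot 0 + \epsilon\, q_{t}(\{1\}) = \epsilon\, q_{t}(\{1\}),\\
x^{\epsilon}_{t}
  &= \epsilon\int_{0}^{t} q_{s}(\{1\})\,\mathrm{d}w_{s},\\
\mathbb{E}\big[|x^{\epsilon}_{t}-x_{t}|^{2}\big]
  &= \epsilon^{2}\,\mathbb{E}\int_{0}^{t} q_{s}(\{1\})^{2}\,\mathrm{d}s
  \;\le\; \epsilon^{2}\,t .
\end{align*}
```

The first line uses that the noise amplitude of a vague controlled solution is the $\pi^{\epsilon}_{t}$ -average of $a\to\sigma(t,x,a)=a$ , the last line uses the Itô isometry together with $x_{t}=0$ and $0\leq q_{s}(\{1\})\leq 1$ .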

The natural downside to considering vague controls is, as Example 2.3 demonstrates, that the optimal vague control may be something that cannot be approximated by strict controls. This may be an issue in practice, if a vague control cannot realistically be implemented. This is in contrast to the case of usual relaxed controls, for which it is typically possible to construct an $\epsilon$ -optimal strict control from an optimal relaxed control.

We assume standard continuity and boundedness conditions that guarantee the existence of solutions to Eq. (2.1), given a probability space, a Brownian motion, and a control process. We also state conditions for the cost rate function, and hence, we first need to introduce the total cost random variable. For every $\pi\in\mathfrak{V}(b,\sigma,\nu)$ we define the total cost $C^{\pi}$ as

[TABLE]

where $c:\mathbb{T}\times\mathbb{X}\times\mathbb{A}\to\mathbb{R}$ is the cost rate function and $g:\mathbb{X}\to\mathbb{R}$ is the terminal cost. In the risk neutral case, it generally suffices to ensure that $C^{\pi}\in\mathcal{L}^{1}(\Omega;\mathbb{R})$ , however here, we shall need to compute the risk of $C^{\pi}$ which generally involves evaluating an $\mathcal{L}^{p}(\Omega;\mathbb{R})\to\mathbb{R}$ functional at $C^{\pi}$ , where $p\in[1,\infty)$ . In order to accommodate a wider range of possible values of $p$ , somewhat more elaborate conditions (compared to the risk neutral case) on the bounds of $b,\sigma,c,g$ and their growth rates shall be needed. In addition, as the optimality conditions given in Section 4 will be derived from variational inequalities, we require that the relevant functions are all also differentiable. Formally, our baseline assumptions are as follows.

Assumption 2.5.

The initial distribution $\nu\in\mathcal{P}(\mathbb{X})$ , drift and diffusion functions $b:\mathbb{T}\times\mathbb{X}\times\mathbb{A}\to\mathbb{X}$ , $\sigma:\mathbb{T}\times\mathbb{X}\times\mathbb{A}\to\mathbb{R}^{d_{x}\times d_{w}}$ , cost rate and terminal cost functions $c:\mathbb{T}\times\mathbb{X}\times\mathbb{A}\to\mathbb{R}$ , $g:\mathbb{X}\to\mathbb{R}$ , and admissible control processes are such that there are constants $L>0$ , $\bar{p}_{1}\in[0,1]$ , $\bar{p}_{2}\in[0,\infty)$ , $\bar{p}_{3}\in[0,\infty]$ , $\bar{p}\in[1,\infty)$ , $p_{1}\in[0,\infty)$ , $p_{2}\in[0,\infty)$ , $p_{1}^{\prime}\in[0,\infty)$ , $p_{2}^{\prime}\in[0,\infty)$ satisfying: (i) if $\bar{p}_{3}=\infty$ , then $\mathbb{A}$ is compact; (ii) for all $(t,x,a)\in\mathbb{T}\times\mathbb{X}\times\mathbb{A}$ ,

[TABLE]

(iii) for all $(t,a)\in\mathbb{T}\times\mathbb{A}$ , the functions $x\to b(t,x,a)$ and $x\to\sigma(t,x,a)$ are continuously differentiable, and the derivatives are bounded by $L$ ; (iv) for all $(t,x,a)\in\mathbb{T}\times\mathbb{X}\times\mathbb{A}$ ,

[TABLE]

(v) for all $(t,a)\in\mathbb{T}\times\mathbb{A}$ , the functions $x\to c(t,x,a)$ and $x\to g(x)$ are continuously differentiable, and satisfy, for all $(t,a)\in\mathbb{T}\times\mathbb{A}$ ,

[TABLE]

(vi) the initial distribution $\nu\in\mathcal{P}^{\bar{p}}(\mathbb{X})$ ; (vii) all control processes are $\bar{p}_{3}$ -admissible, i.e. satisfy Eq. (2.4) for $r=\bar{p}_{3}$ .

Definition 2.6.

Let $p\in[1,\infty)$ . We say that a vague controlled solution $\pi\in\mathfrak{V}(b,\sigma,\nu)$ is $p$ -feasible and denote $\pi\in\mathfrak{V}^{p}(b,\sigma,\nu)$ , if there exists $\bar{p}$ , $\bar{p}_{i}$ , $i\in\{1,2,3\}$ , $p_{i}$ , $p_{i}^{\prime}$ , $i\in\{1,2\}$ satisfying Assumption 2.5 and the following inequalities:

[TABLE]

To give an intuition on the meanings and uses of these constants (Proposition 2.7 below gives a more formal statement), $p$ and $\bar{p}$ shall respectively represent the order up to which the costs $C^{\pi}$ and the state space variables $x_{t}^{\pi}$ , $t\in\mathbb{T}$ , are integrable. We allow for unbounded cost rates and terminal costs, in fact even superlinear growth is admissible ( $p_{1},p_{2}>1$ in Eq. (2.9)), but in order to guarantee that costs are in $\mathcal{L}^{p}(\Omega;\mathbb{R})$ , bounds on the integrability of the state and action variables need to be imposed. Eqs. (2.11) amount to sufficient conditions for such integrability to hold.

We expect that in a typical use case of our results, one is given drift and diffusion functions $b$ and $\sigma$ , a cost structure in the form of the cost rate and terminal cost functions $c$ and $g$ , and a risk function as a map $\mathcal{L}^{p}(\Omega;\mathbb{R})\to\mathbb{R}$ with a fixed $p\in[1,\infty)$ , mapping from costs to risks. The growth rates of these functions dictate the values of $\bar{p}_{1}$ , $\bar{p}_{2}$ , $p_{1}$ , $p_{2}$ , $p_{1}^{\prime}$ , and $p_{2}^{\prime}$ . The feasibility conditions of Eqs. (2.11) can then be understood as determining admissible $\bar{p}$ and $\bar{p}_{3}$ , representing the level of randomness in the initial condition and the range of control values that yield $\mathcal{L}^{p}(\Omega;\mathbb{R})$ -finite costs.

Given a filtered probability space with a Brownian motion and a progressively measurable $\mathcal{P}(\mathbb{A})$ -valued control process, stochastic differential equations satisfying Assumption 2.5 and the inequalities of Eqs. (2.11) for a given $p\in[1,\infty)$ have strong solutions. Together, these comprise a $p$ -feasible vague controlled solution, and moreover, if $\pi\in\mathfrak{V}^{p}(b,\sigma,\nu)$ , then the costs satisfy $C^{\pi}\in\mathcal{L}^{p}(\Omega;\mathbb{R})$ .

Proposition 2.7.

Let $p\in[1,\infty)$ and suppose Assumptions 2.5 and Eq. (2.11) hold. Let $(\Omega,\Sigma,\mathcal{F}=(\mathcal{F}_{t})_{t\in\mathbb{T}},\mathbb{P})$ be a filtered probability space, $\mathcal{F}$ a complete and right continuous filtration, and $(w_{t})_{t\in\mathbb{T}}$ a $d_{w}$ -dimensional $\mathcal{F}$ -Brownian motion. (i) If $\pi=(\pi_{t})_{t\in\mathbb{T}}$ is an $\mathcal{F}$ -progressively measurable $\mathcal{P}(\mathbb{A})$ -valued stochastic process that satisfies Eq. (2.4) for $\bar{p}_{3}$ , then there exists a pathwise unique solution $x^{\pi}=(x_{t}^{\pi})_{t\in\mathbb{T}}$ to the stochastic differential equation (2.1) such that

[TABLE]

so that $(\Omega,\Sigma,\mathcal{F},\mathbb{P},\pi,x^{\pi})$ is in $\mathfrak{V}^{p}(b,\sigma,\nu)$ or in other words is a $p$ -feasible vague controlled solution. (ii) If additionally $a\to b(t,x,a)$ and $a\to\sigma(t,x,a)$ are $L$ -Lipschitz for all $(t,x)\in\mathbb{T}\times\mathbb{X}$ , then the mapping $\Pi_{\mathcal{F}}^{\bar{p}_{3}}(\Omega;\mathbb{A})\ni\pi\to x^{\pi}\in\mathcal{S}_{\mathcal{F}}^{\bar{p}}(\Omega;\mathbb{X})$ is continuous.

In order to state the risk aware control problem, we need to first establish some basic properties of risk functions. We collate our discussions on their properties in the next section, where we first describe the subset of risk functions that can be used to evaluate the risk associated with $C^{\pi}$ when $\pi\in\mathfrak{V}^{p}(b,\sigma,\nu)$ .

3 Risk functions

Risk aware objective function

Given the definition of feasible vague controlled solutions, $\pi\in\mathfrak{V}^{p}(b,\sigma,\nu)$ , $p\in[1,\infty)$ , and the cost functional $C^{\pi}$ , the risk neutral control problem could now be simply stated as

[TABLE]

where we have written the expectation as $\mathbb{E}_{\pi}$ to highlight the fact that the probability space, and in particular the probability measure used to compute the expectation, is a part of the vague controlled solution $\pi\in\mathfrak{V}^{p}(b,\sigma,\nu)$ . This problem statement does not trivially generalize to the risk aware case: Here, we presume we are given a risk function $\rho:\mathcal{L}^{p}(\Omega;\mathbb{R})\to\mathbb{R}$ , $p\in[1,\infty)$ , defined on some unspecified probability space $(\Omega,\Sigma,\mathbb{P})$ , mapping an $\mathcal{L}^{p}(\Omega;\mathbb{R})$ random variable to a real-valued measure of risk that quantifies the variability associated with this random variable. Since in general the probability space for a given $\rho$ is fixed, we cannot use $\rho$ to evaluate the risk of $C^{\pi}$ when the probability space potentially varies with each $\pi\in\mathfrak{V}^{p}(b,\sigma,\nu)$ .

To remedy this issue, note first that the risk neutral problem, Problem $\mathscr{P}_{0}$ , makes sense since the expectation does not depend on the particulars of the underlying probability space, but rather only on the distributions of the random variables. This is to say, for any two $\mathcal{L}^{1}$ -random variables $X$ and $\tilde{X}$ , defined on different probability spaces $(\Omega,\Sigma,\mathbb{P})$ and $(\tilde{\Omega},\tilde{\Sigma},\tilde{\mathbb{P}})$ , we have that $\mathbb{E}_{\mathbb{P}}[X]=\mathbb{E}_{\tilde{\mathbb{P}}}[\tilde{X}]$ whenever the laws of $X$ and $\tilde{X}$ agree. In order to generalize Problem $\mathscr{P}_{0}$ to the risk aware case, we restrict ourselves to risk functions having this same, law invariance property:

Definition 3.1.

Let $(\Omega,\Sigma,\mathbb{P})$ be a probability space. A mapping $\phi:\mathcal{L}^{p}(\Omega;\mathbb{R})\rightarrow\mathbb{R}$ , $p\in[1,\infty]$ , is *law invariant* if there is a $\psi:\mathcal{P}^{p}(\mathbb{R})\to\mathbb{R}$ such that $\phi(U)=\psi(\mathscr{L}(U))$ for all $U\in\mathcal{L}^{p}(\Omega;\mathbb{R})$ .

Law invariant risk functions have been extensively studied in the literature, in particular they admit well-known and widely exploited representation theorems [48, 33, 43]. Here however, the law invariance property allows us to state the risk aware version of Problem $\mathscr{P}_{0}$ . For any law invariant $\rho:\mathcal{L}^{p}(\Omega;\mathbb{R})\to\mathbb{R}$ , $p\in[1,\infty]$ , we define Problem $\mathscr{P}_{1}$ as

[TABLE]

We adopt the view that a law invariant risk function can be equivalently seen as a mapping from $\mathcal{L}^{p}(\Omega;\mathbb{R})$ -random variables to reals, or as a function from $\mathcal{P}^{p}(\mathbb{R})$ -measures to reals. This latter representation of risk functions has been used also in previous works, see e.g. [37, 41, 42]. We emphasize that here, we consider the expression of risk for random variables and measures on equal footing: While viewing $\rho$ exclusively as a function from measures to reals is appealing in its simplicity, in doing so we would firstly lose some convexity and coherence properties that are better defined for $\mathcal{L}^{p}(\Omega;\mathbb{R})$ -functionals (see e.g. Definition 3.3 below). Indeed, it is quite often true that if a risk function $\rho:\mathcal{L}^{p}(\Omega;\mathbb{R})\to\mathbb{R}$ is convex, then its representation as a function $\tilde{\rho}:\mathcal{P}^{p}(\mathbb{R})\to\mathbb{R}$ , $\tilde{\rho}(\mathscr{L}(X))=\rho(X)$ for all $X\in\mathcal{L}^{p}(\Omega;\mathbb{R})$ is concave [2] – this has obvious implications for minimization of such functionals. Secondly, we will in the following need some notion of differentiability of risk functions. Functional differentiation is more readily defined on the Banach spaces $\mathcal{L}^{p}(\Omega;\mathbb{R})$ , and moreover, an appropriate theory has recently been developed in the context of mean field games and McKean-Vlasov problems [19, 20]. In their context, differentiability with respect to distributions was needed for treating the mean field, i.e. distribution dependent terms in the model equations.

In the following, we will go back and forth between representations of a risk function as a mapping over random variables or measures. What we mean by this is formalized in the following definition:

Definition 3.2.

Let $(\Omega,\Sigma,\mathbb{P})$ be a probability space, $p\in[1,\infty]$ , and let $\mathbb{V}$ be a metric space. (i) Suppose $\phi\text{ : }\mathcal{L}^{p}(\Omega;\mathbb{V})\rightarrow\mathbb{R}$ is a law invariant mapping. A function $\psi:\mathcal{P}^{p}(\mathbb{V})\to\mathbb{R}$ is a $\mathcal{P}^{p}$ -representation of $\phi$ if $\phi(U)=\psi(\mathscr{L}(U))$ for all $U\in\mathcal{L}^{p}(\Omega;\mathbb{V})$ . (ii) Suppose $\psi:\mathcal{P}^{p}(\mathbb{V})\to\mathbb{R}$ . If there is a probability space $(\tilde{\Omega},\tilde{\Sigma},\tilde{\mathbb{P}})$ and a function $\phi\text{ : }\mathcal{L}^{p}(\tilde{\Omega};\mathbb{V})\rightarrow\mathbb{R}$ such that $\psi(\mathscr{L}(U))=\phi(U)$ for all $U\in\mathcal{L}^{p}(\tilde{\Omega};\mathbb{V})$ , then we say $\phi$ is an $\mathcal{L}^{p}(\tilde{\Omega})$ -representation of $\psi$ .

In order to impose more structure on the set of risk functions we consider, some of the following properties, frequently considered in the literature [5, 31], are assumed. These properties are more naturally defined for the $\mathcal{L}^{p}(\Omega)$ -representation of the risk function.

Definition 3.3.

Let $(\Omega,\Sigma,\mathbb{P})$ be a probability space and denote $\mathcal{L}\coloneqq\mathcal{L}^{p}(\Omega;\mathbb{R})$ , $p\in[1,\infty]$ . (i) Monotonicity: for all $X_{1},X_{2}\in\mathcal{L}$ such that $X_{1}\leq X_{2}$ almost surely, $\rho(X_{1})\leq\rho(X_{2})$ . (ii) Convexity: $\rho(\alpha X_{1}+(1-\alpha)X_{2})\leq\alpha\rho(X_{1})+(1-\alpha)\rho(X_{2})$ for all $X_{1},X_{2}\in\mathcal{L}$ and $\alpha\in[0,1]$ . (iii) Positive homogeneity: $\rho(aX)=a\rho(X)$ for all $a\geq 0$ and $X\in\mathcal{L}$ . (iv) Translation invariance: $\rho(X+a)=\rho(X)+a$ for all $a\in\mathbb{R}$ and $X\in\mathcal{L}$ . (v) If the risk function satisfies (i–iv), it is called coherent.
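The properties in Definition 3.3 can be checked numerically on empirical distributions. The sketch below is an illustration only: the two risk functions (an empirical conditional value-at-risk and the entropic risk $\theta^{-1}\log\mathbb{E}[e^{\theta X}]$ ), the helper names `cvar` and `entropic`, and all parameter values are assumptions chosen for the example and are not taken from the text. It suggests that the empirical CVaR passes all four coherence checks, whereas the entropic risk is translation invariant but not positively homogeneous.

```python
import math
import random


def cvar(samples, alpha):
    """Empirical CVaR: mean of the largest (1 - alpha) fraction of the costs."""
    k = max(1, round(len(samples) * (1.0 - alpha)))
    worst = sorted(samples, reverse=True)[:k]
    return sum(worst) / k


def entropic(samples, theta):
    """Empirical entropic risk (1/theta) * log E[exp(theta * X)], theta > 0."""
    m = max(samples)  # shift by the maximum for numerical stability
    mean_exp = sum(math.exp(theta * (x - m)) for x in samples) / len(samples)
    return m + math.log(mean_exp) / theta


rng = random.Random(0)
n = 1000
X1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
X2 = [x + abs(rng.gauss(0.0, 0.5)) for x in X1]  # X1 <= X2 sample by sample
alpha, theta, a, lam = 0.9, 0.5, 2.0, 0.3
mix = [lam * u + (1.0 - lam) * v for u, v in zip(X1, X2)]
shuffled = X1[:]
rng.shuffle(shuffled)  # same empirical law, different "omega" labelling

tol = 1e-9
checks = {
    # (i) monotonicity
    "cvar_monotone": cvar(X1, alpha) <= cvar(X2, alpha) + tol,
    # (ii) convexity
    "cvar_convex": cvar(mix, alpha)
    <= lam * cvar(X1, alpha) + (1.0 - lam) * cvar(X2, alpha) + tol,
    # (iii) positive homogeneity
    "cvar_pos_hom": abs(cvar([a * x for x in X1], alpha) - a * cvar(X1, alpha)) < tol,
    # (iv) translation invariance
    "cvar_translation": abs(cvar([x + a for x in X1], alpha) - (cvar(X1, alpha) + a)) < tol,
    # law invariance: the value depends only on the empirical distribution
    "cvar_law_invariant": abs(cvar(shuffled, alpha) - cvar(X1, alpha)) < tol,
    # entropic risk: translation invariant but not positively homogeneous
    "entropic_translation": abs(
        entropic([x + a for x in X1], theta) - (entropic(X1, theta) + a)
    ) < tol,
    "entropic_pos_hom": abs(
        entropic([a * x for x in X1], theta) - a * entropic(X1, theta)
    ) < tol,
}

for name, ok in checks.items():
    print(f"{name}: {ok}")
```

Working on a finite sample keeps the example self-contained; it is a sanity check of the definitions, not a proof of coherence on $\mathcal{L}^{p}(\Omega;\mathbb{R})$ .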

Differentiability of risk functions

We begin by recalling the following, standard definitions of functional derivatives.

Definition 3.4.

Let $\mathbb{V}$ and $\mathbb{U}$ be real Banach spaces. (i) For any $f:\mathbb{V}\to\mathbb{R}$ , the subdifferential of $f$ at $X\in\mathbb{V}$ , denoted $\partial f(X)$ , is the set $\partial f(X)=\{Y\in\mathbb{V}^{\ast}:\langle Y,X^{\prime}-X\rangle\leq f(X^{\prime})-f(X),\,\forall X^{\prime}\in\mathbb{V}\}$ . The function is subdifferentiable at $X\in\mathbb{V}$ if $\partial f(X)\neq\emptyset$ . (ii) For any $F:\mathbb{V}\to\mathbb{U}$ , the directional derivative of $F$ at $X\in\mathbb{V}$ in direction $Y\in\mathbb{V}$ , denoted $\mathrm{D}_{Y}F(X)$ , is defined as

[TABLE]

if the limit exists. Further, we will say that the function $F$ is Gâteaux differentiable at $X\in\mathbb{V}$ if the above limit exists for all $Y\in\mathbb{V}$ and if the mapping $Y\to\mathrm{D}_{Y}F(X)$ is linear. (iii) The function $F:\mathbb{V}\to\mathbb{U}$ is Fréchet differentiable at $X\in\mathbb{V}$ if there is a continuous linear operator $\mathrm{D}F(X)\in\mathbb{V}^{\ast}$ , the Fréchet derivative, such that

[TABLE]

where $\|\cdot\|_{\mathbb{V}}$ denotes the norm on $\mathbb{V}$ .

Remark 3.5.

A coherent risk function $\rho:\mathcal{L}^{p}(\Omega;\mathbb{R})\to\mathbb{R}$ that is nonlinear cannot be everywhere Gâteaux or Fréchet differentiable. Specifically, $\rho$ cannot be Fréchet differentiable at $X=0$ , if positive homogeneity, Definition 3.3(iii), holds. Then $\mathrm{D}_{Y}\rho(0)=\mathrm{d}[\rho(\epsilon Y)]/\mathrm{d}\epsilon|_{\epsilon=0}=\rho(Y)$ , which is not linear; this was earlier pointed out in [28, Proposition 3.1]. Relaxing the assumption that the Gâteaux derivative must be linear would remove the issue, but everywhere Fréchet differentiability of $\rho$ is nonetheless not possible. In Section 5 we demonstrate, by way of an example, that a risk function can be shown to be differentiable at the cost random variable $C^{\pi}$ . We also show that e.g. the entropic risk measure, frequently encountered in the literature, is everywhere Fréchet differentiable, and that additionally, it may be possible to approximate a risk function with another, everywhere Fréchet differentiable functional.

We can now define a useful notion of a derivative of a law invariant risk function. Here, we use the definition used in e.g. [18, 19, 20], which we extend slightly to cover $\mathcal{L}^{p}(\Omega;\mathbb{R})$ spaces, $p\in[1,\infty]$ , not just $\mathcal{L}^{2}(\Omega;\mathbb{R})$ . The definition relies on the Fréchet differentiability of the $\mathcal{L}^{p}(\Omega)$ -representation of the function.

Definition 3.6.

Let $\phi:\mathcal{P}^{p}(\mathbb{R}^{n})\to\mathbb{R}$ , $n\in\mathbb{N}$ , $p\in[1,\infty]$ , and suppose there is a probability space $(\Omega,\Sigma,\mathbb{P})$ and an $\mathcal{L}^{p}(\Omega)$ -representation of $\phi$ , denoted $\psi$ . (i) We say the function $\phi$ is $\mathcal{L}$ -differentiable at $\mu_{0}\in\mathcal{P}^{p}(\mathbb{R}^{n})$ if its $\mathcal{L}^{p}(\Omega)$ -representation $\psi$ is Fréchet differentiable at any point $U_{0}\in\mathcal{L}^{p}(\Omega;\mathbb{R}^{n})$ such that $\mathscr{L}(U_{0})=\mu_{0}$ .

(ii) The function $\phi$ is continuously $\mathcal{L}$ -differentiable if the Fréchet derivative of $\psi$ , seen as a function $\mathcal{L}^{p}(\Omega;\mathbb{R}^{n})\ni X\to\mathrm{D}\psi(X)\in\mathcal{L}^{q}(\Omega;\mathbb{R}^{n})$ , $q=p/(p-1)$ , is continuous.

(iii) Given $\mu\in\mathcal{P}^{p}(\mathbb{R}^{n})$ , we say the function $f:\mathbb{R}^{n}\to\mathbb{R}^{1\times n}$ is an *$\mathcal{L}$ -derivative* of $\phi$ at $\mu$ , if the Fréchet derivative of $\psi$ , $\mathrm{D}\psi(X)\in\mathcal{L}^{q}(\Omega;\mathbb{R}^{n})$ is such that $\mathrm{D}\psi(X)(\omega)=f(X(\omega))$ for $\mathbb{P}$ -almost all $\omega\in\Omega$ , implying that $\langle\mathrm{D}\psi(X),Y\rangle=\langle f(X),Y\rangle$ for all $X,Y\in\mathcal{L}^{p}(\Omega;\mathbb{R})$ such that $\mathscr{L}(X)=\mu$ . We will denote a representative $\mathcal{L}$ -derivative by $\mathrm{D}\phi(\mu)$ .

We have the following result concerning the existence of $\mathcal{L}$ -derivatives. It demonstrates that $\mathcal{L}$ -derivatives commonly exist, and are not limited to exceptional cases of risk functions.

Proposition 3.7.

Suppose $\phi:\mathcal{P}^{p}(\mathbb{R}^{n})\to\mathbb{R}$ , $n\in\mathbb{N}$ , $p\in[2,\infty]$ , is continuously $\mathcal{L}$ -differentiable. Then an $\mathcal{L}$ -derivative exists, and is unique in the sense that if $f_{1}$ and $f_{2}$ are $\mathcal{L}$ -derivatives at $\mu\in\mathcal{P}^{p}(\mathbb{R}^{n})$ , then $f_{1}(x)=f_{2}(x)$ for $\mu$ -almost every $x\in\mathbb{R}^{n}$ .

Our main use of the $\mathcal{L}$ -derivative is in evaluating the first-order response of functions of probability measures. If $\mu,\mu_{0}\in\mathcal{P}^{p}(\mathbb{R}^{n})$ , then for any random variables $U,U_{0}\in\mathcal{L}^{p}(\Omega;\mathbb{R}^{n})$ , $p\in[1,\infty]$ , whose laws equal $\mu$ and $\mu_{0}$ , respectively, we get the following expansion directly from the definitions of the Fréchet- and $\mathcal{L}$ -derivatives:

[TABLE]

where $\mathbb{E}[\mathrm{D}\phi(\mu_{0})(U_{0})(U-U_{0})]=\langle\mathrm{D}\phi(\mu_{0})(U_{0}),U-U_{0}\rangle$ .
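The first-order expansion can be checked numerically on an empirical law. The sketch below uses the entropic risk $\rho_{\theta}(X)=\theta^{-1}\log\mathbb{E}[e^{\theta X}]$ , whose derivative in direction $V$ is $\mathbb{E}[e^{\theta X}V]/\mathbb{E}[e^{\theta X}]$ , so that the candidate $\mathcal{L}$ -derivative is $x\to e^{\theta x}/\mathbb{E}[e^{\theta X}]$ . The choice of risk function, the parameter $\theta$ , the sample sizes, and the helper names are assumptions made for illustration and do not come from the text.

```python
import math
import random


def entropic(samples, theta):
    """Empirical entropic risk (1/theta) * log E[exp(theta * X)]."""
    m = max(samples)  # shift by the maximum for numerical stability
    mean_exp = sum(math.exp(theta * (x - m)) for x in samples) / len(samples)
    return m + math.log(mean_exp) / theta


def entropic_l_derivative(samples, theta):
    """Return x -> exp(theta * x) / E[exp(theta * X)] for the empirical law."""
    m = max(samples)
    norm = sum(math.exp(theta * (x - m)) for x in samples) / len(samples)
    return lambda x: math.exp(theta * (x - m)) / norm


rng = random.Random(1)
n = 2000
theta = 0.7
U0 = [rng.gauss(0.0, 1.0) for _ in range(n)]   # base random variable
V = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # bounded perturbation direction

D = entropic_l_derivative(U0, theta)
# Linear term E[ D rho(mu_0)(U0) * V ] of the expansion
linear = sum(D(u) * v for u, v in zip(U0, V)) / n
# The derivative integrates to one under the empirical law
mean_of_derivative = sum(D(u) for u in U0) / n

# Remainder |rho(U0 + eps V) - rho(U0) - eps * linear| for shrinking eps
errors = {}
for eps in (1e-1, 1e-2, 1e-3):
    U = [u + eps * v for u, v in zip(U0, V)]
    errors[eps] = abs(entropic(U, theta) - entropic(U0, theta) - eps * linear)

for eps, err in errors.items():
    print(f"eps={eps:g}  remainder={err:.3e}  remainder/eps^2={err / eps**2:.3f}")
```

Because the second directional derivative of the entropic risk equals $\theta$ times a variance of $V$ under an exponentially tilted law, and $|V|\leq 1$ here, the remainder is bounded by $\theta\epsilon^{2}/2$ , which is the quadratic decay the printed ratios should reflect.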

Remark 3.8.

The need for the notion of $\mathcal{L}$ -differentiability ultimately arises from the use of vague controls and weak solutions of the stochastic differential equations, that is, vague controlled solutions in the sense of Definition 2.1. If it were possible to always consider a fixed probability space, there would not be a need for the notion of $\mathcal{L}$ -differentiability, and we could instead solely use the Fréchet derivative on a fixed $\mathcal{L}^{p}(\Omega;\mathbb{R})$ space to construct the first order responses of the form given in Eq. (3.2).

Differentiability can subsequently be used to construct a notion of convexity of a real-valued function of probability measures $\mu\in\mathcal{P}^{p}(\mathbb{V})$ , where $\mathbb{V}$ is a Banach space and $p\in[1,\infty]$ , without needing to impose vector space structure on $\mathcal{P}^{p}(\mathbb{V})$ .

Definition 3.9.

An $\mathcal{L}$ -differentiable function $\phi:\mathcal{P}^{p}(\mathbb{R}^{n})\to\mathbb{R}_{\infty}$ , $n\in\mathbb{N}$ , $p\in[1,\infty]$ , with the $\mathcal{L}$ -derivative $\mathrm{D}\phi:\mathcal{P}^{p}(\mathbb{R}^{n})\times\mathbb{R}^{n}\to\mathbb{R}$ is $\mathcal{L}$ -convex if

[TABLE]

and where $U,U^{\prime}\in\mathcal{L}^{p}(\Omega;\mathbb{R}^{n})$ are any random variables over some probability space $(\Omega,\Sigma,\mathbb{P})$ such that $\mathscr{L}(U)=\mu$ and $\mathscr{L}(U^{\prime})=\mu^{\prime}$ .

It is straightforward to verify that for a law invariant, $\mathcal{L}$ -differentiable risk function $\rho:\mathcal{L}^{p}(\Omega;\mathbb{R})\to\mathbb{R}$ with an $\mathcal{L}$ -derivative $\mathrm{D}\rho(\cdot)(\cdot):\mathcal{P}^{p}(\mathbb{R})\times\mathbb{R}\to\mathbb{R}$ , convexity in the sense of Definition 3.3(ii) implies $\mathcal{L}$ -convexity, i.e. the notion of convexity in Definition 3.9.

For brevity of notation, in the following we will, for every law invariant function $\rho:\mathcal{L}^{p}(\Omega;\mathbb{R}^{n})\to\mathbb{R}$ , $p\in[1,\infty]$ , denote its $\mathcal{P}^{p}$ -representation by the same symbol $\rho$ ; which function we mean will always be clear from its arguments. With the above definitions, we can now state the necessary assumptions regarding the risk functions we consider.

Assumption 3.10.

The risk function $\rho$ is such that: (i) $\rho:\mathcal{L}^{p}(\Omega;\mathbb{R})\to\mathbb{R}$ , $p\in[1,\infty)$ , and $\rho$ is law invariant. (ii) The risk function $\rho$ has an $\mathcal{L}$ -derivative on some open subset $\mathcal{P}^{\prime}\subset\mathcal{P}^{p}(\mathbb{R})$ . (iii) The law of the cost functional is in $\mathcal{P}^{\prime}$ , i.e. $\mathscr{L}(C^{\pi})\in\mathcal{P}^{\prime}$ for all $\pi\in\mathfrak{V}^{p}(b,\sigma,\nu)$ .

These assumptions simply assert that $\rho$ is differentiable over a sufficiently large set of random variables. Note that $\mathcal{L}$ -convexity is not yet assumed; it will be needed when we state conditions that are sufficient for optimality of controls.

The probabilistic formulation of the risk aware problem relies on Assumption 3.10, and in particular on the existence of Fréchet derivatives of the risk function. In general, the question of the Fréchet differentiability of a function defined over an infinite dimensional Banach space is a rather complicated one, and presently, there does not appear to be a well-developed theory of Fréchet differentiability of risk functions. When the underlying Banach spaces are Asplund spaces, say in particular if we consider the Hilbert space $\mathcal{L}^{2}(\Omega;\mathbb{R})$ , Fréchet differentiability is guaranteed at least over a dense $G_{\delta}$ -subset of the space, see e.g. [65]. This is somewhat unsatisfactory, since here we would like to be able to say whether or not a risk function is differentiable at a specific random variable we have in mind. A broad treatment of the differentiability of risk functions is beyond the scope of this paper, however, in Section 5 we present examples of non-trivial risk functions (i.e. risk functions that are not simply the expectation) that are in fact differentiable over a sufficiently large subset of the space.

Remark 3.11.

While the question of the existence of Fréchet derivatives is open, more is known about the Gâteaux differentiability of risk functions. For any risk function $\rho:\mathcal{L}^{p}(\Omega;\mathbb{R})\to\mathbb{R}$ , $p\in[1,\infty]$ , we denote $\mathop{\mathrm{dom}}\rho\coloneqq\{X\in\mathcal{L}^{p}(\Omega;\mathbb{R}):\rho(X)<\infty\}$ , and we say $\rho$ is proper if $\mathop{\mathrm{dom}}\rho\neq\emptyset$ . We then have that a proper, coherent risk function is continuous and subdifferentiable in the interior of its domain [67]. In addition, if $\rho$ is continuous at $X\in\mathcal{L}^{p}(\Omega;\mathbb{R})$ and the subdifferential is a singleton, then $\rho$ is Gâteaux differentiable at $X$ , and the mapping $\mathcal{L}^{p}\ni Y\to\mathrm{D}_{Y}\rho(X)$ is continuous (in fact $\rho$ is differentiable in the somewhat stronger sense of Hadamard [16]). The Gâteaux differentiability of distortion risk measures was shown in [55]. Although excluded from the published version, the Fréchet differentiability was also discussed in an earlier working paper version of the work, see [54]. We refer the reader to the work [50] for recent advances in Fréchet differentiability of convex Lipschitz functions, such as coherent risk functions over $\mathcal{L}^{\infty}(\Omega;\mathbb{R})$ .

4 Risk aware minimum principle

Main results

We begin by stating our risk aware generalization of the stochastic Pontryagin’s minimum principle for Problem $\mathscr{P}_{1}$ . We denote $\mathbb{Y}\coloneqq\mathbb{R}^{1\times d_{x}},$ $\mathbb{Y}^{\prime}\coloneqq\mathbb{R}$ , and $\mathbb{Z}\coloneqq\mathbb{R}^{d_{w}\times d_{x}}$ , and define the Hamiltonian $H$ as

[TABLE]

Note that compared to the risk neutral case, the term involving the cost rate function has been modified to feature an additional adjoint variable $y^{\prime}\in\mathbb{Y}^{\prime}$ . We shall elaborate on this significant point later in more detail. We will give both necessary and sufficient conditions for the $\mathscr{P}_{1}$ -optimality of a control $\pi\in\mathfrak{V}^{p}(b,\sigma,\nu)$ . For sufficiency, we need an additional convexity assumption.

Assumption 4.1.

Suppose Assumptions 2.5 and 3.10 hold, and that additionally: (i) the functions $x\to g(x)$ and

[TABLE]

are convex for all $(t,y,y^{\prime},z)\in\mathbb{T}\times\mathbb{Y}\times\mathbb{Y}^{\prime}\times\mathbb{Z}$ , and (ii) the risk function $\rho$ is $\mathcal{L}$ -convex.

The risk aware minimum principle can then be stated as follows.

Theorem 4.2 (Risk aware minimum principle).

(i) Suppose $\rho:\mathcal{P}^{p}(\mathbb{R})\to\mathbb{R}$ , $p\in[1,\infty)$ , satisfies Assumption 3.10. If $\pi\in\mathfrak{V}^{p}(b,\sigma,\nu)$ is $\mathscr{P}_{1}$ -optimal, then there exist unique $\mathcal{F}$ -adapted continuous processes $y^{\pi}\in\mathcal{S}_{\mathcal{F}}^{\bar{p}/(\bar{p}-1)}(\Omega;\mathbb{Y})$ and $y^{\prime\,\pi}\in\mathcal{S}_{\mathcal{F}}^{p/(p-1)}(\Omega;\mathbb{Y}^{\prime})$ , and a unique $\mathcal{F}$ -predictable $z^{\pi}\in\mathcal{H}_{\mathcal{F}}^{\bar{p}/(\bar{p}-1)}(\Omega;\mathbb{Z})$ that satisfy the backward stochastic differential equation

[TABLE]

and the representation

[TABLE]

Moreover, the Hamiltonian of Eq. (4.1) is optimized in the sense that

[TABLE]

(ii) Suppose the stronger assumption, Assumption 4.1 holds. If $\pi\in\mathfrak{V}^{p}(b,\sigma,\nu)$ , and if there exist processes $y^{\pi}\in\mathcal{S}_{\mathcal{F}}^{\bar{p}/(\bar{p}-1)}(\Omega;\mathbb{Y})$ , $y^{\prime\,\pi}\in\mathcal{S}_{\mathcal{F}}^{p/(p-1)}(\Omega;\mathbb{Y}^{\prime})$ , $z^{\pi}\in\mathcal{H}_{\mathcal{F}}^{\bar{p}/(\bar{p}-1)}(\Omega;\mathbb{Z})$ satisfying Eqs. (4.2, 4.3, 4.4), then $\pi$ is $\mathscr{P}_{1}$ -optimal.

Remark 4.3.

In some previous works on risk aware optimization utilizing generic risk functions [37, 41, 42], a state space augmentation scheme was used to derive a computationally viable form of the risk aware problem. However here, no augmentation is necessary. This is in contrast to the earlier result where the state space augmentation was an inextricable part of the end results. It should also be emphasized that these earlier papers focused on a convex analytic formulation of the problem whereas here, we consider a purely probabilistic approach.

Remark 4.4.

It has also not escaped us that the Clark-Ocone theorem, cf. [22, Theorem 4.1], may be further used to characterize the process $(z_{t}^{\prime\,\pi})_{t\in\mathbb{T}}$ in terms of the Malliavin derivatives of $\mathrm{D}\rho(\mathscr{L}(C^{\pi}))(C^{\pi})$ . However, we leave the exploration of this connection to future work.

Intuitively, the risk aware minimum principle can be seen as a modification of the risk neutral Pontryagin’s minimum principle: Going from the risk neutral to the risk aware case, an additional process $(y_{t}^{\prime\,\pi})_{t\in\mathbb{T}}$ is introduced which acts as a rescaling or adjustment factor for given cost rate and terminal cost functions $c$ and $g$ . Moreover, as per Eq. (4.3), the values $y_{t}^{\prime\,\pi}$ , $t\in\mathbb{T}$ of the process represent the controller’s time $t\in\mathbb{T}$ expectation of the derivative of the risk function evaluated at the total cost $C^{\pi}$ . Indeed, if $\rho$ is the expectation, the risk neutral minimum principle (see e.g. [8, Section 3.2]) is recovered with the process $(y_{t}^{\prime\,\pi})_{t\in\mathbb{T}}$ disappearing in a natural way.

Corollary 4.5.

Suppose that the assumptions of Theorem 4.2 hold, and additionally, $p=1$ and $\rho$ is the expectation. Then the statement of the theorem holds, with the Hamiltonian $H$ replaced by

[TABLE]

and where the process $(y_{t}^{\prime\,\pi})_{t\in\mathbb{T}}$ is constant, $y_{t}^{\prime\,\pi}=1$ for all $t\in\mathbb{T}$ .

Proof.

Follows from Theorem 4.2 due to the fact that if $\rho=\mathbb{E}$ , then the $\mathcal{L}$ -derivative $\mathrm{D}\rho(\cdot)(\cdot)$ is identically one, and by Eq. (4.3), we have $y_{t}^{\prime\,\pi}=1$ for all $t\in\mathbb{T}$ . Thus, we may also set $y^{\prime}=1$ in the definition of the Hamiltonian $H$ , Eq. (4.1) to recover the risk neutral minimum principle. ∎

*Remark 4.6**.*

The process $(y_{t}^{\prime\,\pi})_{t\in\mathbb{T}}$ in the statement of Theorem 4.2 also satisfies a backward stochastic differential equation that is obtained in an intermediate step when proving the minimum principle. Specifically, there is a unique $\mathcal{F}$ -predictable process $z^{\prime}\in\mathcal{H}_{\mathcal{F}}^{p/(p-1)}(\Omega;\mathbb{Z}^{\prime})$ , $\mathbb{Z}^{\prime}\coloneqq\mathbb{R}^{d_{w}\times 1}$ , such that

[TABLE]

Therefore together, Eqs. (2.1), (4.2), and (4.6) form a forward-backward system of stochastic differential equations with $d_{x}$ and $d_{x}+1$ state and adjoint state variables, respectively.
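To make the risk adjustment process concrete, the following sketch works through a purely illustrative case that is not taken from the paper: a scalar Brownian motion, a hypothetical total cost $C=w_{T}$ , and the entropic risk function in its standard form $\rho(X)=\theta^{-1}\ln\mathbb{E}[\mathrm{e}^{\theta X}]$ , whose derivative is assumed to be $\mathrm{e}^{\theta X}/\mathbb{E}[\mathrm{e}^{\theta X}]$ . Since this $C$ is unbounded, the example is formal only. Under these assumptions the conditional expectation of the derivative is the exponential martingale $y_{t}^{\prime}=\exp(\theta w_{t}-\theta^{2}t/2)$ , so that $\mathrm{d}y_{t}^{\prime}=\theta y_{t}^{\prime}\,\mathrm{d}w_{t}$ , and, if the backward equation has the usual form $\mathrm{d}y_{t}^{\prime}=z_{t}^{\prime}\,\mathrm{d}w_{t}$ , the integrand would be $z_{t}^{\prime}=\theta y_{t}^{\prime}$ . The code checks the conditional expectation property by Monte Carlo; the function name and all parameter values are hypothetical.

```python
import numpy as np

def risk_adjustment_check(theta=0.5, t=0.4, T=1.0, n_paths=200_000, seed=0):
    """Monte Carlo check that y'_t = exp(theta*w_t - theta^2*t/2) behaves like
    E[ D rho(C) | F_t ] for the assumed entropic derivative and C = w_T."""
    rng = np.random.default_rng(seed)
    w_t = rng.normal(0.0, np.sqrt(t), n_paths)
    w_T = w_t + rng.normal(0.0, np.sqrt(T - t), n_paths)
    # Terminal value: exp(theta*C) / E[exp(theta*C)] with C = w_T (assumption)
    y_T = np.exp(theta * w_T - 0.5 * theta**2 * T)
    # Candidate conditional expectation given the information at time t
    y_t = np.exp(theta * w_t - 0.5 * theta**2 * t)
    residual = y_T - y_t
    # The residual should be orthogonal to functions of w_t
    return {
        "mean_y_T": float(y_T.mean()),
        "orth_const": float(residual.mean()),
        "orth_w_t": float(np.mean(residual * w_t)),
        "orth_sin": float(np.mean(residual * np.sin(w_t))),
    }

check = risk_adjustment_check()
```

In the limit $\theta\to 0$ the candidate process tends to the constant $1$ , which matches the risk neutral case of Corollary 4.5.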

*Remark 4.7**.*

Returning to Example 2.3, and setting $c=0$ and $g(x)=x^{2}/2$ for all $x\in\mathbb{X}$ , and $\rho=\mathbb{E}$ , we can easily see that Assumption 4.1 is satisfied. The Hamiltonian becomes $H(z,a)=az$ , and by Eq. (4.4), $\mathbb{P}$ -almost surely for almost all $t\in\mathbb{T}$ , $\pi_{t}=\delta_{+1}$ if $z_{t}^{\pi}<0$ and $\pi_{t}=\delta_{-1}$ if $z_{t}^{\pi}>0$ . However, the minimization of the Hamiltonian does not determine $\pi_{t}$ when $z_{t}^{\pi}=0$ . At first glance this would seem to imply that the conditions given in Theorem 4.2 are not in fact sufficient to fix the optimal control. But since $\mathrm{d}x_{t}^{\pi}=\int_{\mathbb{A}}a\pi_{t}(\mathrm{d}a)\,\mathrm{d}w_{t}$ , and $\mathrm{d}y_{t}^{\pi}=z_{t}^{\pi}\mathrm{d}w_{t}$ , $y_{T}^{\pi}=x_{T}^{\pi}$ , by the uniqueness of the solutions we must have that $y_{t}^{\pi}=x_{t}^{\pi}$ and $z_{t}^{\pi}=\int_{\mathbb{A}}a\pi_{t}(\mathrm{d}a)$ for all $t\in\mathbb{T}$ . Combined with the minimization condition, $z_{t}^{\pi}\neq 0$ would give $z_{t}^{\pi}=-\operatorname{sgn}(z_{t}^{\pi})$ , which is impossible, and hence $z_{t}^{\pi}=0$ and $\pi_{t}=(\delta_{+1}+\delta_{-1})/2$ for all $t\in\mathbb{T}$ . Therefore in order to find the optimal control it may be insufficient to only minimize the Hamiltonian, and instead one needs to determine the adjoint processes as well.
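The conclusion of this remark can be checked numerically. The sketch below is a minimal Monte Carlo comparison, assuming the dynamics $\mathrm{d}x_{t}=m\,\mathrm{d}w_{t}$ with a constant mean action $m=\int_{\mathbb{A}}a\,\pi_{t}(\mathrm{d}a)$ and the terminal cost $x_{T}^{2}/2$ . The initial value `x0`, the horizon `T`, and the function name are hypothetical choices made only for illustration.

```python
import numpy as np

def terminal_cost(mean_action, x0=0.5, T=1.0, n_steps=200, n_paths=20_000, seed=0):
    """Estimate E[x_T^2 / 2] for dx_t = m dw_t, where m is the constant mean
    action of the control, i.e. the integral of a against pi_t."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dw = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    x_T = x0 + mean_action * dw.sum(axis=1)
    return 0.5 * float(np.mean(x_T**2))

relaxed = terminal_cost(0.0)  # pi_t = (delta_{+1} + delta_{-1}) / 2, mean action 0
strict = terminal_cost(1.0)   # pi_t = delta_{+1}, mean action 1
```

With a zero mean action the state never moves, so the cost is $x_{0}^{2}/2$ , whereas the strict control $\delta_{+1}$ yields approximately $(x_{0}^{2}+T)/2$ .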

Proofs of the main results

The rest of this section is dedicated to proving the risk aware minimum principle, Theorem 4.2. We present our intermediate steps in reaching the main result, but defer the details of their proofs to Appendix A.3.

We adopt the following short-hand: For every Borel measurable function $f:\mathbb{A}\to\mathbb{V}$ and every $\pi_{1},\pi_{2}\in\mathcal{P}(\mathbb{A})$ and $a_{1},a_{2}\in\mathbb{R}$ , we denote

[TABLE]

In addition to the original stochastic differential equation, Eq. (2.1) describing a controlled process $(x_{t}^{\pi})_{t\in\mathbb{T}}$ , we introduce the additional, coupled differential equation for a $\mathbb{R}$ -valued process, the running costs, $x^{\prime}=(x_{t}^{\prime\,\pi})_{t\in\mathbb{T}}$ defined as

[TABLE]

We can then re-write the total cost as

[TABLE]

We shall use $\mathbb{X}^{\prime}=\mathbb{R}$ to indicate the range of the process $x^{\prime}$ .
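The augmentation by a running cost process can be sketched with an Euler-Maruyama scheme. This is a minimal illustration under stated assumptions: the running cost is taken to satisfy $\mathrm{d}x_{t}^{\prime}=c(x_{t},a_{t})\,\mathrm{d}t$ with $x_{0}^{\prime}=0$ , the total cost is taken to be $x_{T}^{\prime}+g(x_{T})$ , the control is strict (a Dirac measure at $a_{t}$ ), and the state is scalar. The model functions passed in below are toy choices, not those of the paper.

```python
import numpy as np

def simulate_augmented(b, sigma, c, g, action, x0, T=1.0, n_steps=500, seed=0):
    """Euler-Maruyama for the pair (x, x'), where x' accumulates the cost rate.
    Returns the terminal state, the path of x', and the total cost."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x, x_run = float(x0), 0.0
    running = [x_run]
    for k in range(n_steps):
        a = action(k * dt, x)              # strict feedback control (assumption)
        dw = rng.normal(0.0, np.sqrt(dt))
        x_run += c(x, a) * dt              # cost rate evaluated at the pre-step state
        x += b(x, a) * dt + sigma(x, a) * dw
        running.append(x_run)
    return x, np.array(running), x_run + g(x)

# Toy model (all hypothetical): mean-reverting drift, constant diffusion
x_T, running, total = simulate_augmented(
    b=lambda x, a: a - x,
    sigma=lambda x, a: 0.3,
    c=lambda x, a: x**2 + a**2,
    g=lambda x: x**2,
    action=lambda t, x: -0.5 * x,
    x0=1.0,
)
```

Treating the running cost as an extra state variable is what allows the total cost to be handled as a function of terminal values only.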

In order to establish optimality conditions for vague controlled solutions, we need first and foremost some means of comparing pairs of solutions. This poses a slight technical challenge, as the state and control space processes for any given pair $\pi,\pi^{\prime}\in\mathfrak{V}^{p}(b,\sigma,\nu)$ may be defined on different probability spaces. A natural way of comparing weak solutions of stochastic differential equations would be to compare the finite dimensional distributions of the state space processes (and the distributions of the cost variables $C^{\pi}$ ). However, since by Assumption 2.5 and Proposition 2.7 strong solutions exist for a given filtered probability space and control process, we can do slightly better. Specifically, we can construct an extended probability space simultaneously supporting both vague controlled solutions, and on which we can compare the pathwise laws of the solutions.

Lemma 4.8.

Suppose Assumption 2.5 holds, and that $(\Omega,\Sigma,\mathcal{F},\mathbb{P},x,\pi)\in\mathfrak{V}^{p}(b,\sigma,\nu)$ and $(\Omega^{\prime},\Sigma^{\prime},\mathcal{F}^{\prime},\mathbb{P}^{\prime},x^{\prime},\pi^{\prime})\in\mathfrak{V}^{p}(b,\sigma,\nu)$ , and let $w=(w_{t})_{t\in\mathbb{T}}$ and $w^{\prime}=(w_{t}^{\prime})_{t\in\mathbb{T}}$ be the corresponding Brownian motions. Then there exists a filtered probability space $(\tilde{\Omega},\tilde{\Sigma},\tilde{\mathcal{F}},\tilde{\mathbb{P}})$ supporting an $\tilde{\mathcal{F}}$ -Brownian motion $\tilde{w}=(\tilde{w}_{t})_{t\in\mathbb{T}}$ , and processes $(\tilde{x}_{t},\tilde{\pi}_{t})_{t\in\mathbb{T}}$ and $(\tilde{x}_{t}^{\prime},\tilde{\pi}_{t}^{\prime})_{t\in\mathbb{T}}$ satisfying

[TABLE]

for all $t\in\mathbb{T}$ , and such that their laws equal those of $(x_{t},\pi_{t})_{t\in\mathbb{T}}$ and $(x_{t}^{\prime},\pi_{t}^{\prime})_{t\in\mathbb{T}}$ .

For simplicity, we shall implicitly suppose that pairs of vague controlled solutions are defined on the same probability space. In addition, for every pair of vague controlled solutions $\pi,q\in\mathfrak{V}^{p}(b,\sigma,\nu)$ , the convex combination of their control processes shall be denoted by the short-hand $\pi(\alpha,q)$ , that is, for all $\pi$ , $q$ and $\alpha\in[0,1]$ ,

$$\pi_{t}(\alpha,q)\coloneqq\pi_{t}+\alpha(q_{t}-\pi_{t})\quad\text{for all }t\in\mathbb{T}.$$

We will view the control $q$ as a perturbation of the original, reference control $\pi$ , and our goal is to derive optimality conditions from variational equations representing the response of the solution to $q$ .
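For measures with finite support, the convex combination $\pi_{t}+\alpha(q_{t}-\pi_{t})$ of two controls reduces to arithmetic on weight vectors. The following sketch assumes a finite action set and represents each measure by its vector of weights; the helper names `perturb` and `integrate` are hypothetical.

```python
import numpy as np

def perturb(pi, q, alpha):
    """Weights of pi + alpha * (q - pi) for two discrete probability measures."""
    pi, q = np.asarray(pi, dtype=float), np.asarray(q, dtype=float)
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return pi + alpha * (q - pi)

def integrate(f_values, weights):
    """Integral of f against a discrete measure: sum_i f(a_i) * weight_i."""
    return float(np.dot(f_values, weights))

actions = np.array([-1.0, 0.0, 1.0])
pi = np.array([0.5, 0.5, 0.0])   # reference control at a fixed time
q = np.array([0.0, 0.0, 1.0])    # perturbing control at the same time
mixed = perturb(pi, q, 0.25)
```

Because integration is linear in the measure, any integrated coefficient under the mixed control equals its value under $\pi$ plus $\alpha$ times the difference between its values under $q$ and $\pi$ , which is what makes first-order expansions in $\alpha$ tractable.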

We begin with a few auxiliary results, variations of which have appeared in the literature. The following lemma states that solutions corresponding to perturbed controls are, uniformly in time, good approximations of the unperturbed solutions.

Lemma 4.9.

For all $\pi,q\in\mathfrak{V}^{p}(b,\sigma,\nu)$ and $\alpha\in[0,1]$ ,

[TABLE]

where $\bar{p}$ is as in Assumption 2.5. In addition, we have for the terminal cost

[TABLE]

The following lemma provides the means for computing the first-order response of solutions to perturbations of the control process.

Lemma 4.10.

Let $\pi,q\in\mathfrak{V}^{p}(b,\sigma,\nu)$ be arbitrary. Then there exists an $\mathbb{X}$ -valued process $\delta^{\pi,q}=(\delta_{t}^{\pi,q})_{t\in\mathbb{T}}$ that is the unique strong solution of

[TABLE]

Moreover, defining $\delta^{\prime\,\pi,q}=(\delta_{t}^{\prime\,\pi,q})_{t\in\mathbb{T}}$ as

[TABLE]

we have

[TABLE]

and, for all $\alpha\in[0,1]$ ,

[TABLE]

The next results connect the response of the dynamics to the perturbation, described by the process $(\delta^{\pi,q},\delta^{\prime\,\pi,q})$ and characterized by the above lemmas, to the risk aware objectives. Let us for brevity denote for all $\pi\in\mathfrak{V}^{p}(b,\sigma,\nu)$ and for any law invariant risk function $\rho:\mathcal{P}^{p}(\mathbb{R})\to\mathbb{R}$ , with an $\mathcal{L}$ -derivative $\mathrm{D}\rho(\cdot)(\cdot):\mathcal{P}^{p}(\mathbb{R})\times\mathbb{R}\to\mathbb{R}$

[TABLE]

where

[TABLE]

If $\pi\in\mathfrak{V}^{p}(b,\sigma,\nu)$ is $\mathscr{P}_{1}$ -optimal, then by definition for any $q\in\mathfrak{V}^{p}(b,\sigma,\nu)$ and $\alpha\in[0,1]$ we have that

[TABLE]

We will use Eq. (4.18) as a starting point for deriving our optimality conditions.

Lemma 4.11.

Suppose Assumption 3.10 holds. Let $\pi\in\mathfrak{V}^{p}(b,\sigma,\nu)$ be $\mathscr{P}_{1}$ -optimal and $q\in\mathfrak{V}^{p}(b,\sigma,\nu)$ arbitrary, and let the process $(\delta_{t}^{\pi,q},\delta_{t}^{\prime\,\pi,q})_{t\in\mathbb{T}}$ be as in the statement of Lemma 4.10. Then

[TABLE]

where $D^{\pi}$ is as defined in Eq. (4.17).

We can now construct the adjoint processes $(y_{t}^{\pi},y_{t}^{\prime\,\pi},z_{t}^{\pi},z_{t}^{\prime\,\pi})_{t\in\mathbb{T}}$ appearing in Eqs. (4.2, 4.6), and use them to restate the optimality condition of Eq. (4.19). The proof of the lemma follows roughly the same ideas as used in the risk neutral case, see e.g. [13, 14, 11, 59], and relies primarily on the martingale representation theorem. In the risk aware case, we need to additionally handle the nonlinearity of the risk aware objective, which gives rise to the risk adjustment process.

Lemma 4.12.

Suppose Assumption 3.10 holds, and that

[TABLE]

is $\mathscr{P}_{1}$ -optimal. Then there are unique $\mathcal{F}$ -adapted continuous processes $y^{\pi}\in\mathcal{S}_{\mathcal{F}}^{\bar{p}/(\bar{p}-1)}(\Omega;\mathbb{Y})$ and $y^{\prime\,\pi}\in\mathcal{S}_{\mathcal{F}}^{p/(p-1)}(\Omega;\mathbb{Y}^{\prime})$ , and unique $\mathcal{F}$ -predictable processes $z^{\pi}\in\mathcal{H}_{\mathcal{F}}^{\bar{p}/(\bar{p}-1)}(\Omega;\mathbb{Z})$ and $z^{\prime\,\pi}\in\mathcal{H}_{\mathcal{F}}^{p/(p-1)}(\Omega;\mathbb{Z}^{\prime})$ satisfying the backward stochastic differential equations

[TABLE]

where $H$ is as defined in Eq. (4.1), and Eq. (4.19) implies that for all $q\in\mathfrak{V}^{p}(b,\sigma,\nu)$ ,

[TABLE]

Finally, we show that existence of the adjoints and minimization of the Hamiltonian is indeed sufficient to establish optimality.

Lemma 4.13.

Suppose Assumption 4.1 holds, $\pi\in\mathfrak{V}^{p}(b,\sigma,\nu)$ , and there exist processes $y^{\pi}\in\mathcal{S}_{\mathcal{F}}^{\bar{p}/(\bar{p}-1)}(\Omega;\mathbb{Y})$ , $y^{\prime\,\pi}\in\mathcal{S}_{\mathcal{F}}^{p/(p-1)}(\Omega;\mathbb{Y}^{\prime})$ , $z^{\pi}\in\mathcal{H}_{\mathcal{F}}^{\bar{p}/(\bar{p}-1)}(\Omega;\mathbb{Z})$ satisfying Eqs. (4.2, 4.3, 4.4). Then

[TABLE]

for every $q\in\mathfrak{V}^{p}(b,\sigma,\nu)$ .

We can now collect the above together and give the proof of our main result, Theorem 4.2.

Proof of Theorem 4.2.

The first part of the theorem now follows directly from Lemma 4.12 and Eq. (4.21), while the second is a direct consequence of Lemma 4.13. The representation of Eq. (4.6) follows directly from Lemma 4.12. ∎

5 Examples of differentiable risk functions and a portfolio allocation problem

The purpose of this section is to present an application of the results of previous sections, and hence the problem we consider is selected for simplicity while attempting to retain a reasonable degree of practical significance.

Risk functions

As examples of law invariant risk functions, we use the mean-deviation, the (smoothed) mean-semideviation, and entropic risk functionals.

Definition 5.1.

Let $(\Omega,\Sigma,\mathbb{P})$ be a probability space. (i) Mean-deviation risk function $\rho^{\mathrm{MD}}:\mathcal{L}^{2}(\Omega;\mathbb{R})\to\mathbb{R}$ is defined as the mapping

[TABLE]

where $\beta>0$ . (ii) Mean-semideviation risk function $\rho^{\mathrm{MD+}}:\mathcal{L}^{1}(\Omega;\mathbb{R})\to\mathbb{R}$ and the $\epsilon$ -smoothed mean-semideviation risk function $\rho_{\epsilon}^{\mathrm{MD+}}:\mathcal{L}^{1}(\Omega;\mathbb{R})\to\mathbb{R}$ , $\epsilon>0$ , are defined as

[TABLE]

where $(\cdot)_{+}:\mathbb{R}\to\mathbb{R}_{\geq 0}$ and $(\cdot)_{\epsilon+}:\mathbb{R}\to\mathbb{R}_{>0}$ are the positive part and $\epsilon$ -smoothed positive part functions, $(x)_{+}\coloneqq x\vee 0$ and $(x)_{\epsilon+}\coloneqq x+\epsilon\ln(1+\mathrm{e}^{-x/\epsilon})$ for all $x\in\mathbb{R}$ and $\epsilon>0$ . (iii) Entropic risk function is the risk measure $\rho^{\mathrm{Ent}}:\mathcal{L}^{\infty}(\Omega;\mathbb{R})\to\mathbb{R}$ defined as

[TABLE]

where $\theta>0$ .

We note that the mean-deviation risk function is convex, positively homogeneous, and translation invariant, that is, it satisfies Definition 3.3 items (ii), (iii), and (iv). The $\mathcal{L}^{1}(\Omega;\mathbb{R})$ mean-semideviation risk measure $\rho^{\mathrm{MD+}}$ was considered in e.g. [67], and it too is convex, positively homogeneous, and translation invariant, but is additionally monotonic, satisfying Definition 3.3(i). As noted in Remark 3.5, the positive homogeneity of these functionals implies that they cannot be everywhere Fréchet differentiable. We demonstrate in the example problem below that this is not necessarily an issue for our purposes. Moreover, the $\epsilon$ -smoothed mean-semideviation risk function $\rho_{\epsilon}^{\mathrm{MD+}}$ uniformly approximates $\rho^{\mathrm{MD+}}$ , that is,

[TABLE]

but its restriction to $\mathcal{L}^{2}(\Omega;\mathbb{R})$ is in fact everywhere Fréchet differentiable (this will be established in Lemma 5.2 below). The smoothed mean-semideviation is also convex and monotonic which, along with the above estimate, follows directly from the properties of the $\epsilon$ -smoothed positive part function [21] (specifically, from the inequality $0<(x)_{\epsilon+}-(x)_{+}\leq\epsilon\ln 2$ for all $x\in\mathbb{R}$ , and the monotonicity and convexity of $(\cdot)_{\epsilon+}$ ). Our definition of $\rho_{\epsilon}^{\mathrm{MD+}}$ was inspired by the construction of a smoothed conditional value-at-risk risk functional in [44]. The entropic risk function $\rho^{\mathrm{Ent}}$ on the other hand satisfies monotonicity, convexity, and translation invariance properties, or items (i), (ii) and (iv) of Definition 3.3. It serves as an example of a commonly used risk function that is everywhere Fréchet differentiable.
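The properties of the $\epsilon$ -smoothed positive part can be verified numerically. The sketch below implements $(x)_{\epsilon+}=x+\epsilon\ln(1+\mathrm{e}^{-x/\epsilon})$ in the algebraically equivalent and numerically stable form $\epsilon\ln(1+\mathrm{e}^{x/\epsilon})$ , together with its derivative, the logistic function $1/(1+\mathrm{e}^{-x/\epsilon})$ . The function names and the grid are illustrative choices.

```python
import numpy as np

def pos(x):
    """Positive part (x)_+ = max(x, 0)."""
    return np.maximum(x, 0.0)

def smooth_pos(x, eps):
    """Smoothed positive part; x + eps*log(1 + exp(-x/eps)) rewritten as
    eps*log(1 + exp(x/eps)) and evaluated with logaddexp for stability."""
    return eps * np.logaddexp(0.0, np.asarray(x, dtype=float) / eps)

def smooth_pos_derivative(x, eps):
    """Derivative of the smoothed positive part, a logistic function."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float) / eps))

eps = 0.25
grid = np.linspace(-3.0, 3.0, 601)
gap = smooth_pos(grid, eps) - pos(grid)   # should lie in (0, eps * ln 2]
```

The gap is largest at $x=0$ , where it equals $\epsilon\ln 2$ , and decays exponentially away from the origin. For very large $|x|/\epsilon$ it underflows to zero in floating point, so the strict lower bound is only observable on moderate grids.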

Lemma 5.2.

(i) The mean-deviation risk function is Fréchet differentiable at every $X\in\mathcal{L}^{2}(\Omega;\mathbb{R})$ that is not almost surely constant, with the derivative $\mathrm{D}\rho^{\mathrm{MD}}(X)\in\mathcal{L}^{2}(\Omega;\mathbb{R})$ being

[TABLE]

Moreover, the derivative does not exist at $X\in\mathcal{L}^{2}(\Omega;\mathbb{R})$ such that $X=\mathbb{E}[X]$ . It additionally has the $\mathcal{L}$ -derivative $\mathrm{D}\rho^{\mathrm{MD}}:\mathcal{P}^{2}(\mathbb{R})\times\mathbb{R}\to\mathbb{R}$ that reads, for all $\mu\in\mathcal{P}^{2}(\mathbb{R})$ that are not Dirac measures,

[TABLE]

(ii) The $\mathcal{L}^{2}(\Omega;\mathbb{R})$ -restriction of the $\epsilon$ -smoothed mean-semideviation risk function $\rho_{\epsilon}^{\mathrm{MD+}}$ , $\epsilon>0$ , is Fréchet differentiable at every $X\in\mathcal{L}^{2}(\Omega;\mathbb{R})$ , and has the Fréchet- and $\mathcal{L}$ -derivatives

[TABLE]

respectively, and where $U_{\epsilon}(x)\coloneqq\mathrm{d}(x)_{\epsilon+}/\mathrm{d}x=1/(1+\mathrm{e}^{-x/\epsilon})$ for all $x\in\mathbb{R}$ .

(iii) The entropic risk measure is Fréchet differentiable at every $X\in\mathcal{L}^{\infty}(\Omega;\mathbb{R})$ , with the Fréchet- and $\mathcal{L}$ -derivatives $\mathrm{D}\rho^{\mathrm{Ent}}(X)\in\mathcal{L}^{1}(\Omega;\mathbb{R})$ and $\mathrm{D}\rho^{\mathrm{Ent}}(\mu):\mathbb{R}\to\mathbb{R}$ , $\mu\in\mathcal{P}^{\infty}(\mathbb{R})$ ,

[TABLE]
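Derivative formulas of this kind can be sanity-checked by finite differences on a finite probability space with equally weighted atoms. The sketch below assumes the standard forms $\rho^{\mathrm{MD}}(X)=\mathbb{E}[X]+\beta\|X-\mathbb{E}[X]\|_{2}$ and $\rho^{\mathrm{Ent}}(X)=\theta^{-1}\ln\mathbb{E}[\mathrm{e}^{\theta X}]$ , with candidate derivatives $1+\beta(X-\mathbb{E}[X])/\|X-\mathbb{E}[X]\|_{2}$ and $\mathrm{e}^{\theta X}/\mathbb{E}[\mathrm{e}^{\theta X}]$ . These expressions are assumptions made for the illustration and may differ in notation from the displays of the lemma.

```python
import numpy as np

def mean_deviation(x, beta):
    """E[X] + beta * ||X - E[X]||_2 on a uniform finite sample space."""
    m = x.mean()
    return m + beta * np.sqrt(np.mean((x - m) ** 2))

def d_mean_deviation(x, beta):
    """Candidate derivative; undefined when x is constant."""
    m = x.mean()
    return 1.0 + beta * (x - m) / np.sqrt(np.mean((x - m) ** 2))

def entropic(x, theta):
    """(1/theta) * log E[exp(theta * X)]."""
    return np.log(np.mean(np.exp(theta * x))) / theta

def d_entropic(x, theta):
    """Candidate derivative exp(theta * X) / E[exp(theta * X)]."""
    w = np.exp(theta * x)
    return w / w.mean()

def directional_derivative(rho, x, y, h=1e-6):
    """Central finite difference of rho at x in the direction y."""
    return (rho(x + h * y) - rho(x - h * y)) / (2.0 * h)

rng = np.random.default_rng(1)
x, y = rng.normal(size=50), rng.normal(size=50)
fd_md = directional_derivative(lambda v: mean_deviation(v, 0.5), x, y)
fd_ent = directional_derivative(lambda v: entropic(v, 0.7), x, y)
```

In both cases the finite difference should agree with the pairing $\mathbb{E}[\mathrm{D}\rho(X)\,Y]$ , and the candidate derivatives have unit mean, as expected for translation invariant functionals.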

*Remark 5.3**.*

If the $\mathcal{L}^{2}$ -norm $\|\cdot\|_{2}$ in Eq. (5.1) is replaced by its square, it is easy to verify that the resulting risk function is everywhere Fréchet differentiable.

Portfolio allocation problem

As a practical example, we consider a simplified portfolio allocation problem. An agent manages a portfolio consisting of a risk free bond, yielding a constant return rate $r>0$ , and a risky stock whose price $(q_{t})_{t\in\mathbb{T}}$ evolves according to $\mathrm{d}q_{t}=\mu q_{t}\,\mathrm{d}t+\sigma q_{t}\,\mathrm{d}w_{t}$ , $q_{0}=1$ , $\mu>0$ , $\sigma>0$ . Let $N_{t}=B_{t}+q_{t}S_{t}$ be the net value of the agent’s portfolio where $B_{t}$ and $S_{t}$ represent the agent’s bond and stock holdings at any $t\in\mathbb{T}$ , respectively. Let $\phi_{t}\coloneqq q_{t}S_{t}/N_{t}$ be the proportion of the agent’s portfolio allocated to the risky asset, so that $N_{t}$ follows the stochastic differential equation

[TABLE]

with a given initial condition $N_{0}$ . Trading is costless and unconstrained so that $\phi_{t}$ is a choice variable for each $t\in\mathbb{T}$ . We suppose $\phi_{t}$ is constrained to the interval $\mathbb{A}=[\underline{\phi},\bar{\phi}]$ where $0<\underline{\phi}<\bar{\phi}<\infty$ , and that the agent optimizes the allocation so that the risk of the utility of $N_{T}$ is minimized. Here, the agent values their profits or losses using a logarithmic utility, so that their total cost evaluates to $-\ln N_{T}$ .

Re-writing Eq. (5.6) for the logarithm of $N_{t}$ , $x_{t}^{\pi}\coloneqq\ln N_{t}$ for all $t\in\mathbb{T}$ , and generalizing to a relaxed controlled process, we have that

[TABLE]

where $x_{0}^{\pi}=x_{0}\in\mathbb{R}$ is given. Let $b_{\phi}$ and $\sigma_{\phi}$ be the drift and diffusion coefficients of Eq. (5.7), and let $\nu_{\phi}=\delta_{x_{0}}$ . Assumption 2.5 is now satisfied, with $\bar{p}_{1}=0$ , $\bar{p}_{2}=0$ , $\bar{p}_{3}=\infty$ , $p_{1}=1$ , $p_{2}=0$ , $p_{1}^{\prime}=0$ , and $p_{2}^{\prime}=0$ . Since the initial condition is deterministic, $\bar{p}$ may be selected to be arbitrarily large. It is easy to verify that Eq. (2.11) holds for any $p\in[1,\infty)$ , so that we may consider $p$ -feasible solutions $\pi\in\mathfrak{V}^{p}(b_{\phi},\sigma_{\phi},\nu_{\phi})$ .
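The effect of relaxing the control on the log-wealth dynamics can be illustrated with a short computation. The sketch assumes that under a relaxed control the drift is the $\pi_{t}$ -average of $r+(\mu-r)a-\sigma^{2}a^{2}/2$ and the diffusion coefficient is the $\pi_{t}$ -average of $\sigma a$ , which is the form one obtains from Itô's formula for a strict control followed by averaging. The parameter values and function name are hypothetical.

```python
import numpy as np

def log_wealth_coefficients(actions, weights, r, mu, sigma):
    """Drift and diffusion of x = ln N under a relaxed control supported on
    finitely many allocations (assumed form, see the lead-in)."""
    a = np.asarray(actions, dtype=float)
    w = np.asarray(weights, dtype=float)
    drift = float(np.dot(w, r + (mu - r) * a - 0.5 * sigma**2 * a**2))
    diffusion = float(sigma * np.dot(w, a))
    return drift, diffusion

r, mu, sigma = 0.02, 0.08, 0.2
actions, weights = [0.2, 1.0], [0.5, 0.5]
mean_allocation = float(np.dot(weights, actions))

relaxed = log_wealth_coefficients(actions, weights, r, mu, sigma)
strict = log_wealth_coefficients([mean_allocation], [1.0], r, mu, sigma)
drift_gain = strict[0] - relaxed[0]   # equals 0.5 * sigma^2 * Var_pi(a)
```

The strict control at the mean allocation has the same diffusion coefficient but a drift that is larger by $\sigma^{2}\operatorname{Var}_{\pi}(a)/2$ , a consequence of Jensen's inequality applied to the concave map $a\mapsto-a^{2}$ .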

The risk aware control problem, Problem $\mathscr{P}_{\phi}$ , becomes

[TABLE]

We note that for instance the mean-deviation risk function of Eq. (5.1) is $\mathcal{L}$ -differentiable at $-x_{T}^{\pi}$ .

Proposition 5.4.

There is no $\pi\in\mathfrak{V}^{p}(b_{\phi},\sigma_{\phi},\nu_{\phi})$ such that $-x_{T}^{\pi}$ is almost surely bounded.

Proof.

Since for any $\pi\in\mathfrak{V}^{p}(b_{\phi},\sigma_{\phi},\nu_{\phi})$ the drift and diffusion are bounded, and the latter is always non-zero, $x_{T}^{\pi}$ can take arbitrarily large values. ∎

Since $-x_{T}^{\pi}$ is not bounded, it cannot be constant, and therefore $\rho^{\mathrm{MD}}$ is $\mathcal{L}$ -differentiable at the terminal cost. In addition, e.g. the mean-deviation risk function or the $\mathcal{L}^{2}(\Omega;\mathbb{R})$ restriction of the $\epsilon$ -smoothed mean-semideviation risk function together with the cost $-x_{T}^{\pi}$ satisfy Assumption 3.10.

We can now use the risk aware minimum principle to characterize an optimal allocation process. For simplicity, we assume that the $\mathcal{L}$ -derivative of the risk function is positive (this is the case for e.g. the $\epsilon$ -smoothed mean-semideviation when $\beta<1$ ). Non-positive values of the derivative can also be easily accommodated, but the added complexity would detract from the intuition of this example, which is to illustrate how risk awareness can manifest itself in real world applications.

Proposition 5.5.

Suppose $\rho:\mathcal{L}^{p}(\Omega;\mathbb{R})\to\mathbb{R}$ , $p\in[1,\infty)$ , is convex, satisfies Assumption 3.10, and has a positive $\mathcal{L}$ -derivative, i.e. $\mathrm{D}\rho(\mu)(x)>0$ for all $\mu\in\mathcal{P}^{p}(\mathbb{R})$ , $x\in\mathbb{R}$ . The optimal portfolio allocation for Problem $\mathscr{P}_{\phi}$ is a strict control $\pi\in\mathfrak{V}^{p}(b_{\phi},\sigma_{\phi},\nu_{\phi})$ such that $\pi_{t}=\delta_{\phi_{t}}$ for all $t\in\mathbb{T}$ where

[TABLE]

and where

[TABLE]

is a risk premium in which

[TABLE]

for all $t\in\mathbb{T}$ .

We note that interestingly, the risk awareness of the objective function has now given rise to the additional risk premium process $(\iota_{t})_{t\in\mathbb{T}}$ defined in Eq. (5.8). To wit, the risk premium vanishes if $\rho$ is the expectation, since then as noted in Corollary 4.5, $y_{t}^{\prime\,\pi}=1$ for all $t\in\mathbb{T}$ implying that $z_{t}^{\prime\,\pi}=0$ for all $t\in\mathbb{T}$ . Thus, the risk aware minimum principle may open new possibilities in e.g. risk pricing theory.

6 Conclusions

In Theorem 4.2 we have given a risk aware generalization of the stochastic minimum principle. A notable feature of the result is the way risk is captured via the risk adjustment process, essentially the marginal risk at a given time $t\in\mathbb{T}$ , Eq. (4.3). We argue that at least some form of a risk adjustment process is an inevitable consequence of the risk awareness, or effectively of the nonlinearity of the risk function. In our risk aware context, it is natural to expect that the optimal control should account for changes in the way the risk responds to changes in the terminal cost, given the information $\mathcal{F}_{t}$ at any time $t\in\mathbb{T}$ . Indeed, the raison d’être of dynamic risk measures is their property of time-consistency which prescribes the dependence of the risk function on the filtration. In the result we obtained, this risk accounting is represented by the $\mathcal{F}_{t}$ -conditional expectation of the $\mathcal{L}$ -derivative of the risk function evaluated at the terminal cost.

Although by not requiring that the risk functions are time-consistent we have provided a rather general version of a risk aware minimum principle, we have on the other hand opened ourselves to the possibility that the optimal controls might not be time-consistent. By this we mean that if the optimization problem were restarted at some time $t>0$ , the optimal value and control might change, and the controller could be better off by switching to a different control policy. However, since our risk aware minimum principle characterizes the optimal control in terms of the risk adjustment process, it is now possible to find new, sufficient conditions for time consistency of the controls. Moreover, it may now also be possible to consider constrained optimal control problems, where the purpose of the constraint is to enforce time-consistency of optimal controls.

Our minimum principle also gives, to the best of the authors' knowledge, the first characterization of the risk aware optimal control that can be used to derive conditions under which an optimal control is strict or Markov. A simple application of Jensen’s inequality was used in the example problem to show that in that instance, a strict optimal control exists. Generalizations of this statement are not hard to imagine. The question of existence of Markov controls may be possible to explore using recent results on forward-backward stochastic differential equations. For instance [35, 34] present conditions under which the adjoint processes can be expressed as functions of the time and state variables, which could allow writing the optimal control as a function of time and state variables only.

Finally, one of the key assumptions in the risk aware minimum principle is the Fréchet differentiability of the risk function $\rho$ . For our results to hold, it is necessary that the risk function be differentiable over the random variables representing the total cost. Establishing more precisely what risk functions are Fréchet differentiable over a sufficiently large subset of random variables would widen the applicability of the results given in this paper.

Acknowledgments

The authors acknowledge the funding provided by the Singapore Ministry of Education, Tier II grant MOE2015-T2-2-148, Practical algorithms for large-scale sequential optimization.

Appendix A Proofs of the results

A.1 Proofs for Section 2

Proof of Example 2.4.

The main inequality of the example follows from a straight-forward application of the definition of the perturbed control and the Burkholder-Davis-Gundy inequality, [58, Theorem 1.76]. Explicitly,

[TABLE]

∎

Proof of Proposition 2.7.

The drift and diffusion functions $b$ and $\sigma$ are by Assumption 2.5(iii) $L$ -Lipschitz. In addition, by using the growth conditions of Assumption 2.5(ii), they satisfy the following boundedness conditions:

[TABLE]

where we have used the $\bar{p}_{3}$ -admissibility of the control, that is, Assumption 2.5(vii) and Eq. (2.4), and the $\bar{p}_{2}$ upper bound of Eq. (2.11b), $\bar{p}\bar{p}_{2}\leq\bar{p}_{3}$ . So being, [58, Theorem 3.17] states that a unique strong solution $x=(x_{t})_{t\in\mathbb{T}}\in\mathcal{S}_{\mathcal{F}}^{\bar{p}}(\Omega;\mathbb{X})$ of Eq. (2.1) exists.

We can now estimate the costs using the growth conditions of Assumption 2.5(iv)

[TABLE]

To reach the final inequalities, we have used Eq. (2.11d) giving $pp_{1}<\bar{p}-p<\bar{p}$ (clearly the weaker assumption $p_{1}<\bar{p}/p$ would have sufficed, but the stronger form shall be used later), and additionally using Eq. (2.11a), $pp_{2}<\bar{p}_{3}$ , so that the finiteness of the terms is implied by $x\in\mathcal{S}_{\mathcal{F}}^{\bar{p}}(\Omega;\mathbb{X})$ and the $\bar{p}_{3}$ -admissibility of the control. We see that $C^{\pi}\in\mathcal{L}^{p}(\Omega;\mathbb{R})$ , and the proof of the first part is complete. ∎

A.2 Proofs for Section 3

Proof of Proposition 3.7.

This result is proven in [19, Proposition 5.25] for the case of $p=2$ ; here we are merely pointing out that the statement naturally holds also in the “smaller” spaces $\mathcal{L}^{p}(\Omega;\mathbb{R}^{n})$ , $p\in(2,\infty]$ . Let $\psi$ be an $\mathcal{L}^{p}(\Omega)$ -representation of $\phi$ , whose Fréchet derivative is continuous. Since the embedding of $\mathcal{L}^{p}(\Omega;\mathbb{R}^{n})$ into $\mathcal{L}^{2}(\Omega;\mathbb{R}^{n})$ is continuous, the Fréchet derivative is continuous on $\mathcal{L}^{2}(\Omega;\mathbb{R}^{n})$ as well. Therefore, there is an almost surely unique $\mathcal{L}$ -derivative $f$ such that $Y=(\Omega\ni\omega\to f(X(\omega)))\in\mathcal{L}^{2}(\Omega;\mathbb{R}^{n})$ . As an element of $\mathcal{L}^{2}(\Omega;\mathbb{R}^{n})$ , $Y$ is also in $\mathcal{L}^{q}(\Omega;\mathbb{R}^{n})$ , $q=p/(p-1)$ . ∎

A.3 Proofs for Section 4

For the detailed proofs, we need to extend our notations somewhat. Let $n,m\in\mathbb{N}$ and $k_{i}\in\mathbb{N}$ for all $i\in\{1,\ldots,m\}$ . For all differentiable functions $f:\mathbb{R}^{n}\to\mathbb{R}^{k_{1}\times\cdots\times k_{m}}$ , we define $\nabla f:\mathbb{R}^{n}\to\mathbb{R}^{k_{1}\times\cdots\times k_{m}\times n}$ so that $(\nabla f(x))_{i_{1},\ldots,i_{m},j}\coloneqq\partial f_{i_{1},\ldots,i_{m}}(x)/\partial x_{j}$ for all $x\in\mathbb{R}^{n}$ , $i_{\ell}\in\{1,\ldots,k_{\ell}\}$ , $\ell\in\{1,\ldots,m\}$ .

Let $N,M\in\mathbb{N}$ , and $n_{i},m_{j}\in\mathbb{N}$ for all $i\in\{1,\ldots,N\}$ , $j\in\{1,\ldots,M\}$ . Let $U\in\mathbb{R}^{n_{1}\times\cdots\times n_{N}}$ and $V\in\mathbb{R}^{m_{1}\times\cdots\times m_{M}}$ . The arrays $UV\in\mathbb{R}^{n_{1}\times\cdots\times n_{N-1}\times m_{2}\times\cdots\times m_{M}}$ and $U\mathop{\cdot\cdot}V\in\mathbb{R}^{n_{1}\times\cdots\times n_{N-2}\times m_{3}\times\cdots\times m_{M}}$ are defined so that

[TABLE]

for all $i_{\ell}\in\{1,\ldots,n_{\ell}\}$ , $\ell\in\{1,\ldots,N-1\}$ and $j_{\ell}\in\{1,\ldots,m_{\ell}\}$ , $\ell\in\{2,\ldots,M\}$ , where in the former definition $n_{N}=m_{1}$ and in the latter $n_{N}=m_{1}$ and $n_{N-1}=m_{2}$ . In addition, for all $X\in\mathbb{R}^{n_{N-1}}$ , we define $U\cdot X\in\mathbb{R}^{n_{1}\times\cdots\times n_{N-2}\times n_{N}}$ as such that

[TABLE]

for all $i_{\ell}\in\{1,\ldots,n_{\ell}\}$ , $\ell\in\{1,\ldots,N-2,N\}$ .
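The array products introduced here map directly onto standard tensor contractions. The sketch below infers the index pairing from the stated dimension constraints ( $n_{N}=m_{1}$ for $UV$ , and additionally $n_{N-1}=m_{2}$ for $U\mathop{\cdot\cdot}V$ ), so the pairing is an assumption rather than a transcription of the displayed definitions. The function names are hypothetical.

```python
import numpy as np

def contract(U, V):
    """UV: sum over the last index of U and the first index of V."""
    return np.tensordot(U, V, axes=([-1], [0]))

def double_contract(U, V):
    """U..V: pair the last index of U with the first index of V, and the
    second-to-last index of U with the second index of V."""
    return np.tensordot(U, V, axes=([-1, -2], [0, 1]))

def dot_second_last(U, X):
    """U.X: contract the second-to-last index of U with the vector X."""
    return np.einsum("...jk,j->...k", U, X)

rng = np.random.default_rng(0)
U = rng.normal(size=(2, 3, 4))
V = rng.normal(size=(4, 3, 5))
X = rng.normal(size=3)

UV = contract(U, V)            # shape (2, 3, 3, 5)
UddV = double_contract(U, V)   # shape (2, 5)
UX = dot_second_last(U, X)     # shape (2, 4)
```

For matrices, `contract` reduces to the ordinary matrix product and, under the assumed pairing, `double_contract` reduces to the trace of that product.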

We will also repeatedly use the following identity and estimates:

[TABLE]

In addition, we shall frequently apply the Burkholder-Davis-Gundy (BDG) inequality (see e.g. [58, Theorem 1.76]), and we will use $C_{r}$ , $r\in[1,\infty)$ , to denote the constant in the upper bound given in the inequality.

Proof of Lemma 4.8.

Let $\tilde{\Omega}^{0}\coloneqq\Omega\times\Omega^{\prime}$ , $\tilde{\Sigma}^{0}\coloneqq\Sigma\times\Sigma^{\prime}$ , $\tilde{\mathcal{F}}^{0}\coloneqq\mathcal{F}\times\mathcal{F}^{\prime}$ , and $\tilde{\mathbb{P}}^{0}\coloneqq\mathbb{P}\times\mathbb{P}^{\prime}$ . The filtered probability space $(\tilde{\Omega},\tilde{\Sigma},\tilde{\mathcal{F}},\tilde{\mathbb{P}})$ is then constructed from $(\tilde{\Omega}^{0},\tilde{\Sigma}^{0},\tilde{\mathcal{F}}^{0},\tilde{\mathbb{P}}^{0})$ by conditioning on the event that the paths of the Brownian motions $w$ and $w^{\prime}$ are the same: We set $\tilde{\Omega}\coloneqq\{(\omega,\omega^{\prime})\in\tilde{\Omega}^{0}:w(\omega)=w^{\prime}(\omega^{\prime})\}$ , $\tilde{\mathbb{P}}\coloneqq\tilde{\mathbb{P}}^{0}(\cdot\mid\tilde{\Omega})$ , $\tilde{\Sigma}\coloneqq\{\Gamma\cap\tilde{\Omega}:\Gamma\in\tilde{\Sigma}^{0}\}$ , and $\tilde{\mathcal{F}}_{t}\coloneqq\{\Gamma\cap\tilde{\Omega}:\Gamma\in\tilde{\mathcal{F}}_{t}^{0}\}$ for all $t\in\mathbb{T}$ . We can then define the processes $\tilde{\pi}(\omega,\omega^{\prime})\coloneqq\pi(\omega)$ and $\tilde{\pi}^{\prime}(\omega,\omega^{\prime})\coloneqq\pi^{\prime}(\omega^{\prime})$ for all $(\omega,\omega^{\prime})\in\tilde{\Omega}$ . These are $\tilde{\mathcal{F}}$ -progressive by virtue of $\pi$ and $\pi^{\prime}$ being $\mathcal{F}$ - and $\mathcal{F}^{\prime}$ -progressive, respectively. In addition, we can define the $\tilde{\mathcal{F}}$ -Brownian motion $\tilde{w}\coloneqq(\tilde{w}_{t})_{t\in\mathbb{T}}$ as $\tilde{w}_{t}(\omega,\omega^{\prime})\coloneqq w_{t}(\omega)$ for all $(\omega,\omega^{\prime})\in\tilde{\Omega}$ . The drift and diffusion coefficients defined by Eq. (4.8) satisfy Assumption 2.5, and by Proposition 2.7 a solution $(\tilde{x}_{t},\tilde{x}_{t}^{\prime})_{t\in\mathbb{T}}$ exists on $(\tilde{\Omega},\tilde{\Sigma},\tilde{\mathcal{F}},\tilde{\mathbb{P}})$ . The $\Omega$ and $\Omega^{\prime}$ marginals of $\tilde{\mathbb{P}}$ are again $\mathbb{P}$ and $\mathbb{P}^{\prime}$ , and hence the laws of the state space and control processes on the extended space agree with those on the original spaces. ∎

Proof of Lemma 4.9.

We estimate the distance between the original and the perturbed process as follows. Let $T_{0}\in\mathbb{T}$ be for now arbitrary. Using the triangle inequality, the elementary estimate of Eq. (A.1e), and Jensen’s inequality,

[TABLE]

We estimate individually each of the terms on the right of the above inequality. Starting with the diffusion terms, by the BDG inequality and by using the definition of $\pi_{t}(\alpha,q)$ as $\pi_{t}+\alpha(q_{t}-\pi_{t})$ , for all $t\in\mathbb{T}$ along with Assumption 2.5(i), i.e. the growth condition on $\sigma$ , and similar estimates as above,

[TABLE]

We set this inequality aside to be used a moment later, and consider next the penultimate term in Eq. (A.2). Using again the BDG inequality and Assumption 2.5(iii), and the inequality

[TABLE]

for all measurable $f$ , $\gamma\geq 1$ , $t\in\mathbb{T}$ , and any $K>0$ (pre-emptively, we select $K=4^{-\bar{p}+1}L^{-\bar{p}}C_{\bar{p}}^{-1}$ ), we obtain

[TABLE]

The drift terms in Eq. (A.2) are estimated similarly,

[TABLE]

and

[TABLE]

Collecting estimates of Eqs. (A.3, A.4, A.5, A.6) and applying them to Eq. (A.2), and after rearranging, we get,

[TABLE]

The expectation in $S_{2}$ is finite, since by Proposition 2.7 $x^{\pi}\in\mathcal{S}_{\mathcal{F}}^{\bar{p}}(\Omega;\mathbb{X})$ and $\bar{p}\bar{p}_{1}\leq\bar{p}$ (Assumption 2.5 states $\bar{p}_{1}\in[0,1]$ ), and $\bar{p}\bar{p}_{2}\leq\bar{p}_{3}$ by Eq. (2.11b) and the control is $\bar{p}_{3}$ -admissible. Using Grönwall’s inequality (see e.g. [58, Corollary 6.60]), we find that

[TABLE]

from where Eq. (4.9a) follows.

To prove Eq. (4.9b), consider the running cost processes, $(x_{t}^{\prime\,\pi(\alpha,q)})_{t\in\mathbb{T}}$ and $(x_{t}^{\prime\,\pi})_{t\in\mathbb{T}}$ . We have that

[TABLE]

The latter term is $\mathcal{O}(\alpha^{p})$ , which can be verified using similar estimates and assumptions as before,

[TABLE]

The finiteness of the expectation on the second to last line now follows from the fact $x^{\pi}\in\mathcal{S}_{\mathcal{F}}^{\bar{p}}$ and $\bar{p}_{3}$ -admissibility of the control, when one notes that by Eqs. (2.11a, 2.11d), $pp_{i}<\bar{p}-p<\bar{p}\leq\bar{p}_{3}$ , $i\in\{1,2\}$ .

To complete the proof of Eq. (4.9b), it then remains to show that the first term on the right hand side of Eq. (A.9) is also $\mathcal{O}(\alpha^{p})$ . The boundedness of $\nabla c$ , i.e. Assumption 2.5(v) implies that

[TABLE]

and where $\ell_{p_{1}^{\prime}}$ is defined in Eq. (A.1d). Thus,

[TABLE]

The first term on the right is $o(\alpha^{p})$ by virtue of Eq. (4.9a) which we established earlier, and the fact that $p<\bar{p}$ from Eq. (2.11a). The remaining two are treated as follows. We define

[TABLE]

Using Eqs. (2.11a, 2.11c, 2.11d), we have that

[TABLE]

and consequently,

[TABLE]

We can then use Hölder’s inequality

[TABLE]

The latter factor is $\mathcal{O}(\alpha^{p})$ , again by Eq. (4.9a), and since $p\zeta_{i}/(\zeta_{i}-1)<\bar{p}$ , $i\in\{1,2\}$ . We only need to show $\mathbb{E}[Z_{i}^{\zeta_{i}}]<\infty$ , $i\in\{1,2\}$ , which is straight-forward (note that Eqs. (2.11) also imply $\bar{p}/p_{1}^{\prime}>1$ and $\bar{p}/p_{2}^{\prime}>1$ ):

[TABLE]

where we have as usual used the fact $x^{\pi},x^{\pi(\alpha,q)}\in\mathcal{S}_{\mathcal{F}}^{\bar{p}}$ and the $\bar{p}_{3}$ -admissibility of the control. Therefore Eq. (4.9b) holds.

Finally, we turn to Eq. (4.10). We have that

[TABLE]

The latter expectation is in $\mathcal{O}(\alpha^{p})$ by Eq. (4.9a), and the former is finite since by Eqs. (2.11c, 2.11d),

[TABLE]

and $x^{\pi},x^{\pi(\alpha,q)}\in\mathcal{S}_{\mathcal{F}}^{\bar{p}}$ . Eq. (4.10) holds, and the proof is complete. ∎

Proof of Lemma 4.10.

The drift and diffusion coefficients appearing in Eq. (4.11) are Lipschitz, since by Assumption 2.5(iii) the gradients of $b$ and $\sigma$ are bounded. In addition, the terms in Eq. (4.11) that do not depend on $\delta^{\pi}$ satisfy

[TABLE]

where we have used the growth conditions of Assumption 2.5(ii), Eq. (2.11b), $x^{\pi}\in\mathcal{S}_{\mathcal{F}}^{\bar{p}}$ , and the $\bar{p}_{3}$ -admissibility of the control. A unique strong solution of Eq. (4.11) now exists by e.g. [58, Theorem 3.17], which also satisfies Eq. (4.13a).

To prove Eq. (4.13b), let $r\in(1,\bar{p})$ be a constant to be determined. We use the growth conditions on $c$ and $\nabla_{\mathbb{X}}c$ , Assumptions 2.5(iv, v), to estimate $\delta^{\prime\,\pi,q}$ as follows.

[TABLE]

We select $r\zeta/(\zeta-1)=\bar{p}$ , that is, $\zeta=1/(1-r/\bar{p})>1$ so that the expectation involving $\delta^{\pi,q}$ is guaranteed to be finite (this follows from Eq. (4.13a)). In order to ensure that the two other expectations are also finite, we need to fix $r$ so that

[TABLE]

where the upper bounds are set by $x^{\pi}\in\mathcal{S}_{\mathcal{F}}^{\bar{p}}$ , and the $\bar{p}_{3}$ -admissibility of the control. The largest choice therefore satisfies

[TABLE]

Note that $\zeta$ here depends on $r$ , but since the right hand side, as a function of $r$ , is continuous, decreasing, and maps $(0,\bar{p})$ to $(0,a)$ for some $a>0$ , a solution exists.

We want to show that $r>p$ . Using Eqs. (2.11c) and (2.11d),

[TABLE]

This implies that $(\bar{p}/p-1)r>\bar{p}-r$ , from which we obtain $r>p$ . This establishes Eq. (4.13b).
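The two elementary manipulations used in this part of the proof can be spelled out as follows. This is only a worked restatement of the algebra, using the symbols already present in the text.

```latex
% Step 1: solving r\zeta/(\zeta-1) = \bar{p} for \zeta, with 0 < r < \bar{p}.
\frac{r\zeta}{\zeta-1}=\bar{p}
\;\Longleftrightarrow\; r\zeta=\bar{p}\zeta-\bar{p}
\;\Longleftrightarrow\; \zeta(\bar{p}-r)=\bar{p}
\;\Longleftrightarrow\; \zeta=\frac{\bar{p}}{\bar{p}-r}=\frac{1}{1-r/\bar{p}}>1 .

% Step 2: from the final inequality to r > p (divide by \bar{p}/p > 0 in the last step).
\Bigl(\frac{\bar{p}}{p}-1\Bigr)r>\bar{p}-r
\;\Longleftrightarrow\; \frac{\bar{p}}{p}\,r-r>\bar{p}-r
\;\Longleftrightarrow\; \frac{\bar{p}}{p}\,r>\bar{p}
\;\Longleftrightarrow\; r>p .
```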

Let us define

[TABLE]

Note now that we can write

[TABLE]

where we have defined

[TABLE]

Analogously, we find that

[TABLE]

where

[TABLE]

We can then write, for arbitrary $T_{0}\in\mathbb{T}$ ,

[TABLE]

Using arguments we used in the proof of the previous lemma, we can estimate

[TABLE]

Returning to Eq. (A.15), applying the estimates of Eqs. (A.16) and (A.17), and rearranging and using Grönwall’s inequality (see steps leading to Eqs. (A.7, A.8)), we obtain

[TABLE]

To show that Eq. (4.14a) holds, it then remains to show that

[TABLE]

These limits are obtained from the definitions of $(b_{t}^{\pi,\pi(\alpha,q),2})_{t\in\mathbb{T}}$ and $(\sigma_{t}^{\pi,\pi(\alpha,q),2})_{t\in\mathbb{T}}$ by using Eqs. (4.9a) and (4.13a), and the continuity and boundedness of the gradients of $b$ and $\sigma$ . Explicitly,

[TABLE]

The first term is $\mathcal{O}(\alpha^{\bar{p}})$ by Eq. (4.9a), and the second is finite by Eq. (4.13a) and boundedness of $\nabla_{\mathbb{X}}b$ , and tends to zero as $\alpha\to 0$ , since $\nabla_{\mathbb{X}}b$ is continuous. This completes the estimates of the right hand side of Eq. (A.15), and so Eq. (4.14a) is proven.

Moving on to proving Eq. (4.14b), we proceed as above, and use estimates analogous to those used in the lead up to Eq. (A.10), so that

[TABLE]

The terms on the first three lines on the right of the last inequality can now be shown to be $o(\alpha^{p})$ using Eq. (4.14a) and an application of Hölder’s inequality mimicking the steps in Eqs. (A.11, A.12, A.13). Similarly applying Hölder’s inequality to the remaining term on the last line yields

[TABLE]

and, by continuity of $\nabla_{\mathbb{X}}c$ , this term is in $o(\alpha^{0})$ , and Eq. (4.14b) follows. ∎

Proof of Lemma 4.11.

Using the $\mathcal{L}$ -differentiability of $\rho$ and Eq. (3.2) and the short-hand of Eq. (4.17), we have that

[TABLE]

where

[TABLE]

By Lemma 4.9,

[TABLE]

and therefore $R^{\pi}\in o(\alpha)$ . From the optimality condition of Eq. (4.18),

[TABLE]

Dividing this by $\alpha$ , taking the limit $\alpha\to 0$ while using a similar estimate as in Eq. (A.14) and applying Lemma 4.10, we obtain the equation claimed in the statement of this lemma. ∎

Proof of Lemma 4.12.

We give the proof for $p>1$ , but comment at relevant places on changes needed to accommodate the $p=1$ case. The statement of the lemma amounts to expressing Eq. (4.19) by using processes that are constructed to satisfy Eq. (4.20). For brevity, we set $B_{t}^{\pi}\coloneqq\nabla_{\mathbb{X}}b(t,x_{t}^{\pi},\pi_{t})\in\mathbb{R}^{d_{x}\times d_{x}}$ , $F_{t}^{\pi}\coloneqq\nabla_{\mathbb{X}}c(t,x_{t}^{\pi},\pi_{t})\in\mathbb{R}^{1\times d_{x}}$ , $S_{t}^{\pi}\coloneqq\nabla_{\mathbb{X}}\sigma(t,x_{t}^{\pi},\pi_{t})\in\mathbb{R}^{d_{x}\times d_{w}\times d_{x}}$ for all $t\in\mathbb{T}$ .

Let $(U_{t}^{\pi})_{t\in\mathbb{T}}$ be the fundamental solution of Eq. (4.11), i.e. $U_{t}^{\pi}\in\mathbb{R}^{d_{x}\times d_{x}}$ , $t\in\mathbb{T}$ , $U_{0}^{\pi}=I$ , where $I$ is the identity matrix, and

[TABLE]

The drift and diffusion functions, $(t,U)\to(B_{t}^{\pi}U)_{t\in\mathbb{T}}$ and $(t,U)\to(S_{t}^{\pi}U)_{t\in\mathbb{T}}$ , $(t,U)\in\mathbb{T}\times\mathbb{R}^{d_{x}\times d_{x}}$ , are $L$ -Lipschitz in $U$ for all $t\in\mathbb{T}$ since by Assumption 2.5(iii) the gradients of $b$ and $\sigma$ are bounded. By an application of e.g. [58, Theorem 3.17], we find that Eq. (A.18) has a unique strong solution satisfying

[TABLE]

In addition, we define the process $(V_{t}^{\pi})_{t\in\mathbb{T}}$ as the unique strong solution of the stochastic differential equation

[TABLE]

Such a solution exists by the same argument as used for Eq. (A.18), and moreover,

[TABLE]

holds. It is easily verified that $V_{t}^{\pi}=(U_{t}^{\pi})^{-1}$ for all $t\in\mathbb{T}$ by applying Itô’s lemma to $t\to V_{t}^{\pi}U_{t}^{\pi}$ and $t\to U_{t}^{\pi}V_{t}^{\pi}$ . Finally, one additional, $\mathbb{R}^{1\times d_{x}}$ -valued process will be necessary. We define $(Q_{t}^{\pi})_{t\in\mathbb{T}}$ as such that

[TABLE]
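The inverse relation $V_{t}^{\pi}=(U_{t}^{\pi})^{-1}$ noted above can be checked numerically in a simplified setting. The sketch below is an illustration under loudly stated assumptions: it takes the scalar case $d_{x}=d_{w}=1$ with constant, hypothetical coefficients `b` and `s` standing in for $B_{t}^{\pi}$ and $S_{t}^{\pi}$, and it assumes the standard form $\mathrm{d}U=bU\,\mathrm{d}t+sU\,\mathrm{d}w$, $\mathrm{d}V=V(s^{2}-b)\,\mathrm{d}t-Vs\,\mathrm{d}w$ for the pair of equations, which may differ in detail from Eqs. (A.18) and (A.20). Both processes are discretized with the Euler-Maruyama scheme and driven by the same Brownian increments.

```python
import math
import random


def simulate_u_and_v(b, s, horizon=1.0, steps=20000, seed=0):
    """Euler-Maruyama for dU = b U dt + s U dw and dV = V (s^2 - b) dt - V s dw.

    b and s are hypothetical constant coefficients (scalar stand-ins for the
    bounded gradients of the drift and diffusion). Returns (U_T, V_T, w_T).
    """
    rng = random.Random(seed)
    dt = horizon / steps
    u, v, w = 1.0, 1.0, 0.0  # U_0 = V_0 = 1 (identity in the scalar case)
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        # Both updates use the same Brownian increment dw.
        u, v = (u + b * u * dt + s * u * dw,
                v + v * (s * s - b) * dt - v * s * dw)
        w += dw
    return u, v, w


b, s = 0.3, 0.4
u_T, v_T, w_T = simulate_u_and_v(b, s)
# Closed-form solution of the scalar linear equation, for comparison.
u_exact = math.exp((b - 0.5 * s * s) * 1.0 + s * w_T)

print("V_T * U_T        =", v_T * u_T)   # should be close to 1
print("U_T (simulated)  =", u_T)
print("U_T (closed form)=", u_exact)
```

In the matrix-valued case the same check would have to be done for both products $V_{t}^{\pi}U_{t}^{\pi}$ and $U_{t}^{\pi}V_{t}^{\pi}$, since the factors need not commute; the scalar sketch does not capture that aspect.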

Let us then define the random variable $\Xi^{\pi}$ as such that

[TABLE]

We begin by showing that $\Xi^{\pi}\in\mathcal{L}^{r}(\Omega;\mathbb{R}^{1\times d_{x}})$ for an $r\in[1,\infty)$ to be determined later. For this task, we additionally define

[TABLE]

Using Eq. (2.11), we find that

[TABLE]

A straight-forward calculation yields additionally the following

[TABLE]

that is, $\hat{p}\in(p,\bar{p})$ . Let $\tilde{p}\in(\hat{p},\bar{p})$ be arbitrary, and set

[TABLE]

We claim that

[TABLE]

Since $\tilde{p}<\bar{p}$ , we have $\tilde{p}/(\tilde{p}-1)>\bar{p}/(\bar{p}-1)$ and the above implies that $\Xi^{\pi}\in\mathcal{L}^{\bar{p}/(\bar{p}-1)}(\Omega;\mathbb{R}^{1\times d_{x}})$ . To prove Eq. (A.24), we first show that

[TABLE]

We explicitly prove only the latter inclusion; the former is established in very much the same way. Basic estimates show that

[TABLE]

Using Hölder’s inequality,

[TABLE]

The first factor is finite by Eq. (A.19). To show the same for the second one, it suffices to show that we can select a $\zeta\in(1,\infty)$ so that

[TABLE]

By the standard estimate of Eq. (2.12), and the admissibility condition of Eq. (2.4) and constraints of Eq. (2.11), the largest $\zeta$ we can take is $p^{\ast}/\tilde{q}$ . The choice $\zeta=p^{\ast}/\tilde{q}$ is valid if $\zeta\in(1,\infty)$ , which is indeed the case:

[TABLE]

This is sufficient to establish Eq. (A.27).

Turning to proving Eq. (A.24), based on the above it is enough to show that $D^{\pi}G\in\mathcal{L}^{\tilde{p}/(\tilde{p}-1)}(\Omega;\mathbb{R}^{1\times d_{x}})$ for all $G\in\mathcal{L}^{\tilde{q}}(\Omega;\mathbb{R}^{1\times d_{x}})$ . Let $G$ be any such random variable. Then, again using Hölder’s inequality,

[TABLE]

Selecting $\xi=[p/(p-1)]/[\tilde{p}/(\tilde{p}-1)]$ , the first factor in the above inequality is finite, since $D^{\pi}\in\mathcal{L}^{p/(p-1)}(\Omega;\mathbb{R})$ . Note that

[TABLE]

since $\tilde{p}>p$ . Moreover, with this choice, the power on the second factor becomes $\tilde{q}$ ,

[TABLE]

and since $G\in\mathcal{L}^{\tilde{q}}(\Omega;\mathbb{R}^{1\times d_{x}})$ , $D^{\pi}G\in\mathcal{L}^{\tilde{p}/(\tilde{p}-1)}(\Omega;\mathbb{R})$ . Eq. (A.24) is now proven.
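The exponent bookkeeping in this Hölder step can be verified with exact rational arithmetic. The sketch below assumes that $\tilde{q}$ is defined as $1/(1/p-1/\tilde{p})$, by analogy with the definition of $\mathring{q}$ given later in the proof; the defining equation itself is not reproduced here, so this is an assumption. The function names and the numerical values of `p` and `p_tilde` are hypothetical.

```python
from fractions import Fraction


def conjugate(p):
    """Hölder conjugate p / (p - 1) of an exponent p > 1."""
    return p / (p - 1)


def holder_split(p, p_tilde):
    """Return (xi, power_on_g) for the Hölder step applied to the product D * G.

    xi is the exponent placed on the D factor; its conjugate goes on G.
    """
    xi = conjugate(p) / conjugate(p_tilde)
    xi_conjugate = xi / (xi - 1)
    power_on_g = conjugate(p_tilde) * xi_conjugate
    return xi, power_on_g


# Hypothetical exponents with 1 < p < p_tilde.
p, p_tilde = Fraction(3, 2), Fraction(2)
xi, power_on_g = holder_split(p, p_tilde)
q_tilde = 1 / (1 / p - 1 / p_tilde)  # assumed definition of q-tilde

print(xi, power_on_g, q_tilde)  # prints: 3/2 6 6
```

With these values, $\xi=3/2>1$ and the power on the second factor equals the assumed $\tilde{q}=6$, which matches the two claims made in the text.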

We can now apply the martingale representation theorem for $\mathcal{L}^{r}$ -random variables ( $r>1$ ), given e.g. in [58, Theorem 2.42], to $\Xi^{\pi}$ and $D^{\pi}$ . This provides us with unique $\mathcal{F}$ -predictable processes $\Sigma^{\pi}=(\Sigma_{t}^{\pi})_{t\in\mathbb{T}}$ and $z^{\prime\,\pi}=(z_{t}^{\prime\,\pi})_{t\in\mathbb{T}}$ , taking respectively values in $\mathbb{R}^{d_{w}\times d_{x}}$ and $\mathbb{R}^{d_{w}}$ , such that

[TABLE]

These representations are unique, and

[TABLE]

Moreover, we define the processes $\Lambda^{\pi}=(\Lambda_{t}^{\pi})_{t\in\mathbb{T}}$ and $y^{\prime\,\pi}=(y_{t}^{\prime\,\pi})_{t\in\mathbb{T}}$ as the $\mathcal{F}_{t}$ -conditional expectations of $\Xi^{\pi}$ and $D^{\pi}$ , which now by [58, Corollary 2.44] satisfy

[TABLE]

We note that if $p=1$ , in the application of the martingale representation theorem we may pick an arbitrary $q\in(1,\infty)$ instead of $p/(p-1)$ .

We next define the processes $y^{\pi}=(y_{t}^{\pi})_{t\in\mathbb{T}}$ , and $z^{\pi}=(z_{t}^{\pi})_{t\in\mathbb{T}}$ as such that

[TABLE]

The process $(y_{t}^{\pi},y_{t}^{\prime\,\pi},z_{t}^{\pi},z_{t}^{\prime\,\pi})_{t\in\mathbb{T}}$ solves Eq. (4.20). This is already shown above for $(y_{t}^{\prime\,\pi},z_{t}^{\prime\,\pi})_{t\in\mathbb{T}}$ . To show that $(y_{t}^{\pi},z_{t}^{\pi})_{t\in\mathbb{T}}$ satisfies its respective backward stochastic differential equation, we apply Itô’s lemma to $y^{\pi}$ as given in Eq. (A.35) to obtain

[TABLE]

To verify the terminal condition $y_{T}^{\pi}=D^{\pi}\nabla_{\mathbb{X}}g(x_{T}^{\pi})$ in Eq. (4.20), note that from the definitions of $y_{t}^{\pi}$ and $\Lambda_{t}^{\pi}$ ,

[TABLE]

We can now establish that $y^{\pi}\in\mathcal{S}_{\mathcal{F}}^{\bar{p}/(\bar{p}-1)}(\Omega;\mathbb{Y})$ and $z^{\pi}\in\mathcal{H}_{\mathcal{F}}^{\bar{p}/(\bar{p}-1)}(\Omega;\mathbb{Z})$ , or in fact, a slightly strengthened version thereof. To proceed, we unfortunately need to add to the notational clutter: Let $\mathring{p}\in(\tilde{p},\bar{p})$ , and set $\mathring{q}\coloneqq 1/(1/p-1/\mathring{p})$ . Consider first the process $y^{\pi}$ , and let us estimate the terms in its definition, Eq. (A.35) individually. For the first term we obtain

[TABLE]

Selecting $\zeta=[\tilde{p}/(\tilde{p}-1)]/[\mathring{p}/(\mathring{p}-1)]$ , and using Eqs. (A.20) and (A.34), we find that the above is finite. Note that $\zeta>1$ since $\tilde{p}<\mathring{p}$ . The second term is treated as follows.

[TABLE]

where $r_{1},r_{2},r_{3}\in(1,\infty)$ and $r_{1}^{-1}+r_{2}^{-1}+r_{3}^{-1}=1$ . We select $r_{1}=[p/(p-1)]/[\mathring{p}/(\mathring{p}-1)]$ , as this is by Eq. (A.34) the largest choice still ensuring the finiteness of the first factor. Next, we set $r_{2}=p^{\ast}/[\mathring{p}/(\mathring{p}-1)]$ as this is sufficient to guarantee the finiteness of the second factor, cf. Eq. (A.29). By Eqs. (A.19) and (A.20), $r_{3}$ may be arbitrarily large, and we then only need to verify that $r_{1}^{-1}+r_{2}^{-1}<1$ :

[TABLE]

where the final inequality is a simple consequence of $\hat{p}<\mathring{p}$ , or in this case, $-1/\hat{p}<-1/\mathring{p}$ . We now have that

[TABLE]

Regarding the process $z^{\pi}$ , we estimate each of the terms in its definition, Eq. (A.36), individually. For the first and second terms, an almost identical calculation as above for the first and second terms of $y^{\pi}$ , but using Eq. (A.33) instead of Eq. (A.34), shows

[TABLE]

The last term in $z^{\pi}$ , $y_{t}^{\pi}S_{t}^{\pi}$ , can be treated in the same way, noting the boundedness of $S_{t}^{\pi}$ for all $t\in\mathbb{T}$ . Putting the above together, we have that

[TABLE]

and since $\bar{p}>\mathring{p}$ , or equivalently $\bar{p}/(\bar{p}-1)<\mathring{p}/(\mathring{p}-1)$ , it follows that $y^{\pi}\in\mathcal{S}_{\mathcal{F}}^{\bar{p}/(\bar{p}-1)}(\Omega;\mathbb{Y})$ and $z^{\pi}\in\mathcal{H}_{\mathcal{F}}^{\bar{p}/(\bar{p}-1)}(\Omega;\mathbb{Z})$ .

Finally, we prove Eq. (4.21). The solution of Eq. (4.11) can be written using the processes $U^{\pi}$ and $V^{\pi}$ as

[TABLE]

Consider next the processes $\gamma^{\pi,q}=(\gamma_{t}^{\pi,q})_{t\in\mathbb{T}}$ and $\gamma^{\prime\,\pi,q}=(\gamma_{t}^{\prime\,\pi,q})_{t\in\mathbb{T}}$ taking respectively values in $\mathbb{R}^{d_{x}}$ and $\mathbb{R}$ , and defined as

[TABLE]

where $\delta^{\prime\,\pi,q}$ is as defined in Eq. (4.12). From Eq. (A.37) we immediately find that $\gamma^{\pi,q}$ and $\gamma^{\prime\,\pi,q}$ satisfy

[TABLE]

Note now that the expectation of $\Lambda_{T}^{\pi}\gamma_{T}^{\pi,q}+y_{T}^{\prime\,\pi}\gamma_{T}^{\prime\,\pi,q}$ equals the right-hand side of the inequality of Eq. (4.19):

[TABLE]

To compute the above, we differentiate $\Lambda_{t}^{\pi}\gamma_{t}^{\pi,q}+y_{t}^{\prime\,\pi}\gamma_{t}^{\prime\,\pi,q}$ ; the following identities are used in simplifying the expressions in the resulting chain of equations: (i) $(\Sigma_{t}^{\pi}\cdot\mathrm{d}w_{t})V_{t}^{\pi}=(\Sigma_{t}^{\pi}V_{t}^{\pi})\cdot\mathrm{d}w_{t}$ ; (ii) $(\Sigma_{t}^{\pi}\cdot\mathrm{d}w_{t})V_{t}^{\pi}\sigma(t,x_{t}^{\pi},q_{t}-\pi_{t})\,\mathrm{d}w_{t}=\mathop{\mathrm{Tr}}[\Sigma_{t}^{\pi}V_{t}^{\pi}\sigma(t,x_{t}^{\pi},q_{t}-\pi_{t})]\,\mathrm{d}t$ ; (iii) $(z_{t}^{\prime\,\pi}\cdot\mathrm{d}w_{t})Q_{t}^{\pi}V_{t}^{\pi}\sigma(t,x_{t}^{\pi},q_{t}-\pi_{t})\,\mathrm{d}w_{t}=\mathop{\mathrm{Tr}}[z_{t}^{\prime\,\pi}Q_{t}^{\pi}V_{t}^{\pi}\sigma(t,x_{t}^{\pi},q_{t}-\pi_{t})]\,\mathrm{d}t$ ; (iv) $y_{t}^{\pi}S_{t}^{\pi}\mathop{\cdot\cdot}\sigma(t,x_{t}^{\pi},q_{t}-\pi_{t})=\mathop{\mathrm{Tr}}[y_{t}^{\pi}S_{t}^{\pi}\sigma(t,x_{t}^{\pi},q_{t}-\pi_{t})]$ ; (v) $(z_{t}^{\prime\,\pi}\cdot\mathrm{d}w_{t})Q_{t}^{\pi}V_{t}^{\pi}=(z_{t}^{\prime\,\pi}Q_{t}^{\pi}V_{t}^{\pi})\cdot\mathrm{d}w_{t}$ . We obtain

[TABLE]

Evaluating this at $t=T$ , taking the expectation, and using Eq. (A.38), we get

[TABLE]

where

[TABLE]

Eq. (4.21) follows now immediately from Eqs. (4.19) and (A.39), if $M^{\pi,q}=0$ . This is the case if the integrands in the expression are in $\mathcal{H}_{\mathcal{F}}^{1}(\Omega;\mathbb{R}^{1\times d_{w}})$ , see e.g. [58, Theorem 2.6]. Using the estimates of Eqs. (4.13) for $\delta^{\pi},\delta^{\prime\,\pi}$ , and the bounds on $S_{t}^{\pi}$ and $\sigma$ , the square integrability can be verified with a straight-forward application of Hölder’s inequality. For example, for the first term,

[TABLE]

and the rest follow analogously. If $p=1$ , the last term is somewhat special, since we cannot apply Hölder’s inequality in the same way (we have no $\mathcal{L}^{\infty}$ -norm equivalent bound on $z^{\prime\,\pi}$ ). However, the estimate of Eq. (4.13b) is slightly stronger than that of Eq. (4.13a) precisely to accommodate this edge case. Therefore, $M^{\pi,q}=0$ , and the proof is complete. ∎
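The vanishing of $M^{\pi,q}$ rests on the fact that a stochastic integral against Brownian motion has zero expectation when the integrand is suitably integrable. The sketch below illustrates this with a Monte Carlo estimate for the simplest nontrivial case, the Itô integral $\int_{0}^{1}w_{t}\,\mathrm{d}w_{t}$. The integrand, the step count and the sample size are hypothetical choices made for illustration and are not the integrands appearing in the proof.

```python
import math
import random


def ito_integral_of_w(steps, rng):
    """Left-point (Ito) sum approximating the integral of w dw over [0, 1]."""
    dt = 1.0 / steps
    w, total = 0.0, 0.0
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        total += w * dw  # integrand is evaluated before the increment is added
        w += dw
    return total


rng = random.Random(1)
samples = [ito_integral_of_w(50, rng) for _ in range(2000)]
sample_mean = sum(samples) / len(samples)

print("sample mean of the stochastic integral:", sample_mean)  # close to zero
```

Evaluating the integrand at the left endpoint is what makes each summand mean zero; a midpoint or right-endpoint rule would introduce a drift and the sample mean would no longer be near zero.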

Proof of Lemma 4.13.

Let $q\in\mathfrak{V}^{p}(b,\sigma,\nu)$ be arbitrary. We have from the $\mathcal{L}$ -convexity of $\rho$ and convexity of $g$ that

[TABLE]

By using the construction of the process $(y_{t}^{\pi},y_{t}^{\prime\,\pi})_{t\in\mathbb{T}}$ , the above becomes

[TABLE]

We can evaluate the above expectation by first differentiating $y_{t}^{\pi}(x_{t}^{\pi}-x_{t}^{q})+y_{t}^{\prime\,\pi}(x_{t}^{\prime\,\pi}-x_{t}^{\prime\,q})$ , using the $y$ and $x$ differential equations (2.1, 4.7, 4.20), evaluating the integrals at $T$ , and then taking expectations:

[TABLE]

so that

[TABLE]

The expectation of the integrals against the Brownian motion is zero [58, Theorem 2.6], since the integrands are in $\mathcal{H}_{\mathcal{F}}^{1}(\Omega;\mathbb{R})$ .

Let us denote

[TABLE]

Since we have assumed that $H$ is jointly convex in the state and control variables, we have that for any $\alpha\in(0,1]$ ,

[TABLE]

On the other hand, using the differentiability of $H$ on $\mathbb{X}$ ,

[TABLE]

so that

[TABLE]

Using the above, along with the assumption that $\pi_{t}$ minimizes $\eta\to h_{t}(x_{t}^{\pi},\eta)$ for $\mathbb{P}\times\mathrm{d}t$ -almost every $(\omega,t)\in\Omega\times\mathbb{T}$ , we have that

[TABLE]

$\mathbb{P}\times\mathrm{d}t$ -almost everywhere. Applying this estimate in Eq. (A.40), we get

[TABLE]

implying that

[TABLE]

and the proof is complete. ∎

A.4 Proofs for Section 5

Proof of Lemma 5.2.

(i) Note that $X=\mathbb{E}[X]$ if and only if $X$ is almost surely constant. The Fréchet derivative of the first term in Eq. (5.1), the expectation, is clearly $\mathrm{D}\mathbb{E}[X]=1$ . Focusing then on the derivative of the second, norm term, we first note that the derivative of $\mathcal{L}^{2}(\Omega;\mathbb{R})\ni X\to\|X\|_{2}\in\mathbb{R}_{\geq 0}$ is $\|X\|_{2}^{-1}X\in\mathcal{L}^{2}(\Omega;\mathbb{R})$ , which suggests that

[TABLE]

This can be verified through a direct calculation: By straight-forward algebraic manipulation, one obtains

[TABLE]

Taking the limit $\|Y\|_{2}\to 0$ , the following is quickly recovered

[TABLE]

The right-hand side is clearly zero. Noting that $\langle X-\mathbb{E}[X],1\rangle=0$ , Eq. (5.3) follows. The non-differentiability at almost surely constant random variables follows from the positive homogeneity and translation invariance of $\rho$ : At $X=x$ , $x\in\mathbb{R}$ , we have $\rho^{\mathrm{MD}}(X)=\rho^{\mathrm{MD}}(0)+x$ , but at $X=0$ , we have that $\lim_{\epsilon\to 0}\epsilon^{-1}[\rho^{\mathrm{MD}}(0+\epsilon Y)-\rho^{\mathrm{MD}}(0)]=\lim_{\epsilon\to 0}\epsilon^{-1}\rho^{\mathrm{MD}}(\epsilon Y)=\rho^{\mathrm{MD}}(Y)$ which is not linear. Eq. (5.4) is easily found from the form of the Fréchet derivative of Eq. (5.3).
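The derivative formula of part (i) can be sanity-checked on a finite probability space. The sketch below assumes that the mean-deviation risk function has the form $\rho^{\mathrm{MD}}(X)=\mathbb{E}[X]+\kappa\|X-\mathbb{E}[X]\|_{2}$ with a weight $\kappa>0$; Eq. (5.1) is not reproduced in this section, so this form, the name `kappa`, and the sample values are assumptions. It compares the pairing $\mathbb{E}[\mathrm{D}\rho^{\mathrm{MD}}(X)Y]$ with a central finite difference, and then illustrates the failure of linearity at constant random variables.

```python
import math


def mean(values):
    """Expectation under the uniform measure on a finite sample space."""
    return sum(values) / len(values)


def rho_md(xs, kappa):
    """Assumed mean-deviation risk: E[X] + kappa * ||X - E[X]||_2."""
    m = mean(xs)
    return m + kappa * math.sqrt(mean([(x - m) ** 2 for x in xs]))


def d_rho_md(xs, kappa):
    """Candidate derivative 1 + kappa * (X - E[X]) / ||X - E[X]||_2 (X not constant)."""
    m = mean(xs)
    norm = math.sqrt(mean([(x - m) ** 2 for x in xs]))
    return [1.0 + kappa * (x - m) / norm for x in xs]


def directional_derivative(xs, ys, kappa, eps=1e-6):
    """Central finite difference of rho_md at X in the direction Y."""
    up = rho_md([x + eps * y for x, y in zip(xs, ys)], kappa)
    down = rho_md([x - eps * y for x, y in zip(xs, ys)], kappa)
    return (up - down) / (2 * eps)


X = [0.5, -1.0, 2.0, 0.3]  # hypothetical outcomes, not almost surely constant
Y = [1.0, 0.2, -0.7, 0.4]  # hypothetical direction
kappa = 0.5

pairing = mean([d * y for d, y in zip(d_rho_md(X, kappa), Y)])  # E[D rho(X) Y]
print(pairing, directional_derivative(X, Y, kappa))

# At X = 0 the one-sided directional derivative is rho_md(Y). A linear map would
# satisfy rho_md(Y) + rho_md(-Y) = 0, which fails whenever Y is not constant.
asymmetry = rho_md(Y, kappa) + rho_md([-y for y in Y], kappa)
print(asymmetry)
```

The second printed quantity equals $2\kappa\|Y-\mathbb{E}[Y]\|_{2}$ under the assumed form, which is strictly positive for non-constant $Y$.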

(ii) We first note that

[TABLE]

Since $U_{\epsilon}^{\prime}(x)=\epsilon^{-1}\mathrm{e}^{-x/\epsilon}/(1+\mathrm{e}^{-x/\epsilon})^{2}\in(0,1/(4\epsilon)]\,\forall x\in\mathbb{R}$ , from the second equality it follows that

[TABLE]

We verify Eq. (5.5) by a direct calculation. Consider now

[TABLE]

Next, using the identity of Eq. (A.41), followed by Hölder’s inequality, and the estimate of Eq. (A.42), we get that

[TABLE]

This is sufficient to show that $\rho_{\epsilon}^{\mathrm{MD+}}(X+H)-\rho_{\epsilon}^{\mathrm{MD+}}(X)-\mathbb{E}[\mathrm{D}\rho_{\epsilon}^{\mathrm{MD+}}(X)H]\in o(\|H\|_{2})$ for all $X,H\in\mathcal{L}^{2}(\Omega;\mathbb{R})$ , and so $\rho_{\epsilon}^{\mathrm{MD+}}$ is Fréchet differentiable on $\mathcal{L}^{2}(\Omega;\mathbb{R})$ with the given derivative $\mathrm{D}\rho_{\epsilon}^{\mathrm{MD+}}(X)$ for all $X\in\mathcal{L}^{2}(\Omega;\mathbb{R})$ . The form of the $\mathcal{L}$ -derivative is easily verified from $\mathrm{D}\rho_{\epsilon}^{\mathrm{MD+}}(X)$ .

(iii) It suffices to show that the limit in Eq. (3.1) is attained uniformly over $Y\in\mathcal{L}^{\infty}(\Omega;\mathbb{R})$ such that $\|Y\|_{\infty}=1$ . We now have that $\mathbb{E}[\mathrm{e}^{\theta(X+\epsilon Y)}]-\mathbb{E}[\mathrm{e}^{\theta X}]=\epsilon\mathbb{E}[\mathrm{e}^{\theta X}\theta Y]+o(\epsilon)$ by Taylor expanding the exponential and using the fact $|Y(\omega)|\leq\|Y\|_{\infty}$ almost everywhere. By the chain rule of differentiation, $\mathrm{D}\rho^{\mathrm{Ent}}(X)=\mathrm{e}^{\theta X}/\mathbb{E}[\mathrm{e}^{\theta X}]$ follows. The $\mathcal{L}$ -derivative is similarly easily found. ∎
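The derivative of part (iii) can be checked in the same finite-sample fashion. The sketch assumes the usual form of the entropic risk function, $\rho^{\mathrm{Ent}}(X)=\theta^{-1}\log\mathbb{E}[\mathrm{e}^{\theta X}]$, which is consistent with the derivative stated in the proof but is not reproduced in this section; the sample values and the choice of `theta` are hypothetical.

```python
import math


def expectation(values):
    """Expectation under the uniform measure on a finite sample space."""
    return sum(values) / len(values)


def rho_ent(xs, theta):
    """Assumed entropic risk: log(E[exp(theta X)]) / theta."""
    return math.log(expectation([math.exp(theta * x) for x in xs])) / theta


def d_rho_ent(xs, theta):
    """Derivative exp(theta X) / E[exp(theta X)], as stated in the proof."""
    weights = [math.exp(theta * x) for x in xs]
    normaliser = expectation(weights)
    return [w / normaliser for w in weights]


def finite_difference(xs, ys, theta, eps=1e-6):
    """Central finite difference of rho_ent at X in the direction Y."""
    up = rho_ent([x + eps * y for x, y in zip(xs, ys)], theta)
    down = rho_ent([x - eps * y for x, y in zip(xs, ys)], theta)
    return (up - down) / (2 * eps)


X = [0.5, -1.0, 2.0, 0.3]  # hypothetical outcomes
Y = [1.0, 0.2, -0.7, 0.4]  # hypothetical bounded direction
theta = 0.8

density = d_rho_ent(X, theta)
pairing = expectation([d * y for d, y in zip(density, Y)])  # E[D rho(X) Y]
print(pairing, finite_difference(X, Y, theta))
print(expectation(density))  # the derivative is a probability density: mean one
```

That the derivative is positive with unit expectation is what allows it to be read as a change of measure, tilting probability mass toward large outcomes when $\theta>0$.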

Proof of Proposition 5.5.

We first note that Assumptions 2.5 are easily verified for the stochastic differential equations of Problem $\mathscr{P}_{\phi}$ . Suppose $\pi\in\mathfrak{V}^{p}(b_{\phi},\sigma_{\phi},\nu_{\phi})$ is $\mathscr{P}_{\phi}$ -optimal. The Hamiltonian of Eq. (4.1) becomes

[TABLE]

By Theorem 4.2, we know that there exists a process $(y_{t}^{\pi},y_{t}^{\prime\,\pi},z_{t}^{\pi},z_{t}^{\prime\,\pi})_{t\in\mathbb{T}}$ where by Eqs. (4.2, 4.3, 4.6), $y_{t}^{\prime\,\pi}=\mathbb{E}\bigl[\mathrm{D}\rho(\mathscr{L}(-x_{T}^{\pi}))(-x_{T}^{\pi})\bigm|\mathcal{F}_{t}\bigr]$ , $\mathrm{d}y_{t}^{\prime\,\pi}=z_{t}^{\prime\,\pi}\,\mathrm{d}w_{t}$ , and $\mathrm{d}y_{t}^{\pi}=z_{t}^{\pi}\,\mathrm{d}w_{t}$ , $y_{T}^{\pi}=-y_{T}^{\prime\,\pi}=-\mathrm{D}\rho(\mathscr{L}(-x_{T}^{\pi}))(-x_{T}^{\pi})$ . This yields Eqs. (5.9) and (5.10). By the uniqueness of solutions of Lipschitz backward stochastic differential equations, see e.g. [58, Theorem 5.17], we have that $y_{t}^{\pi}=-y_{t}^{\prime\,\pi}$ and $z_{t}^{\pi}=-z_{t}^{\prime\,\pi}$ for all $t\in\mathbb{T}$ , $\mathbb{P}$ -almost surely. From the assumption of positivity of $\mathrm{D}\rho(\cdot)(\cdot)$ we infer that $y_{t}^{\prime\,\pi}>0$ and $y_{t}^{\pi}<0$ for all $t\in\mathbb{T}$ . We can thus assume in the following that $\mathbb{Y}=\mathbb{R}_{<0}$ and $\mathbb{Y}^{\prime}=\mathbb{R}_{>0}$ .

Since $H$ , as a function of the control, $\phi\to H(y,z,\phi)$ , is strictly convex for all $(y,z)\in\mathbb{Y}\times\mathbb{Z}$ , by an elementary application of Jensen’s inequality we see that a minimizer of Eq. (4.4) is found in Dirac measures: For any $\pi\in\mathcal{P}(\mathbb{A})$ , we have that

[TABLE]

for all $(y,z)\in\mathbb{Y}\times\mathbb{Z}$ , where $\bar{\phi}=\int_{\mathbb{A}}\phi\,\pi(\mathrm{d}\phi)$ . If $\pi^{\ast}\in\mathcal{P}(\mathbb{A})$ is a minimizer of $\pi\to H(y,z,\pi)=\int_{\mathbb{A}}H(y,z,\phi)\,\pi(\mathrm{d}\phi)$ and $\bar{\phi}^{\ast}=\int_{\mathbb{A}}\phi\,\pi^{\ast}(\mathrm{d}\phi)$ , then, by the convexity of $\mathbb{A}$ , we have $\bar{\phi}^{\ast}\in\mathbb{A}$ , and the Dirac measure $\delta_{\bar{\phi}^{\ast}}$ satisfies $H(y,z,\delta_{\bar{\phi}^{\ast}})\leq H(y,z,\pi^{\ast})$ . Therefore a minimizer is always found within the set of Dirac measures, and we can consider the problem

[TABLE]

This immediately yields as a minimizer the function $\phi^{\ast}:\mathbb{Y}\times\mathbb{Z}\to\mathbb{A}$ such that

[TABLE]

Since $H$ and the terminal cost function are convex in $(x,\phi)$ and $x$ , respectively, and $\rho$ is $\mathcal{L}$ -convex by the assumption of the proposition, Assumption 4.1 holds and by Theorem 4.2(ii) the above properties are also sufficient for $\phi_{t}=\phi^{\ast}(y_{t}^{\pi},z_{t}^{\pi})$ to be $\mathscr{P}_{\phi}$ -optimal. ∎
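The Jensen step in the proof above, which reduces the minimization over probability measures to one over Dirac measures, can be illustrated with a small numerical example. The function `hamiltonian` below is a hypothetical strictly convex quadratic in the control, with made-up coefficients `a` and `c`; it is not the Hamiltonian of Problem $\mathscr{P}_{\phi}$, whose explicit form is not reproduced in this section. The relaxed control is a hypothetical discrete measure on three actions.

```python
def hamiltonian(phi, a=-0.8, c=1.5):
    """Hypothetical strictly convex function of the control (requires c > 0)."""
    return a * phi + c * phi * phi


def relaxed_value(atoms, weights):
    """Integral of the Hamiltonian against a discrete probability measure."""
    return sum(w * hamiltonian(phi) for phi, w in zip(atoms, weights))


def barycenter(atoms, weights):
    """Mean action of the discrete measure; lies in the action set if it is convex."""
    return sum(w * phi for phi, w in zip(atoms, weights))


atoms = [-1.0, 0.5, 2.0]    # hypothetical actions
weights = [0.2, 0.5, 0.3]   # hypothetical probabilities, summing to one

phi_bar = barycenter(atoms, weights)
value_relaxed = relaxed_value(atoms, weights)
value_dirac = hamiltonian(phi_bar)  # value of the Dirac measure at the mean action

print(phi_bar, value_dirac, value_relaxed)
```

For a quadratic of this form the gap between the two values is exactly `c` times the variance of the action under the measure, so the Dirac measure at the mean is strictly better unless the measure is already concentrated at a single point.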

Bibliography


[1] B. Acciaio and I. Penner, Dynamic risk measures, in Advanced Mathematical Methods for Finance, G. Di Nunno and B. Øksendal, eds., Springer, Berlin Heidelberg, 2011, pp. 1–34.
[2] B. Acciaio and G. Svindland, Are law-invariant risk functions concave on distributions?, Dependence Modeling, 1 (2013).
[3] N. U. Ahmed and C. D. Charalambous, Stochastic minimum principle for partially observed systems subject to continuous and jump diffusion processes and driven by relaxed controls, SIAM Journal on Control and Optimization, 51 (2013), pp. 3235–3257.
[4] D. Andersson and B. Djehiche, A maximum principle for relaxed stochastic control of linear SDEs with application to bond portfolio optimization, Mathematical Methods of Operations Research, 72 (2010), pp. 273–310.
[5] P. Artzner, F. Delbaen, J.-M. Eber, and D. Heath, Coherent measures of risk, Mathematical Finance, 9 (1999), pp. 203–228.
[6] F. Baghery and B. Øksendal, A maximum principle for stochastic control with partial information, Stochastic Analysis and Applications, 25 (2007), pp. 705–717.
[7] K. Bahlali, M. Mezerdi, and B. Mezerdi, On the relaxed mean-field stochastic control problem, Stochastics and Dynamics, 18 (2018), p. 1850024.
[8] S. Bahlali, Necessary and sufficient optimality conditions for relaxed and strict control problems, SIAM Journal on Control and Optimization, 47 (2008), pp. 2078–18.
