Robust utility maximization under model uncertainty via a penalization approach

Ivan Guo; Nicolas Langrené; Grégoire Loeper; Wei Ning

arXiv:1907.13345 · math.OC · March 8, 2022


TL;DR

This paper develops a robust utility maximization framework under model uncertainty using penalization, interpreting it as a stochastic differential game, and demonstrates its effectiveness with real market data.

Contribution

It introduces a penalization-based robust optimization approach, linking it to a stochastic differential game and providing analytical and numerical solutions.

Findings

1. Robust portfolios generally yield higher expected utility.

2. Robust portfolios are more stable during strong market downturns.

3. The approach is validated with real market data.


Equations (231)

$$\frac{dS_{t}^{i}}{S_{t}^{i}}=\mu_{t}^{i}\,dt+\sum_{j=1}^{d}\sigma_{t}^{ij}\,dW_{t}^{j},\qquad 1\leq i\leq d,$$

$$\frac{dX_{t}}{X_{t}}=\sum_{i=1}^{d}\alpha_{t}^{i}\frac{dS_{t}^{i}}{S_{t}^{i}}+\Bigl(1-\sum_{i=1}^{d}\alpha_{t}^{i}\Bigr)r\,dt.$$

$$dX_{t}=X_{t}\bigl(\alpha_{t}^{\intercal}(\mu_{t}-\mathbf{r})+r\bigr)\,dt+X_{t}\,\alpha_{t}^{\intercal}\sigma_{t}\,dW_{t}.$$

$$\Omega_{t}:=\{\omega\in C([t,T];\mathbb{R}^{d}):\omega_{t}=0\}.$$

$$J(t,x,\alpha,\mu,\Sigma)=\mathbb{E}^{t,x}\Bigl[U(X_{T}^{\alpha,\mu,\Sigma})+\lambda_{0}\int_{t}^{T}F(\mu_{s},\Sigma_{s})\,ds\Bigr],$$

$$\underline{u}(t,x)=\sup_{\alpha\in\mathcal{A}}\inf_{\Sigma\in\mathcal{B},\,\mu\in\mathcal{M}}\Bigl\{\mathbb{E}^{t,x}\Bigl[U(X_{T}^{\alpha,\mu,\Sigma})+\lambda_{0}\int_{t}^{T}F(\mu_{s},\Sigma_{s})\,ds\Bigr]\Bigr\}.$$

$$\underline{u}(t,x)=\sup_{\alpha\in\mathcal{A}}\inf_{\Sigma\in\mathcal{B}}\Bigl\{\mathbb{E}^{t,x}\Bigl[U(X_{T}^{\alpha,\Sigma})+\lambda_{0}\int_{t}^{T}F(\Sigma_{s})\,ds\Bigr]\Bigr\},$$

$$\bar{u}(t,x)=\inf_{\Sigma\in\mathcal{B}}\sup_{\alpha\in\mathcal{A}}\Bigl\{\mathbb{E}^{t,x}\Bigl[U(X_{T}^{\alpha,\Sigma})+\lambda_{0}\int_{t}^{T}F(\Sigma_{s})\,ds\Bigr]\Bigr\}.$$

$$\bigl|U(x)-U(\bar{x})\bigr|\leq Q(|x|,|\bar{x}|)\,|x-\bar{x}|,$$

$$\mathbb{E}\Bigl[\int_{t}^{T}\bigl|F(\Sigma_{s})\bigr|\,ds\Bigr]<\infty,$$

$$\mathbb{E}\Bigl[\int_{t}^{T}\bigl|(\alpha_{s}^{\intercal}\mu+r-\alpha_{s}^{\intercal}\mathbf{r})x_{0}\bigr|^{2}+\bigl|\alpha_{s}^{\intercal}\sigma_{s}x_{0}\bigr|^{2}\,ds\Bigr]<\infty.$$

$$\bar{v}(t,x)=\sup_{\Gamma\in\mathcal{N}}\inf_{\Sigma\in\mathcal{B}}\Bigl\{\mathbb{E}^{t,x}\Bigl[\lambda_{0}\int_{t}^{T}F(\Sigma_{s})\,ds+U(X_{T}^{\Gamma,\Sigma})\Bigr]\Bigr\},$$

$$\underline{v}(t,x)=\inf_{\Delta\in\mathcal{M}}\sup_{\alpha\in\mathcal{A}}\Bigl\{\mathbb{E}^{t,x}\Bigl[\lambda_{0}\int_{t}^{T}F(\Delta_{s})\,ds+U(X_{T}^{\alpha,\Delta})\Bigr]\Bigr\}.$$

$$\underline{u}(t,x)\leq\underline{v}(t,x)\leq\bar{v}(t,x)\leq\bar{u}(t,x).$$

$$\inf_{\Delta\in\mathcal{M}}\sup_{\alpha\in\mathcal{A}}J(t,x,\alpha,\Delta(\alpha))+\epsilon\geq\sup_{\alpha\in\mathcal{A}}J(t,x,\alpha,\bar{\Delta}(\alpha))\geq J(t,x,\alpha,\bar{\Delta}(\alpha))\geq\inf_{\Sigma\in\mathcal{B}}J(t,x,\alpha,\Sigma).$$

$$\underline{u}\leq\underline{v}\leq\bar{u},\qquad\underline{u}\leq\bar{v}\leq\bar{u}.$$

$$\bigl|\bar{v}(t,x)-\bar{v}(t,\bar{x})\bigr|\leq\Phi(|x|,|\bar{x}|)\,|x-\bar{x}|,\quad\forall(t,x)\in[0,T]\times\mathbb{R}.$$

$$\bar{v}(t,x)=\sup_{\Gamma\in\mathcal{N}}\inf_{\Sigma\in\mathcal{B}}\Bigl\{\mathbb{E}^{t,x}\Bigl[\lambda_{0}\int_{t}^{t+\theta}F(\Sigma_{s})\,ds+\bar{v}(t+\theta,X_{t+\theta}^{\Gamma,\Sigma})\Bigr]\Bigr\}.$$

$$\begin{cases}\frac{\partial v}{\partial t}(t,x)+H\Bigl(t,x,\frac{\partial v}{\partial x}(t,x),\frac{\partial^{2}v}{\partial x^{2}}(t,x)\Bigr)=0 & \text{in }[0,T)\times\mathbb{R},\\ v(T,x)=U(x) & \text{on }\{T\}\times\mathbb{R},\end{cases}$$

$$H(t,x,p,M)=\inf_{\Sigma\in B}\sup_{a\in A}\Bigl\{\lambda_{0}F(\Sigma)+\bigl(a^{\intercal}(\mu-\mathbf{r})+r\bigr)xp+\frac{1}{2}\operatorname{tr}\bigl(a^{\intercal}\Sigma a\,x^{2}M\bigr)\Bigr\},$$

$$-\frac{\partial v}{\partial t}(t,x)-\inf_{\Sigma\in B}\sup_{a\in A}\Bigl\{\lambda_{0}F(\Sigma)+\bigl(a^{\intercal}(\mu-\mathbf{r})+r\bigr)x\frac{\partial v}{\partial x}(t,x)+\frac{1}{2}\operatorname{tr}\Bigl(a^{\intercal}\Sigma a\,x^{2}\frac{\partial^{2}v}{\partial x^{2}}(t,x)\Bigr)\Bigr\}=0,\quad\text{for }(t,x)\in[0,T)\times\mathbb{R}.$$

$$\underline{v}(t,x)\leq\bar{v}(t,x)\quad\text{for }(t,x)\in[0,T]\times\mathbb{R}.$$

$$-\frac{\partial\phi}{\partial t}(t_{0},x_{0})-H\Bigl(t_{0},x_{0},\frac{\partial\phi}{\partial x}(t_{0},x_{0}),\frac{\partial^{2}\phi}{\partial x^{2}}(t_{0},x_{0})\Bigr)\geq0,$$

$$\tilde{H}(t,x,p,M)=\sup_{a\in A}\inf_{\Sigma\in B}\Bigl\{\lambda_{0}F(\Sigma)+\bigl(a^{\intercal}(\mu-\mathbf{r})+r\bigr)xp+\frac{1}{2}\operatorname{tr}\bigl(a^{\intercal}\Sigma a\,x^{2}M\bigr)\Bigr\}.$$
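For $d=1$ the two Hamiltonians can be approximated by brute force on grids, which makes the ordering $\tilde{H}\leq H$ (sup inf never exceeds inf sup) concrete. This is only an illustrative sketch, not the paper's numerical scheme. It assumes a scalar allocation $a$ on a grid of $A=[-R,R]$ and a scalar variance $\Sigma=\sigma^{2}$ on a grid of $B$. It also assumes the hypothetical penalty $F(\Sigma)=(\sqrt{\Sigma}-\sigma_{0})^{2}$, which is the volatility penalty of the logarithmic example. The function name `hamiltonians` and all parameter values are ours.

```python
import numpy as np


def hamiltonians(x, p, M, mu=0.08, r=0.02, lam0=1.0, sigma0=0.2,
                 R=3.0, s_min=0.01, s_max=0.25, n=401):
    """Grid approximations of H = inf_Sigma sup_a {...} and H_tilde = sup_a inf_Sigma {...}
    for d = 1, using the hypothetical penalty F(Sigma) = (sqrt(Sigma) - sigma0)**2."""
    a = np.linspace(-R, R, n)[:, None]           # investor's allocations (rows)
    s = np.linspace(s_min, s_max, n)[None, :]    # market's variances Sigma = sigma^2 (columns)
    F = (np.sqrt(s) - sigma0) ** 2
    # lam0 F + (a (mu - r) + r) x p + (1/2) a^2 Sigma x^2 M, evaluated on the whole grid
    G = lam0 * F + (a * (mu - r) + r) * x * p + 0.5 * a**2 * s * x**2 * M
    H = G.max(axis=0).min()          # sup over a for each Sigma, then inf over Sigma
    H_tilde = G.min(axis=1).max()    # inf over Sigma for each a, then sup over a
    return float(H), float(H_tilde)


# p = 1/x and M = -1/x**2 are the first two x-derivatives of ln(x)
H, H_tilde = hamiltonians(x=1.0, p=1.0, M=-1.0)
```

With $p=1/x$ and $M=-1/x^{2}$, both quantities no longer depend on $x$, as expected for logarithmic utility. Because the bracket is concave in $a$ and convex in $\Sigma$ for this penalty, the two grid values almost coincide.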

$$-\frac{\partial\phi}{\partial t}(t_{0},x_{0})-\tilde{H}\Bigl(t_{0},x_{0},\frac{\partial\phi}{\partial x}(t_{0},x_{0}),\frac{\partial^{2}\phi}{\partial x^{2}}(t_{0},x_{0})\Bigr)\geq0\quad\text{in }[0,T)\times\mathbb{R}.$$

$$\frac{\partial v}{\partial t}(t,x)+\tilde{H}\Bigl(t,x,\frac{\partial v}{\partial x}(t,x),\frac{\partial^{2}v}{\partial x^{2}}(t,x)\Bigr)=0,\quad(t,x)\in[0,T)\times\mathbb{R}.$$

$$\begin{cases}\frac{\partial v}{\partial t}(t,x)+\tilde{H}\Bigl(t,x,\frac{\partial v}{\partial x}(t,x),\frac{\partial^{2}v}{\partial x^{2}}(t,x)\Bigr)=0 & \text{in }[0,T)\times\mathbb{R},\\ v(T,x)=U(x) & \text{on }\{T\}\times\mathbb{R}.\end{cases}$$

$$\bar{v}(t,x)=\sup_{\alpha\in\mathcal{A}}\inf_{\sigma^{2}\in\mathcal{B}}\left\{\mathbb{E}^{t,x}\Bigl[\ln(x)+\int_{t}^{T}\Bigl(\alpha_{s}\mu+(1-\alpha_{s})r-\frac{1}{2}\alpha_{s}^{2}\sigma_{s}^{2}+\lambda_{0}(\sigma_{s}-\sigma_{0})^{2}\Bigr)ds\Bigr]\right\}.$$

$$\hat{\alpha}_{s}$$

$$-\frac{1}{2}\hat{\alpha}_{s}^{2}+\lambda_{0}\Bigl(1-\frac{\sigma_{0}}{\hat{\sigma}_{s}}\Bigr)$$
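In the logarithmic case the integrand above does not depend on wealth. As a working assumption, with constant $\mu$, $r$, $\sigma_{0}$ and $\lambda_{0}>0$, the saddle point can therefore be sought pointwise. Maximizing over $\alpha$ gives $\hat{\alpha}=(\mu-r)/\hat{\sigma}^{2}$. We read the last displayed expression, set equal to zero, as the market's first-order condition with respect to $\sigma^{2}$.

The sketch below substitutes $\hat{\alpha}$ into that condition and solves the resulting scalar equation by bisection. The reduced condition is strictly increasing in $\sigma>0$ and is non-positive at $\sigma_{0}$, so the root is unique and satisfies $\hat{\sigma}\geq\sigma_{0}$. The helper name `robust_log_saddle` and the parameter values are hypothetical.

```python
def robust_log_saddle(mu, r, sigma0, lam0, n_iter=200):
    """Pointwise saddle (alpha_hat, sigma_hat) of the log-utility integrand
    a*mu + (1 - a)*r - a**2 * sigma**2 / 2 + lam0 * (sigma - sigma0)**2 (hypothetical helper)."""
    if sigma0 <= 0.0 or lam0 <= 0.0:
        raise ValueError("need sigma0 > 0 and lam0 > 0")
    excess = mu - r

    def foc(sigma):
        # market's first-order condition in sigma^2 with alpha_hat = excess / sigma^2 plugged in
        a = excess / sigma**2
        return -0.5 * a**2 + lam0 * (1.0 - sigma0 / sigma)

    lo, hi = sigma0, 2.0 * sigma0
    while foc(hi) <= 0.0:        # enlarge the bracket until the sign changes
        hi *= 2.0
    for _ in range(n_iter):      # plain bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if foc(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    sigma_hat = 0.5 * (lo + hi)
    return excess / sigma_hat**2, sigma_hat


alpha_hat, sigma_hat = robust_log_saddle(mu=0.08, r=0.02, sigma0=0.2, lam0=1.0)
```

With these illustrative inputs the market raises the volatility to $\hat{\sigma}\approx0.281$, and the robust fraction is $\hat{\alpha}\approx0.76$. The non-robust log-optimal fraction $(\mu-r)/\sigma_{0}^{2}$ is $1.5$. A larger $\lambda_{0}$ pulls $\hat{\sigma}$ back towards $\sigma_{0}$.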


Full text

Robust utility maximization under model uncertainty via a penalization approach

Ivan Guo

School of Mathematical Sciences, Monash University, Melbourne, Australia

Centre for Quantitative Finance and Investment Strategies, Monash University, Australia

Nicolas Langrené

Data61, Commonwealth Scientific and Industrial Research Organisation, RiskLab Australia

Grégoire Loeper

School of Mathematical Sciences, Monash University, Melbourne, Australia

Centre for Quantitative Finance and Investment Strategies, Monash University, Australia

BNP Paribas Global Markets

Wei Ning

School of Mathematical Sciences, Monash University, Melbourne, Australia

(First version: July 31, 2019; this revised version: July 3, 2020)

Abstract

This paper addresses the problem of utility maximization under uncertain parameters. In contrast with the classical approach, where the parameters of the model evolve freely within a given range, we constrain them via a penalty function. We show that this robust optimization process can be interpreted as a two-player zero-sum stochastic differential game. We prove that the value function satisfies the Dynamic Programming Principle and that it is the unique viscosity solution of an associated Hamilton–Jacobi–Bellman–Isaacs equation. We test this robust algorithm on real market data. The results show that robust portfolios generally have higher expected utilities and are more stable under strong market downturns. To solve for the value function, we derive an analytical solution in the logarithmic utility case and obtain accurate numerical approximations in the general case by three methods: finite difference method, Monte Carlo simulation, and Generative Adversarial Networks.

Keywords: robust portfolio optimization, differential games, HJBI equation, Monte Carlo, GANs

AMS subject classifications: 49N90, 49K35, 49K20, 49L20, 49L25, 91G80

1 Introduction

This paper addresses the problem of continuous-time utility maximization. Besides the choice of utility function, a key element in the formulation of such a problem is the a priori knowledge assumed for the evolution of the underlying assets (e.g., the expected returns and the quadratic covariation of the diffusion process). In a landmark paper, Merton (1969) found an explicit solution for the problem of optimal portfolio selection and consumption for a constant relative risk aversion (CRRA) utility function $\frac{X^{\gamma}}{\gamma}$, $\gamma\in(0,1)$ (a.k.a. power utility or isoelastic utility). He showed that the optimal fraction of wealth to be invested in the risky asset is $\pi^{*}=\frac{\mu-r}{\sigma^{2}(1-\gamma)}$. Here $\mu$ is the expected rate of asset returns, $\sigma^{2}$ is the variance of the asset returns, $r$ is the risk-free interest rate and $1-\gamma$ is the relative risk aversion constant. This fraction is independent of both time and current wealth, even though it is a priori allowed to evolve dynamically. This conclusion is arguably one of the most important results in portfolio optimization, and it is also consistent with Markowitz portfolio optimization (Markowitz, 1952). It has led to various extensions, some of which are illustrated in the textbook by Rogers (2013).
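The Merton fraction can be evaluated directly. The short sketch below codes the formula $\pi^{*}=\frac{\mu-r}{\sigma^{2}(1-\gamma)}$ stated above. The function name and the parameter values are illustrative only. The input check allows any $\gamma<1$, with $\gamma=0$ corresponding to the logarithmic-utility limit.

```python
def merton_fraction(mu: float, r: float, sigma: float, gamma: float) -> float:
    """Optimal constant fraction of wealth in the risky asset for U(X) = X**gamma / gamma.

    pi* = (mu - r) / (sigma**2 * (1 - gamma)); gamma = 0 is the log-utility limit.
    """
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    if gamma >= 1.0:
        raise ValueError("gamma must be below 1 (positive relative risk aversion 1 - gamma)")
    return (mu - r) / (sigma**2 * (1.0 - gamma))


# Illustrative inputs: 8% drift, 2% rate, 20% volatility, gamma = 0.5.
print(merton_fraction(0.08, 0.02, 0.20, 0.5))   # about 3.0, i.e. a leveraged position
```

The output depends on neither $t$ nor wealth, but it is very sensitive to the estimated $\mu$ and $\sigma$. These are exactly the inputs the robust formulation treats as uncertain.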

In the original Merton problem, the evolution of the risky asset, although stochastic by essence, is governed by the Black–Scholes model (Black and Scholes, 1973) with fixed parameters $\mu,r$ and $\sigma$. This is a very simplistic model for the underlying asset price. More realistic stochastic models for the volatility and interest rates later emerged, and several papers have addressed the problem in this context: Matoussi et al. (2015) examined the case of stochastic volatility, while Noh and Kim (2011) addressed the case of stochastic interest rates. The expected return (or drift) $\mu$ plays an essential role in the optimal allocation. Yet even when it is modelled as stochastic, it is still assumed to be an observable input of the problem. This assumption clearly does not match the reality that investors face. Lakner (1995) and later Bel Hadj Ayed et al. (2017) addressed the utility maximization problem with an uncertain drift, although the drift was assumed to follow some form of prescribed dynamics or prior distribution.

Two decades ago, the concept of robust portfolio optimization emerged. It was first introduced in the operations research literature by El Ghaoui and Lebret (1997) and Ben-Tal and Nemirovski (1998). Instead of assuming a model with a known drift, interest rate or volatility, robust optimal allocation assumes that these parameters evolve dynamically in the most unfavourable way within a given range. The resulting allocation process tends to be more stable and less vulnerable to changes and misspecifications in model parameters.

There is a substantial literature on robust portfolio optimization from the last decade, and the area is still developing. A comprehensive introduction to the trends and methods can be found in the book by Fabozzi et al. (2007). Gabrel et al. (2014) provided an overview of advances in robust optimization, including but not limited to applications in finance. They stated that “robustifying” stochastic optimization is one of the key advancements that should develop following the 2007 financial crisis. We list below a few influential works in this direction. Elliott and Siu (2009) supposed that an agent wants to maximize the minimal expected utility over a family of probability measures. This problem was then formulated as a Markovian regime-switching model, where the market parameters are modulated by a continuous-time finite-state Markov chain that is determined by the probability measures. Glasserman and Xu (2013) went beyond parameter uncertainty to consider the effect of changes in the probability distributions that define an underlying model. They used relative entropy to quantify the deviation of the worst-case model from a baseline model. Fouque et al. (2016) studied an asset allocation problem with stochastic volatility and uncertain correlation, and derived closed-form solutions for a class of utility functions. Ismail and Pham (2019) studied a robust Markowitz portfolio selection problem under covariance uncertainty. The value function is obtained by optimizing the worst-case mean-variance functional over the admissible investment strategies $\alpha$. They then solved this problem by the McKean–Vlasov dynamic programming approach and characterized the solution with a Bellman–Isaacs PDE. They also illustrated the robust efficient frontier in two examples: uncertain volatilities and uncertain correlation.
Last but not least, we also mention the work by Talay and Zheng (2002), which studied the robust optimization problem in the context of derivatives hedging.

A robust investment process can be interpreted as a two-player game. On the one hand, the market can be thought of as an adversarial player controlling the volatility (or the drift) in order to minimize the gains of an investor. On the other hand, the investor, who controls the allocation of the portfolio, tries to maximize her gains under the worst possible behaviour of the market. The two controllers have conflicting interests, with the gain of one player being a loss for the other. Hence we call this competition between the investor and the market a two-player zero-sum stochastic differential game (SDG). Differential games were first introduced by Isaacs (1965); the book by Fleming and Soner (2006) provides a concise introduction to the theory of viscosity solutions and deterministic zero-sum differential games. The first complete theory for two-player zero-sum SDGs was developed by Fleming and Souganidis (1989), who proved the existence of value functions of the games. Buckdahn and Li (2008) generalized the results of Fleming and Souganidis (1989) by considering the gain functional as a solution of a Backward Stochastic Differential Equation (BSDE). With the help of BSDE methods, they proved the Dynamic Programming Principle (DPP) for the value functions more directly. Some more recent works on zero-sum SDGs include Hernández-Hernández and Sîrbu (2018), Baltas et al. (2019) and Cosso and Pham (2019).

The main novelty of our work is threefold. Firstly, we do not assume a given range of parameters in the evolution of the underlying process. Other papers considering uncertain volatility assume that the admissible volatility satisfies $\sigma\in[\sigma_{\min},\sigma_{\max}]$, where $\sigma_{\min}$ and $\sigma_{\max}$ are model bounds reflecting the uncertainty about future fluctuations. Instead, we allow the parameters to move freely and use a penalty function $F=F(r,\mu,\sigma,\ldots)$ to penalize unrealistic values of the parameters. Mathematically speaking, the penalty function gives some coercivity to the problem so that an optimal solution can be found. This approach has been used for robust derivatives pricing in Tan et al. (2013) and Guo et al. (2017). Note that one can asymptotically recover the aforementioned approaches that involve a fixed parameter range by taking the penalty function $F$ to be 0 over a given set and $+\infty$ outside.
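The last remark can be checked numerically in a one-dimensional, pointwise setting. As an assumption borrowed from the logarithmic example of Section 7.1, we keep only the variance-dependent part of the integrand, $-\frac{1}{2}a^{2}s+\lambda_{0}F$ with $s=\sigma^{2}$. We fix the investor's allocation $a$ and let the market minimise this expression over a grid of $B=[0,s_{\max}]$.

An indicator penalty (0 on $[s_{lo},s_{hi}]$, $+\infty$ outside) reproduces the fixed-range answer $s=s_{hi}$. The quadratic penalty $F=(\sigma-\sigma_{0})^{2}$ instead gives the interior worst case $\sigma=\sigma_{0}/(1-a^{2}/(2\lambda_{0}))$ when $a^{2}<2\lambda_{0}$. Otherwise the objective decreases in $s$, and the minimiser sits at the upper end of $B$. All function names and numbers below are hypothetical.

```python
import numpy as np


def worst_case_variance(a, penalty, lam0=1.0, s_max=4.0, n=200_001):
    """Market's pointwise choice of s = sigma^2 on a grid of B = [0, s_max] against a fixed
    allocation a, minimising -a**2 * s / 2 + lam0 * F(s) (variance part of the log integrand)."""
    s = np.linspace(0.0, s_max, n)
    objective = -0.5 * a**2 * s + lam0 * penalty(s)
    return float(s[np.argmin(objective)])


def box_penalty(s, s_lo=0.01, s_hi=0.09):
    # F = 0 on [s_lo, s_hi] and +inf outside: recovers the fixed-range formulation
    return np.where((s >= s_lo) & (s <= s_hi), 0.0, np.inf)


def quadratic_penalty(s, sigma0=0.2):
    # F = (sigma - sigma0)^2 with sigma = sqrt(s): a soft penalty around sigma0
    return (np.sqrt(s) - sigma0) ** 2


print(worst_case_variance(1.0, box_penalty))        # close to 0.09, the upper end of the range
print(worst_case_variance(1.0, quadratic_penalty))  # close to 0.16, i.e. sigma = 0.2 / (1 - 1/2)
```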

Secondly, in the classical papers on two-player zero-sum SDGs, Fleming and Souganidis (1989) and Nisio (2015) assumed that the domain is bounded and that the utility function $U$ is bounded and Lipschitz continuous. The present paper extends these results to an unbounded domain and an unbounded utility function $U$. Moreover, we prove that the lower and upper values of the SDG (2)-(3) in fact coincide.

Last but not least, we devise two innovative algorithms to compute the value functions, based respectively on control randomization and on Generative Adversarial Networks (GANs). To our knowledge, this is the first application of the control randomization method (see Kharroubi et al. 2014) in the context of a robust portfolio optimization problem. It is also the first time GANs are used to solve a robust optimization problem in the field of quantitative finance.

GANs are an exciting recent innovation in machine learning. The fundamental principle of GANs is to use two neural networks as opponents with conflicting goals; the solution of the resulting game is a Nash equilibrium. Hence, GAN training is closely related to game theory. Cao et al. (2020) reviewed the minimax structures underlying GANs and established theoretical connections between GANs and Mean-Field Games. However, there are few applications of GANs in quantitative finance so far; the only relevant work we are aware of is by Wiese et al. (2020). Inspired by the ability of GANs to generate images, they approximated a realistic asset price simulator using adversarial training techniques.

The rest of the paper is organized as follows. In Section 2, we formulate a portfolio optimization problem in a robust setting and introduce the uncertain drift and uncertain volatility processes. In the subsequent sections, we focus only on the uncertain volatility case, because the uncertain drift case can be solved in a similar way. In Section 3, we define the value functions for static games and two-player zero-sum SDGs. In Section 4, we show that the differential game has a saddle point and, as a consequence, that the lower and upper values of the SDG coincide. We prove in Section 5 that the value function satisfies the DPP, and in Section 6 that it is the unique viscosity solution of an HJBI equation. In Section 7.1, we derive a closed-form solution for the logarithmic utility. In Section 7.2, we add some noise to the covariance matrix and simulate portfolios with robust and non-robust strategies. Then, in Section 7.3, we test our robust mechanism by constructing two empirical portfolios using market data. In Sections 7.4 and 7.5, we provide numerical results for general utility functions, using PDE techniques via finite difference methods and Monte Carlo simulations via control randomization, respectively. Finally, in Section 7.6, we present the algorithm and results for solving a robust portfolio optimization problem with GANs.

2 Problem formulation

We consider a portfolio with $d$ risky assets and one risk-free asset compounding at a constant interest rate $r\in\mathbb{R}$. The price process of the risky assets is denoted by $S_{t}\in\mathbb{R}^{d}$ $(0\leq t\leq T)$, and the $i$-th element of $S_{t}$ follows the dynamics

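$$\frac{dS_{t}^{i}}{S_{t}^{i}}=\mu_{t}^{i}\,dt+\sum_{j=1}^{d}\sigma_{t}^{ij}\,dW_{t}^{j},\qquad 1\leq i\leq d,$$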

with drift $\mu_{t}\in\mathbb{R}^{d}$ , covariance matrix $\Sigma_{t}\in\mathbb{R}^{d\times d}$ and its square-root matrix $\sigma_{t}\coloneqq\Sigma_{t}^{\frac{1}{2}}\in\mathbb{R}^{d\times d}$ .

We consider a probability space $\left(\Omega,\mathcal{F},\mathbb{P}\right)$, and processes $\mu,\Sigma$ which are progressively measurable with respect to the $\mathbb{P}$-augmented filtration of the $d$-dimensional Brownian motion $W_{t}$.

Let $X_{t}\in\mathbb{R}$ be the value of the portfolio at time $t$ . A portfolio allocation strategy $\alpha_{t}\in\mathbb{R}^{d}$ represents the proportion of total wealth the agent invests in the $d$ risky assets at time $t$ , and $1-\sum_{i=1}^{d}\alpha_{t}^{i}$ is the proportion invested in the risk-free asset.

Assuming the strategy is self-financing, the wealth process evolves as follows:

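$$\frac{dX_{t}}{X_{t}}=\sum_{i=1}^{d}\alpha_{t}^{i}\frac{dS_{t}^{i}}{S_{t}^{i}}+\Bigl(1-\sum_{i=1}^{d}\alpha_{t}^{i}\Bigr)r\,dt.$$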

We define $\mathbf{r}\coloneqq r\mathbf{1}$, where $\mathbf{1}\in\mathbb{R}^{d}$ is the $d$-dimensional vector of ones. The wealth dynamics can then be rewritten as

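$$dX_{t}=X_{t}\bigl(\alpha_{t}^{\intercal}(\mu_{t}-\mathbf{r})+r\bigr)\,dt+X_{t}\,\alpha_{t}^{\intercal}\sigma_{t}\,dW_{t}.$$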
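For constant controls, the wealth SDE can be simulated with a plain Euler–Maruyama scheme. This sketch rests on assumptions that go beyond the text. The controls $\alpha$, $\mu$ and $\Sigma$ are held constant, and $\sigma=\Sigma^{1/2}$ is taken as the symmetric square root obtained from an eigen-decomposition. The function names `sym_sqrt` and `simulate_wealth` and all numbers are hypothetical.

```python
import numpy as np


def sym_sqrt(Sigma):
    """Symmetric square root sigma = Sigma^{1/2} of a symmetric PSD matrix (eigen-decomposition)."""
    w, V = np.linalg.eigh(Sigma)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T


def simulate_wealth(x0, alpha, mu, Sigma, r, T=1.0, n_steps=100, n_paths=100_000, seed=0):
    """Euler-Maruyama paths of dX = X (alpha^T (mu - r 1) + r) dt + X alpha^T sigma dW
    for constant alpha, mu, Sigma; returns the simulated terminal wealth X_T."""
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = sym_sqrt(np.asarray(Sigma, dtype=float))
    dt = T / n_steps
    drift = alpha @ (mu - r) + r          # alpha^T (mu - r 1) + r, a scalar
    vol = alpha @ sigma                   # row vector alpha^T sigma, shape (d,)
    X = np.full(n_paths, float(x0))
    for _ in range(n_steps):
        dW = rng.standard_normal((n_paths, mu.size)) * np.sqrt(dt)
        X = X * (1.0 + drift * dt + dW @ vol)
    return X


X_T = simulate_wealth(1.0, [0.5, 0.3], [0.08, 0.05], [[0.04, 0.01], [0.01, 0.09]], r=0.02)
```

With constant controls, $\mathbb{E}[X_{T}]=x\,e^{(\alpha^{\intercal}(\mu-\mathbf{r})+r)T}$, which gives a quick sanity check on the simulated paths.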

We will follow the framework set in Fleming and Souganidis (1989) and Talay and Zheng (2002). We first introduce the canonical sample spaces for the underlying Brownian motion in (1) and (2). For each $t\in[0,T]$ , we set

$$\Omega_{t}:=\{\omega\in C([t,T];\mathbb{R}^{d}):\omega_{t}=0\}.$$

We denote by $\mathbb{F}=(\mathcal{F}_{t,s})_{s\in[t,T]}$ the filtration generated by the canonical process from time $t$ to time $s$. Equipped with the Wiener measure $\mathbb{P}_{t}$ on $\mathcal{F}_{t,T}$, the filtered probability space $(\Omega_{t},\mathcal{F}_{t,T},\mathbb{P}_{t},\mathbb{F})$ is the canonical sample space, and $W$ is the standard $d$-dimensional Brownian motion.

Now, we introduce the concept of admissible controls.

Definition 1.

An admissible control process $\Sigma$ (resp. $\mu$) for the market on $[t,T]$ is a progressively measurable process with respect to $\mathbb{F}$, taking values in a compact convex set $B\subset\mathbb{S}^{d}$ (resp. $M\subset\mathbb{R}^{d}$), where $\mathbb{S}^{d}\subset\mathbb{R}^{d\times d}$ is the set of symmetric positive semi-definite matrices. The set of all admissible $\Sigma$ (resp. $\mu$) on $[t,T]$ is compact and convex, and is denoted by $\mathcal{B}$ (resp. $\mathcal{M}$).

Definition 2.

An admissible control process $\alpha$ for the investor on $[t,T]$ is a progressively measurable process with respect to $\mathbb{F}$ , taking values in a compact convex set $A\subset\mathbb{R}^{d}$ . The set of all admissible $\alpha$ is compact and convex, denoted by $\mathcal{A}$ .

Note that although the control value sets are compact, in practice one can take $A=[-R,R]^{d}$ and $B=[-R,R]^{d\times d}\cap\mathbb{S}^{d}$ with $R$ arbitrarily large.
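These concrete choices of $A$ and $B$ translate into simple membership tests. The following minimal sketch assumes NumPy, a numerical tolerance `tol` of our choosing, and hypothetical function names. Positive semi-definiteness is checked through the smallest eigenvalue returned by `numpy.linalg.eigvalsh`.

```python
import numpy as np


def in_A(alpha, R):
    """Membership in A = [-R, R]^d, the investor's allocation set."""
    alpha = np.asarray(alpha, dtype=float)
    return bool(np.all(np.abs(alpha) <= R))


def in_B(Sigma, R, tol=1e-12):
    """Membership in B = [-R, R]^{d x d} intersected with the symmetric PSD matrices."""
    Sigma = np.asarray(Sigma, dtype=float)
    if Sigma.ndim != 2 or Sigma.shape[0] != Sigma.shape[1]:
        return False                                   # not a square matrix
    if not np.allclose(Sigma, Sigma.T, atol=tol):
        return False                                   # not symmetric
    if np.any(np.abs(Sigma) > R):
        return False                                   # some entry outside [-R, R]
    return bool(np.linalg.eigvalsh(Sigma).min() >= -tol)  # positive semi-definite


print(in_B([[0.04, 0.01], [0.01, 0.09]], R=10.0))   # True
print(in_B([[1.0, 2.0], [2.0, 1.0]], R=10.0))       # False: eigenvalue -1
```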

Next, let us define the payoff function as the expectation of a terminal utility function $U$ plus a penalty function $F$ :

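$$J(t,x,\alpha,\mu,\Sigma)=\mathbb{E}^{t,x}\Bigl[U(X_{T}^{\alpha,\mu,\Sigma})+\lambda_{0}\int_{t}^{T}F(\mu_{s},\Sigma_{s})\,ds\Bigr],$$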

where $\mathbb{E}^{t,x}(\cdot)$ denotes the expectation given the initial time and wealth $(t,x)\in[0,T]\times\mathbb{R}$, and $\lambda_{0}>0$ is a constant. Throughout the paper, we will often include $\alpha,\mu$ and $\Sigma$ in the superscript of $X$ to indicate the dependence of the wealth process on the allocation, drift and volatility processes. Our objective is to find the optimal portfolio allocation process $\alpha$ that maximizes the worst-case payoff over the drift process $\mu$ or the covariance process $\Sigma$. Throughout the paper, $F$ will be a convex function of $\Sigma_{s}$ and $\mu_{s}$.
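For intuition, $J$ can be estimated by Monte Carlo when $d=1$ and the controls are constant. In that case the wealth is a geometric Brownian motion and the penalty integral reduces to $(T-t)F$. The sketch makes exactly these simplifying assumptions. `payoff_J`, its arguments and the test values are hypothetical, and the penalty is passed as a function of the volatility rather than of the covariance.

```python
import numpy as np


def payoff_J(x, alpha, mu, sigma, r, lam0, F, U, t=0.0, T=1.0, n_paths=200_000, seed=1):
    """Monte Carlo estimate of J(t, x, alpha, Sigma) for d = 1 and constant controls.

    With constant alpha and sigma the wealth is a geometric Brownian motion, so X_T is
    sampled exactly; the penalty integral of a constant F is lam0 * (T - t) * F(sigma).
    """
    rng = np.random.default_rng(seed)
    tau = T - t
    b = alpha * (mu - r) + r          # drift of dX / X
    v = alpha * sigma                 # volatility of dX / X
    z = rng.standard_normal(n_paths)
    X_T = x * np.exp((b - 0.5 * v**2) * tau + v * np.sqrt(tau) * z)
    return float(np.mean(U(X_T))) + lam0 * tau * F(sigma)


est = payoff_J(1.0, alpha=1.0, mu=0.08, sigma=0.3, r=0.02, lam0=1.0,
               F=lambda s: (s - 0.2) ** 2, U=np.log)
```

With $U=\ln$, the estimate can be compared with the exact value $\ln x+(\alpha(\mu-r)+r-\frac{1}{2}\alpha^{2}\sigma^{2})(T-t)+\lambda_{0}(T-t)F$. For the inputs above, this value is $0.045$.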

2.1 Robust value functions

We are now ready to define the value functions. In our problem, the covariance (or drift) is unknown. We want to find the optimal portfolio allocation process that maximizes the worst-case situation given by the covariance (or drift). Then, given an initial condition $(t,x)\in[0,T]\times\mathbb{R}$ , this value is given by

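$$\underline{u}(t,x)=\sup_{\alpha\in\mathcal{A}}\inf_{\Sigma\in\mathcal{B},\,\mu\in\mathcal{M}}\Bigl\{\mathbb{E}^{t,x}\Bigl[U(X_{T}^{\alpha,\mu,\Sigma})+\lambda_{0}\int_{t}^{T}F(\mu_{s},\Sigma_{s})\,ds\Bigr]\Bigr\}.$$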

We say $\hat{\alpha}$ and $\hat{\Sigma},\hat{\mu}$ are optimal controls if $\underline{u}(t,x)=J(t,x,\hat{\alpha},\hat{\mu},\hat{\Sigma})=\inf_{\Sigma\in\mathcal{B},\mu\in\mathcal{M}}J(t,x,\hat{\alpha},\mu,\Sigma)$ . Hereafter, we focus on the robust optimization problem with an uncertain covariance, that is,

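$$\underline{u}(t,x)=\sup_{\alpha\in\mathcal{A}}\inf_{\Sigma\in\mathcal{B}}\Bigl\{\mathbb{E}^{t,x}\Bigl[U(X_{T}^{\alpha,\Sigma})+\lambda_{0}\int_{t}^{T}F(\Sigma_{s})\,ds\Bigr]\Bigr\},$$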

because the uncertain drift case can be studied in a similar manner.

This problem is known as a static game, and the function $\underline{u}(t,x)$ is called the lower value of the static game. If we reverse the order in which the two players move, we obtain the upper value of the static game, which is

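$$\bar{u}(t,x)=\inf_{\Sigma\in\mathcal{B}}\sup_{\alpha\in\mathcal{A}}\Bigl\{\mathbb{E}^{t,x}\Bigl[U(X_{T}^{\alpha,\Sigma})+\lambda_{0}\int_{t}^{T}F(\Sigma_{s})\,ds\Bigr]\Bigr\}.$$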

Note that $X_{s}^{\alpha,{\scriptscriptstyle\Sigma}},\forall s\in[t,T]$ denotes a process controlled by processes $\alpha,\Sigma$ . When $X_{s}^{\alpha,{\scriptscriptstyle\Sigma}}$ starts from an initial condition $(t,x)$ , we write the expectation of $f(X_{s}^{\alpha,{\scriptscriptstyle\Sigma}})$ as $\mathbb{{E}}^{t,x}\left[f(X_{s}^{\alpha,{\scriptscriptstyle\Sigma}})\right]$ .
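The difference between the two orders of play is already visible in a finite game. In this sketch the rows are the investor's choices and the columns are the market's. The lower value is the maximum over rows of the row minima, and the upper value is the minimum over columns of the column maxima. The lower value never exceeds the upper value, and the two coincide exactly when the matrix has a saddle point in pure strategies. The function name and matrices are illustrative.

```python
import numpy as np


def static_game_values(payoff):
    """Return (lower, upper) = (max_i min_j, min_j max_i) for a payoff matrix
    whose rows are chosen by the maximising investor and columns by the minimising market."""
    payoff = np.asarray(payoff, dtype=float)
    lower = payoff.min(axis=1).max()   # investor commits first, market responds: sup inf
    upper = payoff.max(axis=0).min()   # market commits first, investor responds: inf sup
    return float(lower), float(upper)


print(static_game_values([[1.0, -1.0], [-1.0, 1.0]]))  # (-1.0, 1.0): no saddle point
print(static_game_values([[3.0, 1.0], [4.0, 2.0]]))    # (2.0, 2.0): saddle at row 2, column 2
```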

2.2 Assumptions

In this section, we make the following assumptions which will hold throughout the paper.

Assumption 1.

The utility function $U:\mathbb{R}\rightarrow\mathbb{R}$ is a continuous, increasing and concave function such that

$$\bigl|U(x)-U(\bar{x})\bigr|\leq Q(|x|,|\bar{x}|)\,|x-\bar{x}|,$$

where $Q(\left|x\right|,\left|\bar{x}\right|)$ is a positive polynomial function.

Assumption 2.

The penalty function $F:B\rightarrow\mathbb{R}$ is a continuous convex function, and $F$ attains its minimum in the interior of $B$ .

In addition to Definitions 1 and 2, we need the following conditions to ensure the existence and uniqueness of a strong solution of the SDE (2).

Assumption 3.

For any $\Sigma_{s,s\in[t,T]}\in B$ and $\alpha_{s,s\in[t,T]}\in A$ , we have

$$\mathbb{E}\Bigl[\int_{t}^{T}\bigl|F(\Sigma_{s})\bigr|\,ds\Bigr]<\infty,$$

and for any fixed value $x_{0}$ ,

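$$\mathbb{E}\Bigl[\int_{t}^{T}\bigl|(\alpha_{s}^{\intercal}\mu+r-\alpha_{s}^{\intercal}\mathbf{r})x_{0}\bigr|^{2}+\bigl|\alpha_{s}^{\intercal}\sigma_{s}x_{0}\bigr|^{2}\,ds\Bigr]<\infty.$$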

3 Value functions of two-player zero-sum stochastic differential games

In order to complete the description of the game, we need to clarify what information is available to the controllers at each time $s$ . For multi-stage discrete time games this can be formulated inductively. However, this is problematic in continuous time, because control choices can be changed instantaneously (Fleming and Soner, 2006, Chapter 11). To address this issue, Fleming and Souganidis (1989) adopted the idea of a progressive strategy in a two-player zero-sum SDG, which is defined as follows:

Definition 3.

An admissible strategy $\Gamma$ (resp. $\Delta$ ) for the investor (resp. market) on $[t,T]$ is a mapping $\Gamma:\mathcal{B}\rightarrow\mathcal{A}$ (resp. $\Delta:\mathcal{A}\rightarrow\mathcal{B}$ ) such that, for any $s\in[t,T]$ and $\Sigma,\tilde{\Sigma}\in\mathcal{B}$ (resp. $\alpha,\tilde{\alpha}\in\mathcal{A}$ ), $\Sigma(u)=\tilde{\Sigma}(u)$ (resp. $\alpha(u)=\tilde{\alpha}(u)$ ) for all $u\in[t,s]$ implies $\Gamma(\Sigma)(u)=\Gamma(\tilde{\Sigma})(u)$ (resp. $\Delta(\alpha)(u)=\Delta(\tilde{\alpha})(u)$ ) for all $u\in[t,s]$ . The set of all admissible strategies for the investor (resp. market) on $[t,T]$ is denoted by $\mathcal{N}$ (resp. $\mathcal{M}$ ).
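A discrete-time analogue of Definition 3 is a map from a whole path of the opponent's controls to a path of one's own controls. Its value at step $k$ may depend only on the opponent's controls at steps $0,\dots,k$. This sketch is purely illustrative: both market strategies, the parameters `sigma0` and `kappa`, and the helper `agrees_up_to` are hypothetical. The helper checks the condition for one pair of paths and one time index, not for all of them.

```python
import numpy as np


def running_max_strategy(alpha_path, sigma0=0.2, kappa=0.5):
    """Hypothetical market strategy: volatility at step k grows with the largest |alpha|
    observed on steps 0..k, so it only uses the past and present of the investor's control."""
    exposure = np.maximum.accumulate(np.abs(np.asarray(alpha_path, dtype=float)))
    return sigma0 * (1.0 + kappa * exposure)


def look_ahead_strategy(alpha_path, sigma0=0.2, kappa=0.5):
    """Hypothetical counter-example: every step reacts to the final allocation (anticipative)."""
    path = np.asarray(alpha_path, dtype=float)
    return np.full(path.shape, sigma0 * (1.0 + kappa * abs(path[-1])))


def agrees_up_to(strategy, path1, path2, k):
    """Given two control paths equal on steps 0..k, test whether the strategy's outputs are
    also equal on steps 0..k (the non-anticipativity condition for this pair and this k)."""
    path1, path2 = np.asarray(path1, dtype=float), np.asarray(path2, dtype=float)
    if not np.array_equal(path1[: k + 1], path2[: k + 1]):
        raise ValueError("the two paths must coincide on steps 0..k")
    return bool(np.array_equal(strategy(path1)[: k + 1], strategy(path2)[: k + 1]))


p1, p2 = [0.5, 1.0, 0.2, 0.3], [0.5, 1.0, 2.0, -3.0]   # equal on steps 0 and 1 only
print(agrees_up_to(running_max_strategy, p1, p2, 1))   # True
print(agrees_up_to(look_ahead_strategy, p1, p2, 1))    # False
```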

In the two-player zero-sum SDG, one player is allowed to strategically adapt his control according to the control of his opponent in a non-anticipative fashion. This is in contrast to the static game, in which the player must choose his control without any knowledge of the opponent’s choice. Then, we may define another set of value functions using these admissible strategies: the upper value function of the two-player zero-sum SDG is defined by

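$$\bar{v}(t,x)=\sup_{\Gamma\in\mathcal{N}}\inf_{\Sigma\in\mathcal{B}}\Bigl\{\mathbb{E}^{t,x}\Bigl[\lambda_{0}\int_{t}^{T}F(\Sigma_{s})\,ds+U(X_{T}^{\Gamma,\Sigma})\Bigr]\Bigr\},$$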

and the corresponding lower value function is

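$$\underline{v}(t,x)=\inf_{\Delta\in\mathcal{M}}\sup_{\alpha\in\mathcal{A}}\Bigl\{\mathbb{E}^{t,x}\Bigl[\lambda_{0}\int_{t}^{T}F(\Delta_{s})\,ds+U(X_{T}^{\alpha,\Delta})\Bigr]\Bigr\}.$$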

The terms “lower” and “upper” are not obvious at first glance; one might first guess the opposite, because $\inf\sup\geq\sup\inf$. We will justify $\underline{v}\leq\bar{v}$ in Corollary 2 using the comparison principle.

4 Existence of a value for the differential games

In this section, we prove that the four value functions defined in the previous sections all coincide, i.e., $\underline{u}(t,x)=\underline{v}(t,x)=\bar{v}(t,x)=\bar{u}(t,x)$ . This is established via the following propositions.

Proposition 1.

The four value functions defined in Section 2 and Section 3 satisfy the following inequalities:

$$\underline{u}(t,x)\leq\underline{v}(t,x)\leq\bar{v}(t,x)\leq\bar{u}(t,x).$$

Proof.

The inequality $\underline{v}(t,x)\leq\bar{u}(t,x)$ holds because $\mathcal{M}$ contains constant mappings, i.e., $\Delta(\alpha)=\Sigma$ for any $\alpha\in\mathcal{A}$ and fixed $\Sigma\in\mathcal{B}$ . Similarly, $\underline{u}(t,x)\leq\bar{v}(t,x)$ holds because $\mathcal{N}$ contains a copy of $\mathcal{A}$ . Then for all $\alpha\in\mathcal{A}$ and $\epsilon>0$ , there exists some $\bar{\Delta}$ such that

[TABLE]

Hence $\underline{u}(t,x)\leq\underline{v}(t,x)$, and a similar argument gives $\bar{v}(t,x)\leq\bar{u}(t,x)$. We therefore have

[TABLE]

In order to complete the proof, it suffices to show that $\underline{v}(t,x)\leq\bar{v}(t,x)$ . This is proven in Corollary 2. ∎
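Spelled out, the four inequalities obtained in the proof above form two chains. In the display, $J(\alpha,\Sigma)$ is our shorthand for the penalized expected utility started from $(t,x)$. The inf/sup forms of $\underline{u},\bar{u},\underline{v},\bar{v}$ are our reading of the definitions in Sections 2 and 3 and of (7) and (8), as they are used in this proof. They are not a quotation of those definitions.

```latex
\begin{aligned}
\underline{u}=\sup_{\alpha\in\mathcal{A}}\inf_{\Sigma\in\mathcal{B}}J(\alpha,\Sigma)
&\;\le\;\inf_{\Delta\in\mathcal{M}}\sup_{\alpha\in\mathcal{A}}J\bigl(\alpha,\Delta(\alpha)\bigr)=\underline{v}
\;\le\;\inf_{\Sigma\in\mathcal{B}}\sup_{\alpha\in\mathcal{A}}J(\alpha,\Sigma)=\bar{u},\\
\underline{u}=\sup_{\alpha\in\mathcal{A}}\inf_{\Sigma\in\mathcal{B}}J(\alpha,\Sigma)
&\;\le\;\sup_{\Gamma\in\mathcal{N}}\inf_{\Sigma\in\mathcal{B}}J\bigl(\Gamma(\Sigma),\Sigma\bigr)=\bar{v}
\;\le\;\inf_{\Sigma\in\mathcal{B}}\sup_{\alpha\in\mathcal{A}}J(\alpha,\Sigma)=\bar{u}.
\end{aligned}
```

In the first chain, the left inequality uses $J(\alpha,\Delta(\alpha))\geq\inf_{\Sigma\in\mathcal{B}}J(\alpha,\Sigma)$, and the right one restricts $\mathcal{M}$ to constant mappings. In the second chain, the left inequality restricts $\mathcal{N}$ to constant mappings, and the right one uses $J(\Gamma(\Sigma),\Sigma)\leq\sup_{\alpha\in\mathcal{A}}J(\alpha,\Sigma)$.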

Proposition 2.

Let $U$ be a continuous, increasing and concave utility function on $\mathbb{R}$, and suppose that Assumption 2 holds. Then $\underline{u}(t,x)=\underline{v}(t,x)=\bar{v}(t,x)=\bar{u}(t,x)$.

Proof.

See Appendix A.1. ∎

Using Proposition 2, we can conclude that there exists a value for the two-player zero-sum SDG, i.e., $\underline{v}=\bar{v}$ . We focus on the analysis of $\bar{v}(t,x)$ in the following sections.

5 Dynamic programming principle

If the drift and volatility functions of dynamics (2) and the utility function $U$ were bounded, and $U$ were Lipschitz continuous, we could apply the results of Fleming and Souganidis (1989) directly. However, in our model, the drift and volatility functions are unbounded and $U$ is only locally Lipschitz continuous. We must therefore extend the classical results and use localization techniques to prove that the value function $\bar{v}(t,x)$ defined in (7) satisfies the Dynamic Programming Principle (DPP). The DPP is widely used in numerical methods, such as the least squares Monte Carlo method.

Before presenting the main result, we require the following important property of the value function.

Proposition 3.

Suppose that Assumptions 1 and 3 hold true. Then the value function $\bar{v}(t,x)$ (7) is locally Lipschitz continuous w.r.t. $x$, and there exists a positive polynomial function $\Phi$ such that

[TABLE]

Proof.

See Appendix A.2. ∎

We are now in a position to present one of the main results of this paper.

Theorem 1 (Dynamic Programming Principle).

Suppose that Assumptions 1, 2 and 3 hold true. Define the value function $\bar{v}(t,x)$ by (7) for $(t,x)\in[0,T]\times\mathbb{R}$, and let $t+\theta$ be a stopping time with $t\leq t+\theta\leq T$. Then we have

[TABLE]

Proof.

See Appendix A.3. ∎

As a consequence of the DPP, the value function $\bar{v}(t,x)$ satisfies the following property.

Corollary 1.

Suppose that Assumptions 1, 2 and 3 hold true. Then the value function $\bar{v}(t,x)$ defined in (7) is Hölder continuous in $t$ on $[0,T]$ .

Proof.

See Appendix A.4. ∎

6 Viscosity solution of the HJBI equation

In this section, we prove that the value function is the unique viscosity solution of a Hamilton-Jacobi-Bellman-Isaacs equation. We prove the existence of a viscosity solution in Subsection 6.1, and we establish its uniqueness in Subsection 6.2.

6.1 Existence of a viscosity solution of the HJBI Equation

We now state another main result of this paper; its proof is a modification of that of Talay and Zheng (2002).

Theorem 2.

Suppose that Assumptions 1, 2 and 3 hold true. Then the value function $\bar{v}(t,x)$ defined in (7) is a viscosity solution of the HJBI equation

[TABLE]

where

[TABLE]

for $(t,x,p,M)\in[0,T]\times\mathbb{R}\times\mathbb{R}\times\mathbb{R}$ .

Proof.

See Appendix A.5. ∎

6.2 Comparison principle for the HJBI Equation

In this subsection, we present the comparison principle for equation (12), which implies the uniqueness of the viscosity solution of the HJBI equation. We can adapt the proof from Pham (2009, Theorem 4.4.4) for an HJB equation and straightforwardly extend it to HJBI equations with two controls.

Theorem 3.

Comparison Principle (Pham 2009, Theorem 4.4.4).

Let Assumptions 1, 2 and 3 hold true. Define the HJBI equation as

[TABLE]

Let $U$ (resp. $V$) be a u.s.c. viscosity subsolution (resp. l.s.c. viscosity supersolution) of equation (14) satisfying a polynomial growth condition. If $U(T,\cdot)\leq V(T,\cdot)$ on $\mathbb{R}$, then $U\leq V$ on $[0,T]\times\mathbb{R}$.

As a consequence of the comparison principle, the function $\bar{v}(t,x)$ (7) is in fact the unique viscosity solution of the HJBI equation (12).

Corollary 2.

Let Assumptions 1, 2 and 3 hold true. Define the lower and upper value functions of the two-player zero-sum SDG by (8) and (7). Then

$\underline{v}(t,x)\leq\bar{v}(t,x)$ for all $(t,x)\in[0,T]\times\mathbb{R}$.

Proof.

From Theorem 2, $\bar{v}(t,x)$ is a viscosity solution of the HJBI equation (12). Let $\phi\in C^{\infty}([0,T)\times\mathbb{R})$ be a test function such that $(t_{0},x_{0})\in[0,T)\times\mathbb{R}$ is a local minimum of $\bar{v}-\phi$ . Using the viscosity supersolution property of $\bar{v}(t,x)$ , we have

[TABLE]

where $H(t,x,p,M)$ is defined by (13). Define

[TABLE]

It is obvious that $H\geq\tilde{H}$ , so

[TABLE]

Thus $\bar{v}(t,x)$ is a supersolution of the HJBI equation

[TABLE]

Using the results of Fleming and Souganidis (1989) and a similar argument, we can prove that the lower value function $\underline{v}(t,x)$ (8) is the unique viscosity solution of the HJBI equation

[TABLE]

Finally, by the comparison principle, we have $\underline{v}(t,x)\leq\bar{v}(t,x)$ , as required. ∎

7 Numerical results

In this section, we provide a few numerical examples with commonly used utility functions. We first establish an analytical solution for the logarithmic utility function. We then approximate the value functions numerically for both the logarithmic and CRRA utility functions, using an implicit finite difference method, a control randomization method, and a Generative Adversarial Network method.

7.1 Analytical solution

In the first example, we consider $U(x)=\ln(x)$ and the penalty function $F(\sigma_{t}^{2})=(\sigma_{t}-\sigma_{0})^{2}$. In this case, both the value function and the optimal controls can be found explicitly. Writing $X_{T}$ explicitly, the value function becomes:

[TABLE]

To find the optimal $\alpha_{s}$ and $\sigma_{s}^{2}$, we differentiate the integrand $\alpha_{s}(\mu-r)+r-\frac{1}{2}\alpha_{s}^{2}\sigma_{s}^{2}+\lambda_{0}(\sigma_{s}-\sigma_{0})^{2}$ pointwise in time with respect to $\alpha_{s}$ and $\sigma_{s}^{2}$, respectively. This gives the following optimality conditions:

[TABLE]

which leads to a quartic equation

[TABLE]

The optimal $\hat{\sigma}_{s}$ and $\hat{\alpha}_{s}$ can be obtained explicitly from equation (20); the solution is given in Appendix A.6. Equation (20) always has a real positive root, so the optimal volatility satisfies $\hat{\sigma}_{s}\in B$ and the optimal strategy satisfies $\hat{\alpha}_{s}\in A$. Substituting the optimal controls into (17) yields the analytical solution for the value function. Equations (18) and (19) show that the optimal volatility and the optimal investment strategy are both constant, independent of the wealth $X_{s}$ and of the time $s$. The classical optimal portfolio strategy of Merton is also constant: $\alpha^{*}=\frac{\mu-r}{\sigma^{2}(1-\gamma)}$ for CRRA utility functions. In our problem, however, no analytical solution is available for a power utility function, so we estimate the value function numerically in Subsections 7.4 to 7.6. It is worth mentioning that, when $U(x)=\ln(x)$, the above method also applies to portfolios with multiple risky assets: the analytical solution is obtained by solving a system of optimality conditions. The detailed process is very similar and is therefore omitted here. Moreover, the reference volatility $\sigma_{0}$ need not be constant; it can be a local volatility depending on time and on the stock price. For multiple assets, however, this would increase the dimension of the problem.
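The optimality conditions can be checked numerically. Differentiating the integrand with respect to $\alpha_s$ gives $\alpha_s=(\mu-r)/\sigma_s^2$. Differentiating with respect to $\sigma_s$ and substituting this expression gives $2\lambda_0\sigma^4-2\lambda_0\sigma_0\sigma^3-(\mu-r)^2=0$. This quartic is our own derivation from the integrand, and equation (20) may be written in an equivalent but different form. For $\mu\neq r$, Descartes' rule of signs shows that the quartic has exactly one positive root. Because the polynomial is negative on $(0,\sigma_0]$, that root lies above $\sigma_0$. The Python sketch below (function name hypothetical) extracts the root with `numpy.roots`. It also evaluates the integrand at the optimum.

```python
import numpy as np

def robust_log_controls(mu, r, sigma0, lam0):
    """Return (sigma_hat, alpha_hat, g_hat) for U(x) = ln(x) and penalty (sigma - sigma0)^2.

    sigma_hat: unique positive root of 2*lam0*s^4 - 2*lam0*sigma0*s^3 - (mu - r)^2 = 0
    alpha_hat: (mu - r) / sigma_hat^2
    g_hat:     the integrand alpha*(mu - r) + r - alpha^2*sigma^2/2 + lam0*(sigma - sigma0)^2
               evaluated at the optimal controls
    """
    coeffs = [2.0 * lam0, -2.0 * lam0 * sigma0, 0.0, 0.0, -(mu - r) ** 2]
    roots = np.roots(coeffs)
    # Keep numerically real, positive roots (exactly one exists in exact arithmetic).
    candidates = [z.real for z in roots if abs(z.imag) < 1e-9 and z.real > 0]
    sigma_hat = max(candidates)
    alpha_hat = (mu - r) / sigma_hat ** 2
    g_hat = (alpha_hat * (mu - r) + r - 0.5 * alpha_hat ** 2 * sigma_hat ** 2
             + lam0 * (sigma_hat - sigma0) ** 2)
    return sigma_hat, alpha_hat, g_hat

# Illustrative parameters (our choice).
sigma_hat, alpha_hat, g_hat = robust_log_controls(mu=0.035, r=0.015, sigma0=0.2, lam0=10.0)
```

Since $\hat\sigma>\sigma_0$, the robust allocation $\hat\alpha$ is always smaller than the non-robust log-utility allocation $(\mu-r)/\sigma_0^2$.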

7.2 Comparison of robust and non-robust portfolios with Monte Carlo simulation

In this subsection, we implement our robust strategy using Monte Carlo simulations and compare the performance of robust and non-robust portfolios.

In practice, volatility estimates are noisy and biased, although they are likely to oscillate around a reference value in the long run. In the first experiment, we have a reference covariance matrix $\Sigma_{0}$ estimated from historical data, and we assume that the real-world covariance is the reference covariance $\Sigma_{0}$ plus some noise. We construct robust and non-robust portfolios consisting of two risky assets and one risk-free asset. For the robust portfolio, we use the penalty function $\lambda_{0}F(\Sigma_{s})=\lambda_{0}\bigl\|\Sigma_{s}-\Sigma_{0}\bigr\|_{2}^{2}$, where $\|\cdot\|_{2}$ denotes the usual Frobenius norm. The analytical robust investment strategy $(\hat{\alpha}_{s}^{1},\hat{\alpha}_{s}^{2})$ can then be calculated in the same way as in Subsection 7.1. For the non-robust portfolio, we use $\Sigma_{0}$ as the covariance and compute the non-robust strategy $(\alpha_{s}^{1},\alpha_{s}^{2})$ accordingly. We assume that the real covariance matrix during the investment process is $\Sigma_{\mathrm{real}}=\Sigma_{0}+\varepsilon\times\text{noise}$, where the noise follows a standard normal distribution $\mathcal{N}(0,1)$ and $\varepsilon$ is the magnitude of the noise. We then use Monte Carlo simulations to estimate the expected utility

[TABLE]

We substitute $\alpha_{s}=(\hat{\alpha}_{s}^{1},\hat{\alpha}_{s}^{2})$ in (21) for the robust portfolio, and $\alpha_{s}=(\alpha_{s}^{1},\alpha_{s}^{2})$ for the non-robust one.
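The Python code below sketches how the two allocations and the estimator (21) might be computed in the log-utility case with constant controls. The robust allocation is our own derivation from the first-order conditions of the instantaneous objective $\alpha^\top(\mu-r\mathbf{1})+r-\tfrac12\alpha^\top\Sigma\alpha+\lambda_0\|\Sigma-\Sigma_0\|_2^2$, with $\Sigma$ treated as an unconstrained symmetric matrix. These conditions give $\Sigma\alpha=\mu-r\mathbf{1}$ and $\Sigma=\Sigma_0+\alpha\alpha^\top/(4\lambda_0)$. Hence $\alpha=(\Sigma_0+sI)^{-1}(\mu-r\mathbf{1})$, where the scalar $s=|\alpha|^2/(4\lambda_0)$ is found by bisection.

The function names and example parameters are hypothetical. The sketch does not reproduce the paper's noise model: it takes a given positive semi-definite $\Sigma_{\mathrm{real}}$ and simulates $\ln X_T$ exactly, which is possible because the controls are constant.

```python
import numpy as np

def nonrobust_allocation(mu, r, Sigma0):
    """Log-utility (Merton-type) allocation alpha = Sigma0^{-1} (mu - r 1)."""
    return np.linalg.solve(Sigma0, mu - r)

def robust_allocation(mu, r, Sigma0, lam0, n_bisect=200):
    """Saddle point of alpha'(mu - r) + r - alpha' S alpha / 2 + lam0 ||S - Sigma0||_F^2.

    First-order conditions (our derivation): S alpha = mu - r, S = Sigma0 + alpha alpha'/(4 lam0).
    Since alpha alpha' alpha = |alpha|^2 alpha, alpha = (Sigma0 + s I)^{-1}(mu - r)
    with the scalar s = |alpha|^2 / (4 lam0).
    """
    m = mu - r
    eye = np.eye(len(m))
    alpha_of = lambda s: np.linalg.solve(Sigma0 + s * eye, m)
    # g(s) = |alpha(s)|^2/(4 lam0) - s is decreasing, with g(lo) >= 0 > g(hi).
    lo, hi = 0.0, 2.0 * (m @ m / (4.0 * lam0)) ** (1.0 / 3.0) + 1e-12
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        a = alpha_of(mid)
        if a @ a / (4.0 * lam0) > mid:
            lo = mid
        else:
            hi = mid
    alpha = alpha_of(0.5 * (lo + hi))
    return alpha, Sigma0 + np.outer(alpha, alpha) / (4.0 * lam0)

def expected_log_utility(alpha, mu, r, Sigma_real, T=1.0, x0=1.0, n_paths=200_000, seed=0):
    """(Monte Carlo estimate, closed form) of E[ln X_T] for constant proportions alpha."""
    rng = np.random.default_rng(seed)
    var = alpha @ Sigma_real @ alpha                  # variance rate of the portfolio
    drift = r + alpha @ (mu - r) - 0.5 * var
    log_xT = np.log(x0) + drift * T + np.sqrt(var * T) * rng.standard_normal(n_paths)
    return log_xT.mean(), np.log(x0) + drift * T

# Illustrative inputs (our choice, not the paper's data).
Sigma0 = np.array([[0.04, 0.006], [0.006, 0.0225]])
mu, r = np.array([0.08, 0.05]), 0.015
a_nr = nonrobust_allocation(mu, r, Sigma0)
a_rb, S_hat = robust_allocation(mu, r, Sigma0, lam0=0.01)
```

Under $\Sigma_{\mathrm{real}}=\Sigma_0$, the non-robust allocation is optimal by construction. For these illustrative inputs, with an inflated $\Sigma_{\mathrm{real}}=3\Sigma_0$ and a small $\lambda_0$, the more conservative robust allocation attains the higher expected log utility. As $\lambda_0\to\infty$, the robust allocation tends to the non-robust one.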

The results for various values of $\lambda_{0}$ are shown in Figures 7.2 to 7.2, where we used $2\times 10^{5}$ paths in the simulation and the initial wealth $X_{0}=1$. The robust portfolio may underperform when there is little noise, but as the noise size $\varepsilon$ increases, the robust strategy eventually outperforms the non-robust strategy. Comparing Figures 7.2, 7.2 and 7.2, we find that when the penalty is relatively weak ($\lambda_{0}=0.01$), a larger noise size is needed for the robust strategy to outperform. When the penalty is stiff ($\lambda_{0}=70$), the robust strategy outperforms even for a very small noise size. In Figure 7.2, the robust expected utility is almost constant across all noise sizes, meaning that our model is very robust to changes in market circumstances. Among the three values of $\lambda_{0}$ illustrated, the one in Figure 7.2 is probably the most attractive to investors. When the reference $\Sigma_{0}$ is exact, the robust portfolio loses only slightly to the non-robust one, but when $\Sigma_{0}$ is wrong, the robust portfolio outperforms the non-robust one by a large margin. In other words, the price paid for robustness is tolerable, while the potential reward is substantial.

Define the *crossing point* $\varepsilon$ as the value of $\varepsilon$ at which the robust expected utility matches the non-robust expected utility. Figure 7.2 depicts how the crossing point $\varepsilon$ varies with $\lambda_{0}$; it indicates how wrong the reference covariance must be for the robust portfolio to outperform the non-robust portfolio. The behaviour of the robust portfolio also varies with $\lambda_{0}$. For a given $\varepsilon$, by looping over a range of $\lambda_{0}$, we can find the value giving the maximal robust expected utility. This relation is plotted in Figure 7.2. With this plot, if we know how confident we are in the reference $\Sigma_{0}$ (i.e., the value of $\varepsilon$), we can choose the best $\lambda_{0}$ for robust portfolio allocation.
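The search over $\lambda_0$ described above can be sketched as a simple grid search. The Python snippet below is a hypothetical illustration for log utility with constant controls. It uses the first-order-condition reduction $\alpha=(\Sigma_0+sI)^{-1}(\mu-r\mathbf{1})$, $s=|\alpha|^2/(4\lambda_0)$, which is our derivation and not taken from the paper. Instead of a noisy Monte Carlo estimate, it evaluates the closed-form expected log utility under a user-supplied $\Sigma_{\mathrm{real}}$, and it returns the grid value of $\lambda_0$ with the highest utility.

```python
import numpy as np

def robust_allocation(mu, r, Sigma0, lam0, n_bisect=200):
    # Hypothetical helper: alpha = (Sigma0 + s I)^{-1}(mu - r 1), where
    # s = |alpha|^2 / (4 lam0) is located by bisection (our derivation).
    m = mu - r
    alpha_of = lambda s: np.linalg.solve(Sigma0 + s * np.eye(len(m)), m)
    lo, hi = 0.0, 2.0 * (m @ m / (4.0 * lam0)) ** (1.0 / 3.0) + 1e-12
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        a = alpha_of(mid)
        lo, hi = (mid, hi) if a @ a / (4.0 * lam0) > mid else (lo, mid)
    return alpha_of(0.5 * (lo + hi))

def log_utility_rate(alpha, mu, r, Sigma_real):
    # Closed-form (E[ln X_T] - ln X_0) / T for constant proportions under Sigma_real.
    return r + alpha @ (mu - r) - 0.5 * alpha @ Sigma_real @ alpha

def best_lambda(mu, r, Sigma0, Sigma_real, lam_grid):
    # Evaluate the robust allocation for every lam0 on the grid and keep the best one.
    utils = np.array([log_utility_rate(robust_allocation(mu, r, Sigma0, lam), mu, r, Sigma_real)
                      for lam in lam_grid])
    return lam_grid[np.argmax(utils)], utils

# Illustrative inputs (our choice): the "real" covariance is 3 * Sigma0.
Sigma0 = np.array([[0.04, 0.006], [0.006, 0.0225]])
mu, r = np.array([0.08, 0.05]), 0.015
lam_grid = np.logspace(-3, 4, 29)
best_lam, utils = best_lambda(mu, r, Sigma0, 3.0 * Sigma0, lam_grid)
```

With $\Sigma_{\mathrm{real}}=\Sigma_0$, the largest grid value wins, since the non-robust allocation is then optimal. With the inflated covariance used here, a smaller $\lambda_0$ is preferred.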

7.3 Comparison of robust and non-robust portfolios with empirical market data

In the second experiment, we implement the robust and non-robust strategies with empirical market data. We have $1007$ portfolios, and we construct each portfolio according to both the robust and the non-robust allocations. Each portfolio consists of $2$ risky assets and $1$ risk-free asset, with a maturity of $T=1$ year. The portfolios' starting dates range from 02/04/15 to 03/04/19 (for example, the $1$st portfolio starts on 02/04/15 and lasts for one year, and the $1007$th portfolio starts on 03/04/19 and also lasts for one year). We choose the S&P 500 (^GSPC) and SPDR Gold Shares (GLD) as our risky assets (stock prices are downloaded from Yahoo Finance) and use a constant interest rate $r=0.015$. For a given portfolio, we set $\Sigma_{0}$ to be the sample covariance estimator of the $5$ years of daily relative returns before the starting date. The estimated annual expected returns $\mu_{1},\mu_{2}$ are the exponentially weighted moving averages of the daily relative returns, with a $5$-year lookback window and a $2.75$-year half-life. With decay parameter $\beta=0.999$, for the $n$th portfolio and $i=1,2$, $\mu_{i}=252\times\frac{1}{1-\beta^{1260}}\sum_{t=0}^{1260}(1-\beta)\beta^{t}\frac{S^{i}_{n-t}-S^{i}_{n-t-1}}{S^{i}_{n-t-1}}$.
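The drift estimator above can be written in a few lines. The Python sketch below implements the formula exactly as printed. Note that the sum has 1261 terms while the normalization uses $\beta^{1260}$. The sketch also estimates $\Sigma_0$ as a sample covariance over the same window, using NumPy's default `ddof=1`. The array layout (one row per trading day, one column per asset) and the annualization of $\Sigma_0$ by 252 are assumptions, because the text does not say whether $\Sigma_0$ is annualized. The function names are hypothetical.

```python
import numpy as np

def ewma_annual_drift(prices, n, beta=0.999, window=1260, days_per_year=252):
    """EWMA estimate of annual expected returns for the portfolio starting on day n.

    prices: shape (n_days,) or (n_days, n_assets); row k holds S_k. Requires n >= window + 1.
    Implements 252 / (1 - beta**1260) * sum_{t=0}^{1260} (1 - beta) beta**t R_{n-t},
    with R_k = (S_k - S_{k-1}) / S_{k-1}.
    """
    prices = np.asarray(prices, dtype=float)
    t = np.arange(window + 1)                        # t = 0, 1, ..., 1260
    k = n - t                                        # days n, n-1, ..., n-1260
    returns = (prices[k] - prices[k - 1]) / prices[k - 1]
    weights = (1.0 - beta) * beta ** t
    return days_per_year * (weights @ returns) / (1.0 - beta ** window)

def sample_cov_annual(prices, n, window=1260, days_per_year=252):
    """Sample covariance of the last `window` daily relative returns up to day n (annualized: assumption)."""
    prices = np.asarray(prices, dtype=float)
    block = prices[n - window:n + 1]                 # S_{n-1260}, ..., S_n
    returns = block[1:] / block[:-1] - 1.0           # R_{n-1259}, ..., R_n
    return days_per_year * np.cov(returns, rowvar=False)

# Half-life implied by beta = 0.999, in years of 252 trading days.
half_life_years = np.log(0.5) / np.log(0.999) / 252
```

The last line gives a half-life of about 2.75 years, consistent with the half-life stated above.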

In this experiment, we use a logarithmic utility function and the penalty function $\lambda_{0}F(\Sigma_{s})=\lambda_{0}\bigl\|\Sigma_{s}-\Sigma_{0}\bigr\|_{2}^{2}$. At the beginning of the investment process for each portfolio, we estimate the parameters $\mu_{1},\mu_{2},\Sigma_{0}$ and then compute the robust and non-robust portfolio allocations accordingly. Starting from an initial wealth $X_{0}=1$, the wealth of the non-robust portfolio evolves as

[TABLE]

where $(\alpha_{n}^{1},\alpha_{n}^{2})$ are the non-robust allocations on day $n$. For the wealth of the robust portfolio, we replace $(\alpha_{n}^{1},\alpha_{n}^{2})$ in (22) with the robust allocations $(\hat{\alpha}_{n}^{1},\hat{\alpha}_{n}^{2})$. Finally, averaging $\ln(X_{T})$ over all the portfolios gives the expected utility.
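The sketch below assumes that (22) is the usual daily-rebalanced constant-proportion update $X_{n+1}=X_n\bigl(1+\sum_i\alpha^i_nR^i_{n+1}+(1-\sum_i\alpha^i_n)\,r/252\bigr)$, with the risk-free rate accrued daily as $r/252$. This form is our assumption, since the display is not reproduced here. With constant allocations, as in this experiment, the hypothetical helper returns $\ln X_T$ for one portfolio. Averaging its output over the 1007 portfolios gives the expected utility.

```python
import numpy as np

def terminal_log_wealth(prices, alpha, r=0.015, x0=1.0, days_per_year=252):
    """ln X_T for a daily-rebalanced portfolio with constant proportions alpha.

    prices: shape (n_days + 1, n_assets), covering the holding period.
    Returns -inf or nan if the wealth hits zero or turns negative (possible with high leverage).
    """
    prices = np.asarray(prices, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    returns = prices[1:] / prices[:-1] - 1.0               # daily relative returns
    growth = 1.0 + returns @ alpha + (1.0 - alpha.sum()) * r / days_per_year
    with np.errstate(divide="ignore", invalid="ignore"):
        logs = np.log(growth)
    return np.log(x0) + logs.sum()
```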

Figures 7.3–7.3 present the terminal wealth $X_{T}$ of the $1007$ robust and non-robust portfolios. For a small $\lambda_{0}$, the robust portfolios are very stable: no matter how the market changes, the robust terminal wealth stays around $1$. As $\lambda_{0}$ increases, the robust portfolios start to fluctuate. Eventually, their behaviour converges to that of the non-robust portfolios as $\lambda_{0}$ approaches infinity, which corresponds to the non-robust case. This behaviour is consistent with our expectations. When $\lambda_{0}$ is close to zero, the penalty function has little effect, so the robust allocations are optimal for the most chaotic market situations and the investment strategies are very conservative. As $\lambda_{0}$ becomes larger, the penalty function comes into play and prevents extreme volatilities. As a consequence, the robust strategies are less conservative, and the portfolios fluctuate more under regime changes.

We show the robust and non-robust expected utilities in Figure 1, which depicts how $\mathbb{E}[\ln(X^{\alpha^{1},\alpha^{2}}_{T})]$ and $\mathbb{E}[\ln(X^{\hat{\alpha}^{1},\hat{\alpha}^{2}}_{T})]$ change w.r.t. $\lambda_{0}$. This plot can be compared with Figures 7.2, 7.2, 7.2 and 7.2 in Subsection 7.2. For a given amount of noise, the robust portfolio may underperform for small $\lambda_{0}$, but its expected utility increases gradually and reaches a maximum. Finally, the robust expected utility converges to the non-robust one.

To illustrate the time evolution of the portfolio wealth, we show the stock prices and the wealth of two portfolios, starting on 2017-01-03 (Figure 2) and 2018-01-26 (Figure 3), respectively. For the portfolio in Figure 2, the optimal non-robust allocations are $\alpha^{1}=5.778,\alpha^{2}=-2.174$, and the robust allocations with $\lambda_{0}=200$ are $\hat{\alpha}^{1}=3.083,\hat{\alpha}^{2}=-1.452$. Both allocations are constant in time. The S&P 500 keeps rising in Figure 2a, while the Gold price fluctuates. Over the same period, the non-robust portfolio performs better in absolute terms throughout (Figure 2b). For the portfolio in Figure 3, we have $\alpha^{1}=9.418,\alpha^{2}=0.301$, and $\hat{\alpha}^{1}=3.940,\hat{\alpha}^{2}=-0.054$. Since the proportions invested in Gold are small for both the robust and the non-robust portfolios, the wealth trend is dominated by the S&P 500 price. Two large drops occur, in February 2018 and December 2018, and both are reflected in the portfolio wealth in Figure 3b. However, the robust strategy is more conservative than the non-robust one. Hence, the robust portfolio loses less during the market shocks and outperforms the non-robust one.

The above empirical experiments and the Monte Carlo simulations of Subsection 7.2 show that, by adding this robust mechanism with a properly chosen $\lambda_{0}$, the portfolio can withstand a misspecified covariance matrix estimate and is less vulnerable to sudden market shocks. Furthermore, unlike other robust methods, which only consider the worst case, our model is more flexible and offers a wider range of practical intermediate options.

7.4 Implicit finite difference method

In this subsection, we compute the value function via an implicit finite difference method. For simplicity, we use the penalty function $\lambda_{0}F(\sigma_{t}^{2})=\lambda_{0}(\sigma_{t}^{2})^{2}$. The HJBI equation is then

[TABLE]

where the Hamiltonian is defined by

[TABLE]

Solving for the optimal controls in (24) using the first order condition, we obtain $\hat{\mathbf{a}}=-\frac{(\mu-r)x\bar{v}_{x}}{\sigma^{2}x^{2}\bar{v}_{xx}}$ and $\hat{\sigma}^{2}=\Bigl{(}-\frac{(\mu-r)^{2}\bar{v}_{x}^{2}}{4\lambda_{0}\bar{v}_{xx}}\Bigr{)}^{1/3}$ . Substituting $\hat{\mathbf{a}}$ and $\hat{\sigma}^{2}$ into the PDE (23), we obtain

[TABLE]

where $C=(3\times 2^{-\frac{4}{3}})\lambda_{0}^{\frac{1}{3}}(\mu-r)^{\frac{4}{3}}$. Note that we have shown in Section 4 that $\bar{v}_{xx}<0$.
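The constant $C$ can be recovered by substituting the two optimal controls back into the Hamiltonian. The derivation below assumes that (24) has the form $rx\bar v_x+\sup_{\mathbf a}\inf_{\sigma^2}\{(\mu-r)\mathbf a x\bar v_x+\tfrac12\mathbf a^2\sigma^2x^2\bar v_{xx}+\lambda_0\sigma^4\}$. This is our reading, chosen because it reproduces both optimal controls stated above. We write $p=\bar v_x$ and $q=\bar v_{xx}<0$:

```latex
\begin{aligned}
(\mu-r)\hat{\mathbf a}\,xp+\tfrac12\hat{\mathbf a}^{2}\hat\sigma^{2}x^{2}q
  &= -\frac{(\mu-r)^{2}p^{2}}{2\hat\sigma^{2}q},
  \qquad \hat\sigma^{6}=\frac{(\mu-r)^{2}p^{2}}{4\lambda_0(-q)},\\
-\frac{(\mu-r)^{2}p^{2}}{2\hat\sigma^{2}q}+\lambda_0\hat\sigma^{4}
  &= \bigl(2^{-1/3}+2^{-4/3}\bigr)\lambda_0^{1/3}(\mu-r)^{4/3}p^{4/3}(-q)^{-2/3}\\
  &= 3\cdot2^{-4/3}\lambda_0^{1/3}(\mu-r)^{4/3}\,p^{4/3}(-q)^{-2/3}
   = C\,p^{4/3}(-q)^{-2/3}.
\end{aligned}
```

Under this reading, the Hamiltonian reduces to $rxp+C\,p^{4/3}(-q)^{-2/3}$. For $U(x)=\ln x$, where $p=1/x$ and $q=-1/x^2$, it equals $r+C$, so $\bar v(t,x)=\ln x+(r+C)(T-t)$ solves (23).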

Since the PDE (23) is non-linear, in order to use the implicit finite difference method, we first linearize the function $H$ with respect to the second order term via the Legendre transform. This method was also used by Jonsson and Sircar (2002a, b) to solve nonlinear HJB equations. We also combine the linearization step with a fixed-point iteration scheme.

Define $H^{*}$ as the Legendre transform of $H$ with respect to the second order term; it is given by

[TABLE]

where $C_{2}=\frac{5}{3}(\frac{2}{3})^{-\frac{2}{5}}C^{\frac{3}{5}}$ . Hence, we can represent $H(\bar{v}_{xx})$ as the supremum of linear functions of $\bar{v}_{xx}$ ,

[TABLE]
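The representation can be checked directly. Assume that the supremum above is $\sup_{a>0}\{a\,\bar v_{xx}+C_2\bar v_x^{4/5}a^{2/5}\}$, which is the form implied by the first-order condition used in the scheme. Write $K=C_2p^{4/5}$ with $p=\bar v_x>0$ and $q=\bar v_{xx}<0$. The objective is concave in $a$, so its first-order condition gives the maximizer:

```latex
\begin{aligned}
\hat a&=\Bigl(\frac{2K}{5(-q)}\Bigr)^{5/3},\qquad
\hat a\,q+K\hat a^{2/5}=\tfrac35K\hat a^{2/5}
=\tfrac35\bigl(\tfrac25\bigr)^{2/3}C_2^{5/3}\,p^{4/3}(-q)^{-2/3},\\
\tfrac35\bigl(\tfrac25\bigr)^{2/3}C_2^{5/3}=C
&\iff C_2=\bigl(\tfrac53\bigr)^{3/5}\bigl(\tfrac52\bigr)^{2/5}C^{3/5}
=\tfrac53\bigl(\tfrac23\bigr)^{-2/5}C^{3/5}.
\end{aligned}
```

This is the constant $C_2$ stated above, so the supremum reproduces the nonlinear term $C\,p^{4/3}(-q)^{-2/3}$ exactly.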

It is difficult to check the condition for stability in our PDE as the optimal $a$ is unknown. Fortunately, implicit finite difference methods have a weaker requirement for stability than explicit finite difference methods.

We set the time grid as $0,1,\dots,n,n+1,\dots,N$, and the spatial grid as $1,2,\dots,i,i+1,\dots,M$. With the maturity $T=1$, we use a constant time step $\Delta t=\frac{T}{N}$ and a constant spatial step $\Delta x$. We apply a forward approximation for $\bar{v}_{t}$, a central approximation for $\bar{v}_{x}$, and the standard three-point central approximation for $\bar{v}_{xx}$. Working backward in the implicit scheme, at each time step $n$, the optimal $\hat{a}$ in (25) is the solution of the first-order condition $\bar{v}_{xx}^{n}+C_{2}(\bar{v}_{x}^{n})^{\frac{4}{5}}\frac{2}{5}\hat{a}^{-\frac{3}{5}}=0,$ or equivalently,

[TABLE]

The values of $\bar{v}^{n}$ depend on $\hat{a}$, so they are not known in advance; we therefore use a fixed-point iteration scheme to solve equation (26). We first make an initial guess $\hat{a}_{0}$ using the known values $\bar{v}^{n+1}$, and then iteratively generate a sequence $(\hat{a}_{k})_{k=1,2,\dots}$ with $\hat{a}_{k}=f(\hat{a}_{k-1})$ until $\hat{a}_{k}$ converges.

Finally we can substitute the discrete approximations of the derivatives into the HJBI equation (23), and we obtain the implicit form:

[TABLE]

Let $\mathbf{B}$ be the coefficient matrix, $K^{n}$ the value vector at time $n$ and $F^{n+1}$ the right-hand side of (27). Then equation (27) can be written in matrix form:

[TABLE]

The algorithm for this method is summarized in Algorithm 1.
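The Python code below is a compact sketch of the scheme for the logarithmic utility of Subsection 7.4.1. The nonlinear term is represented as $\sup_{a>0}\{a\bar v_{xx}+C_2\bar v_x^{4/5}a^{2/5}\}$, the form implied by the first-order condition above.

Several choices are our assumptions, so this is not a reproduction of (27) or Algorithm 1:
- the term $C_2\bar v_x^{4/5}a^{2/5}$ is lagged inside the fixed-point loop;
- the Dirichlet boundary values are taken from $\ln x+(r+C)(T-t)$, which one can check solves the equation for log utility when the nonlinear term equals $C\bar v_x^{4/3}(-\bar v_{xx})^{-2/3}$;
- the linear system is solved densely, and all grid parameters are illustrative.

The sketch requires $\bar v_x>0$ and $\bar v_{xx}<0$ on the grid.

```python
import numpy as np

def implicit_fd_log_utility(mu=0.035, r=0.015, lam0=10.0, T=1.0, N=50,
                            x_min=1.0, x_max=10.0, n_x=181, n_fp=20, tol=1e-12):
    """Backward implicit scheme for v_t + r x v_x + sup_a[a v_xx + C2 v_x^{4/5} a^{2/5}] = 0."""
    C = 3.0 * 2.0 ** (-4 / 3) * lam0 ** (1 / 3) * (mu - r) ** (4 / 3)
    C2 = 5.0 / 3.0 * (2.0 / 3.0) ** (-2 / 5) * C ** (3 / 5)
    x = np.linspace(x_min, x_max, n_x)
    dx, dt = x[1] - x[0], T / N
    xi = x[1:-1]                                        # interior nodes

    def boundary(t):                                    # assumed Dirichlet data (log utility)
        return np.log(x[[0, -1]]) + (r + C) * (T - t)

    def derivs(v):                                      # central differences at interior nodes
        p = (v[2:] - v[:-2]) / (2 * dx)
        q = (v[2:] - 2 * v[1:-1] + v[:-2]) / dx ** 2
        return p, q

    def a_opt(p, q):                                    # maximizer of a*q + C2*p**0.8*a**0.4
        return (2 * C2 * p ** 0.8 / (5 * -q)) ** (5 / 3)

    v = np.log(x)                                       # terminal condition v(T, x) = ln x
    for n in range(N - 1, -1, -1):
        v_new = v.copy()
        v_new[[0, -1]] = boundary(n * dt)
        p, q = derivs(v)                                # initial guess from v^{n+1}
        a = a_opt(p, q)
        for _ in range(n_fp):
            lower = r * xi / (2 * dx) - a / dx ** 2     # coefficient of v_{i-1}
            diag = 1 / dt + 2 * a / dx ** 2             # coefficient of v_i
            upper = -r * xi / (2 * dx) - a / dx ** 2    # coefficient of v_{i+1}
            A = np.diag(diag) + np.diag(lower[1:], -1) + np.diag(upper[:-1], 1)
            rhs = v[1:-1] / dt + C2 * p ** 0.8 * a ** 0.4   # nonlinear term, lagged
            rhs[0] -= lower[0] * v_new[0]
            rhs[-1] -= upper[-1] * v_new[-1]
            interior = np.linalg.solve(A, rhs)
            change = np.max(np.abs(interior - v_new[1:-1]))
            v_new[1:-1] = interior
            p, q = derivs(v_new)
            a = a_opt(p, q)                             # fixed-point update of a
            if change < tol:
                break
        v = v_new
    exact = np.log(x) + (r + C) * T                     # reference solution at t = 0
    return x, v, exact

x, v0, v_ref = implicit_fd_log_utility()
```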

7.4.1 Logarithmic utility function

In the 1-asset example, we use the logarithmic utility function and the penalty function $\lambda_{0}F(\sigma_{t}^{2})=\lambda_{0}(\sigma_{t}^{2})^{2}$ . The terminal condition is given by the utility function,

[TABLE]

The boundary conditions $\bar{v}(t_{n},x_{1})$ and $\bar{v}(t_{n},x_{M})$ for $n\in[0,N-1]$ are given explicitly by the equation

[TABLE]

with

[TABLE]

Similarly, we can also implement the above method on a 2-asset example where $S_{t}\in\mathbb{R}^{2}$ and $\lambda_{0}F(\Sigma_{t})=\lambda_{0}\left\|\Sigma_{t}\right\|_{2}^{2}$ . The HJBI equation becomes

[TABLE]

We can solve for the optimal controls $\hat{\alpha}_{1},\hat{\alpha}_{2},\hat{\sigma}_{1},\hat{\sigma}_{2},\hat{\rho}$ in (28) using the first order condition. In this example, we always have the optimal $\hat{\sigma}_{1},\hat{\sigma}_{2}>0$ and $\hat{\rho}\in[-1,1]$ . Then, by applying Algorithm 1, we can get the value function of a portfolio with 2 risky assets.

Figure 7.4.1 shows the PDE estimate of $\bar{v}(t,x)$ for the 1-asset example with parameters $r=0.015,\mu=0.035,\lambda_{0}=10$; Figure 7.4.1 shows the result for the 2-asset case with parameters $r=0.015,\mu_{1}=0.035,\mu_{2}=0.045,\lambda_{0}=10$. The PDE curves completely overlap the analytical solution in both the 1-asset and 2-asset cases, which validates the accuracy of the PDE approach.

7.4.2 Power utility function

In the second example, we use a power utility function. This time, we only have the terminal condition and the boundary condition at $x_{1}=0$, but no boundary condition for a large $x_{M}$. For functions $x^{\gamma}$ with $\gamma<1,\gamma\neq 0$, the first-order derivative $\gamma x^{\gamma-1}$ tends to $0$ as $x$ goes to infinity. Therefore we can use a zero Neumann boundary condition when $x_{M}$ is large. We then have the following terminal and boundary conditions:

[TABLE]

Figure 4a shows the estimated value $\bar{v}(t,x)$ over a range of $x$, with $U(X_{T})=\frac{4}{3}X_{T}^{\frac{1}{4}}$ and parameters $\mu=0.035,r=0.015,\lambda_{0}=10$. We only display the curve computed by our PDE method, as no analytical solution is available for comparison in this example. Figure 4b shows the first four iterations of the estimated $\hat{a}$ from an initial guess. The four curves are almost indistinguishable, indicating that the fixed-point iteration scheme converges within the first four iterations.

The results of this subsection indicate that the PDE method approximates the true value accurately and efficiently. Nevertheless, the approach has a few shortcomings:

•

The PDE approach requires tedious algebraic manipulation before implementation. In particular, even when using the same utility function, the preliminary computations have to be redone if we switch to a different penalty function.

•

In general, PDE approaches suffer from the curse of dimensionality. As the dimension of the problem becomes higher, the computational complexity increases exponentially and the approach becomes infeasible. Although the PDE approach suffices for our current problem as the wealth process is only one-dimensional, it may not be feasible for other problems arising from multidimensional stochastic differential games.

For these two reasons, in the next subsection we develop a numerical scheme based on Monte Carlo simulations, which can be potentially useful for high-dimensional problems or in the case of complex penalty functions.

7.5 Monte Carlo method

In this subsection, we implement a Regression Monte Carlo scheme to solve the same robust portfolio allocation problems. Carriere (1996) introduced the Regression Monte Carlo approach to solve optimal stopping problems for any Markovian process in discrete time, using non-parametric regression techniques. Later, Tsitsiklis and Van Roy (2001) and Longstaff and Schwartz (2001) used a similar scheme with ordinary least squares (a.k.a. Least Squares Monte Carlo) to value American options, respectively by value iteration and by performance iteration (see for example Denault and Simonato 2017). Since then, Regression Monte Carlo has become a popular tool in option pricing and, more generally, for solving discrete-time stochastic control problems in finite horizon.

First of all, we discretize the time interval $[0,T]$ into $N$ time steps with a constant step size $\Delta t=\frac{T}{N}$ . Using the Euler scheme on the logarithm of the state variable, one obtains the following dynamics for the discrete-time wealth $X_{n}$ :

[TABLE]

and the discretized form of our value is

[TABLE]

Analogously to the continuous-time result proved in Section 5, this discrete value function satisfies the DPP:

[TABLE]

7.5.1 Control randomization

Inspired by the Dynamic Programming Principle, we can start from the known terminal condition and compute the value functions recursively backward in time. Equation (31) involves a conditional expectation, which cannot be computed explicitly. Instead, one can for example use a least squares regression on a polynomial basis to approximate $\mathbb{E}\bigl[\bar{v}(n+1,X_{n+1})\bigm|\mathcal{F}_{n}\bigr]$. The obstacle in the implementation is that we cannot simulate the paths $X_{n}$ forward, since the dynamics of the state variable depend on the uncertain controls. Following Kharroubi et al. (2014), one way to tackle this problem is an initial randomization of the controls. We choose an arbitrary initial distribution for the controls and simulate $X_{n}$ with these dummy $\alpha_{n}$ and $\Sigma_{n}$. The dummy controls are then included among the regressors of the least-squares regressions.

Proofs of the convergence and error bounds for standard Regression Monte Carlo are available in Clément et al. (2002) and Beutner et al. (2013) for example. In the case of controlled dynamics, Kharroubi et al. (2015) analyzed the time-discretization error, and Kharroubi et al. (2014) investigated the projection error generated by approximating the conditional expectation by basis functions for the control randomization scheme. Recently, alternative randomization schemes have been proposed in the literature, such as Ludkovski and Maheshwari (2019), Balata and Palczewski (2018), Bachouch et al. (2018) or Shen and Weng (2019), which are more amenable to comprehensive convergence proofs, see Balata and Palczewski (2017) and Huré et al. (2018). Nevertheless, the classical control randomization scheme retains some advantages, such as the ease with which it can handle switching costs, as shown in Zhang et al. (2019).

For the choice of basis function $\phi$, we can use a polynomial function in $X_{n},\alpha_{n},\Sigma_{n}$, and let $\phi=\sum_{k=0}^{K}\beta_{k}\phi_{k}$. Once the regression is complete, we can approximate the conditional expected value function $\mathbb{E}\bigl[\bar{v}(n+1,X_{n+1})\bigm|\mathcal{F}_{n}\bigr]$ in (31) by $\phi(\hat{\beta};X_{n},\alpha_{n},\Sigma_{n})$. For the $m$th simulation path, we can find the optimal controls by:

[TABLE]

The complete process is shown in Algorithm 2.

7.5.2 Logarithmic utility function

We first consider an example with 1 risky asset. When the utility function is logarithmic and the penalty function is $\lambda_{0}F(\sigma_{t}^{2})=\lambda_{0}(\sigma_{t}^{2})^{2}$ , we choose the following basis function

[TABLE]

To find the optimal controls, we differentiate $\lambda_{0}F(\sigma_{n}^{2})\Delta t+\sum_{k=0}^{K}\beta_{n+1}^{k}\phi_{k}(X_{n},\alpha_{n},\sigma_{n})$ with respect to $\alpha_{n}$ and $\sigma_{n}^{2}$, and then obtain the optimal controls by solving the following polynomial equation

[TABLE]

With $\beta_{4}<0$, this equation has a real positive root. At each step, the optimal controls are constant across paths, i.e., independent of the state variable $X_{n}$, as observed for the analytical solution.
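The Python code below sketches the backward loop for this 1-asset example. It is a hypothetical illustration, not the paper's Algorithm 2, and several choices are ours:
- the dummy controls are drawn uniformly from ranges we picked;
- log-wealth is simulated with an exact log-Euler step;
- the basis is $\{1,\ln X_n,\alpha_n,\alpha_n^2\sigma_n^2\}$ rather than the basis (32).

With this basis, the inner minimization over $\sigma_n^2$ and the outer maximization over $\alpha_n$ have closed forms: $\hat s=-\beta_3\hat\alpha^2/(2\lambda_0\Delta t)$ and $\hat\alpha=(\beta_2\lambda_0\Delta t/\beta_3^2)^{1/3}$. These require $\beta_3<0$, the analogue of the sign condition above. The sketch uses far fewer time steps and paths than the text so that it runs quickly, and it returns only the backward estimate (no forward resimulation).

```python
import numpy as np

def control_randomization_log(x0=5.0, mu=0.035, r=0.015, lam0=10.0, T=1.0,
                              N=10, M=200_000, seed=0):
    """Backward regression Monte Carlo with randomized controls; U(x) = ln x, penalty lam0*(sigma^2)^2."""
    rng = np.random.default_rng(seed)
    dt = T / N
    # Step 1: randomize the controls (hypothetical uniform ranges) and simulate forward.
    alpha = rng.uniform(0.0, 2.0, size=(N, M))
    s = rng.uniform(0.01, 0.04, size=(N, M))            # s = sigma^2
    logx = np.empty((N + 1, M))
    logx[0] = np.log(x0)
    for n in range(N):
        drift = (r + alpha[n] * (mu - r) - 0.5 * alpha[n] ** 2 * s[n]) * dt
        shock = alpha[n] * np.sqrt(s[n] * dt) * rng.standard_normal(M)
        logx[n + 1] = logx[n] + drift + shock
    # Step 2: backward induction, regressing the continuation value on the basis.
    v_next = logx[N].copy()                             # terminal value ln X_N
    for n in range(N - 1, -1, -1):
        basis = np.column_stack([np.ones(M), logx[n], alpha[n], alpha[n] ** 2 * s[n]])
        b0, b1, b2, b3 = np.linalg.lstsq(basis, v_next, rcond=None)[0]
        # inf over s of lam0*s^2*dt + b3*alpha^2*s (needs b3 < 0), then sup over alpha.
        a_hat = np.cbrt(b2 * lam0 * dt / b3 ** 2)
        s_hat = -b3 * a_hat ** 2 / (2 * lam0 * dt)
        v_next = (b0 + b1 * logx[n] + b2 * a_hat + b3 * a_hat ** 2 * s_hat
                  + lam0 * s_hat ** 2 * dt)
    return v_next.mean(), a_hat, s_hat

v0_mc, a0_mc, s0_mc = control_randomization_log()
```

For comparison, substituting $\bar v=\ln x+c(T-t)$ into the HJBI equation with the optimal controls of Subsection 7.4 gives $c=r+C$. The estimate should therefore be close to $\ln x_0+(r+C)T$.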

We used $M=5\times 10^{6}$ paths, $T=1$ and step size $\Delta t=\frac{1}{50}$ in the simulation, with parameters $x_{0}=5$, $r=0.015$, $\lambda_{0}=10$. Figure 6 shows the backward regression values, forward resimulation values and true values as the parameter $\mu$ varies. It also compares the forward resimulation values with the finite difference results and the true values. Both the PDE and the Monte Carlo approaches agree closely with the true value in this example.

For the example with $2$ risky assets, we use the logarithmic utility function and the penalty function $\lambda_{0}F(\Sigma_{t})=\lambda_{0}\left\|\Sigma_{t}\right\|_{2}^{2}$. We choose the following basis function in this case:

[TABLE]

where $\sigma_{n}^{1},\sigma_{n}^{2}$ are the volatilities of the two assets and $\rho_{n}$ is the correlation between the assets. We can differentiate $\lambda_{0}\left\|\Sigma_{t}\right\|_{2}^{2}\Delta t+\sum_{k=0}^{K}\beta_{n+1}^{k}\phi_{k}(X_{n},\alpha_{n}^{1},\alpha_{n}^{2},\sigma_{n}^{1},\sigma_{n}^{2},\rho_{n})$ to get the optimal controls. In practice, we always have $\hat{\sigma}_{n}^{1},\hat{\sigma}_{n}^{2}>0$ , but we need to truncate $\hat{\rho}_{n}$ to $[-1,1]$ . The optimal controls are also constants for each step as in the 1-asset case.

In the implementation, we use $M=4\times 10^{6}$ paths, $T=1$ and step size $\Delta t=\frac{1}{50}$. The results are shown in Figure 8, which compares the backward regression values, forward resimulation values and analytical values as the penalty strength $\lambda_{0}$ varies. In our experiments, averaging the forward and backward results yields an even better estimate.

We can observe from Figures 6 and 8 that, as claimed in Kharroubi et al. (2014), the value function estimated at the end of the backward loop serves as an upper bound for the true value, while the one obtained from forward resimulation serves as a lower bound and has a smaller error than the upper bound.

7.5.3 Power utility function

Here we show a 1-asset example with power utility. When the utility function is $U(X_{T})=\frac{4}{3}X_{T}^{\frac{1}{4}}$ and the penalty function $\lambda_{0}F(\sigma_{t}^{2})=\lambda_{0}(\sigma_{t}^{2})^{2}$ , we choose the basis function

[TABLE]

To find the optimal controls, we differentiate $\lambda_{0}F(\sigma_{n}^{2})\Delta t+\sum_{k=0}^{K}\beta_{n+1}^{k}\phi_{k}(X_{n},\alpha_{n},\sigma_{n})$ and solve the resulting polynomial equation (33) for each path. In this case, the optimal controls $\hat{\alpha}_{n}$ and $\hat{\sigma}_{n}$ depend on $X_{n}$.

[TABLE]

Figure 8 shows Monte Carlo and finite difference approximations for a range of drifts $\mu$, with $x_{0}=5$, $r=0.015$, $\lambda_{0}=10$, $M=5\times 10^{6}$ and $N=65$. The PDE estimates lie within the Monte Carlo bounds, and the forward simulation values almost overlap the PDE estimates. Although no analytical solution is available for this power utility case, these plots suggest that both Control Randomization and Finite Difference estimate the true values accurately.

In both the logarithmic and power utility cases, the forward resimulation always performs better than the backward loop estimates. That is because the forward resimulation only suffers from one source of error, the optimal control estimation, while the backward regression suffers more directly from regression error (see Kharroubi et al. 2014). So the forward simulation result is a better estimator of the true value and is the one we use for comparison with the analytical and PDE approaches.

The results above show that, for these robust portfolio allocation problems with a single risky asset, both PDE and Monte Carlo methods provide accurate estimates, with the PDE estimates being slightly better overall. Both methods can be considered for solving robust portfolio allocation problems in practice. The main difficulties with the Monte Carlo approach are the choice of the basis and the number of Monte Carlo paths needed for stable convergence. Still, Monte Carlo would be the method of choice for more realistic portfolio allocation with multiple risky assets (see Zhang et al. 2019), as the PDE approach could quickly become computationally intractable in this situation.

7.6 Generative Adversarial Networks

In this section, we devise a GAN-based algorithm to solve the two-player zero-sum differential game.

Generative Adversarial Networks were introduced in Goodfellow et al. (2014). A GAN is a combination of two competing (deep) neural networks: a generator and a discriminator. The generator network tries to generate data that looks similar to the training data, and the discriminator network tries to distinguish the real data from the generated data. The idea behind GANs is very similar to the robust optimization problem studied in this paper: GANs can be interpreted as minimax games between the generator and the discriminator, whereas our problem is a minimax game between the agent, who controls the portfolio allocation, and the market, which controls the covariance matrix. Inspired by this connection, we propose the following GAN-based algorithm.

Our GANs are composed of two neural networks: one generates $\alpha$ (the $\alpha$-generator), the other generates $\sigma$ (the $\sigma$-generator). The two networks have conflicting goals: the $\alpha$-generator tries to maximize the expected utility, while the $\sigma$-generator tries to minimize it. They compete against each other during training. Because the two networks have different objectives, they cannot be trained like a single regular neural network. Each training iteration is therefore divided into two phases. In the first phase, we train the $\alpha$-generator with the loss function $L_{1}=-\mathbb{E}\left[U(X_{T})+\lambda_{0}\int_{t}^{T}F(\sigma^{2}_{s})ds\right]$, and back-propagation only updates the weights of the $\alpha$-generator. In the second phase, given the output $\alpha$ of the $\alpha$-generator, we train the $\sigma$-generator with the loss function $L_{2}=\mathbb{E}\left[U(X_{T})+\lambda_{0}\int_{t}^{T}F(\sigma^{2}_{s})ds\right]$; the weights of the $\alpha$-generator are frozen, and back-propagation only updates the weights of the $\sigma$-generator. As in any zero-sum game, the $\alpha$-generator and the $\sigma$-generator constantly try to outsmart each other, and as training advances, the game may reach a Nash equilibrium.
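The two-phase update can be illustrated without neural networks. The sketch below is a toy, not Algorithm 3: each "generator" is reduced to a single scalar, and the objective is the pointwise quantity $h(\alpha,\sigma)=r+\alpha(\mu-r)-\tfrac{1}{2}\alpha^{2}\sigma^{2}+\lambda_{0}(\sigma-\sigma_{0})^{2}$, i.e. the drift of $\ln X_{t}$ plus the penalty rate under wealth dynamics we assume here, $dX_{t}/X_{t}=(r+\alpha(\mu-r))dt+\alpha\sigma dW_{t}$, with log utility and $F(\sigma^{2})=(\sigma-\sigma_{0})^{2}$. The value $\sigma_{0}=0.2$, the initial guesses and the learning rates are hypothetical.

```python
def alternating_training(mu=0.035, r=0.015, lambda0=10.0, sigma0=0.2,
                         alpha=1.0, sigma=0.25, lr_alpha=10.0, lr_sigma=0.02, iters=500):
    """Two-phase alternating gradient updates on a scalar toy objective.

    h(alpha, sigma) = r + alpha (mu - r) - alpha^2 sigma^2 / 2 + lambda0 (sigma - sigma0)^2
    Phase 1: gradient ascent on alpha with sigma frozen (loss L1 = -h).
    Phase 2: gradient descent on sigma with the new alpha frozen (loss L2 = h).
    """
    for _ in range(iters):
        # Phase 1: the alpha-player maximises h, i.e. minimises L1 = -h.
        grad_alpha = (mu - r) - alpha * sigma**2
        alpha += lr_alpha * grad_alpha
        # Phase 2: the sigma-player minimises L2 = h, holding the updated alpha fixed.
        grad_sigma = -alpha**2 * sigma + 2.0 * lambda0 * (sigma - sigma0)
        sigma -= lr_sigma * grad_sigma
    value = r + alpha * (mu - r) - 0.5 * alpha**2 * sigma**2 + lambda0 * (sigma - sigma0) ** 2
    return alpha, sigma, value


alpha, sigma, value = alternating_training()
print(alpha, sigma, value)
```

At a fixed point both gradients vanish, so $\alpha=(\mu-r)/\sigma^{2}$ and $\sigma^{3}(\sigma-\sigma_{0})=(\mu-r)^{2}/(2\lambda_{0})$. Separate learning rates are used because the curvature in $\alpha$ (about $\sigma^{2}\approx 0.04$) is far smaller than the curvature in $\sigma$ (about $2\lambda_{0}$); a single shared rate would either diverge in $\sigma$ or crawl in $\alpha$.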

Figure 9 illustrates a simplified version of the network architecture. The blue part on the left of Figure 9 is the $\alpha$-generator. For each time step $n$, we construct a network $\mathcal{A}_{n}$ which, given the input $X_{n}$ and the parameter $\sigma_{n}$, generates the output $\alpha_{n}$. Using the wealth dynamics (29), we continue this process until we obtain the terminal wealth $X_{N}$. Once we have the outputs $\{\alpha_{n}\}_{n\in[1,N]}$, we use them as parameters of the $\sigma$-generator (the green part of the figure). Similarly, the $\sigma$-generator has one network $\mathcal{S}_{n}$ for each time step $n$, which generates $\sigma_{n}$ from the input $X_{n}$ and the parameter $\alpha_{n}$. At the end of this phase, the sequence $\{\sigma_{n}\}_{n\in[1,N]}$ is in turn fed into the $\alpha$-generator as parameters. Algorithm 3 summarizes this training procedure for the single-asset examples.

In the implementation, we choose the parameters $T=1$, $r=0.015$ and $\mu=0.035$. The training data has sample size $M=200{,}000$, and we discretize the investment horizon into $N=65$ time steps. The deep neural network for each time step contains $4$ hidden layers and uses Leaky ReLU as the activation function. For the $\sigma$-generator, to ensure the positivity of the output, we use a Leaky Sigmoid as the activation function of the output layer, defined by $\text{LeakySigmoid}_{\beta}(x)=\frac{1}{1+e^{-x}}\mathbbm{1}(x\leq\beta)+\left[\frac{e^{-\beta}}{(1+e^{-\beta})^{2}}\times(x-\beta)+\frac{1}{1+e^{-\beta}}\right]\mathbbm{1}(x>\beta)$. It coincides with the sigmoid for $x\leq\beta$ and continues linearly, with the same slope as the sigmoid at $\beta$, for $x>\beta$; hence its range is $(0,+\infty)$. We train the first $100$ epochs with a learning rate of $5\times 10^{-4}$, and then another $50$ epochs with a decreased learning rate of $1\times 10^{-4}$.
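A minimal NumPy sketch of the Leaky Sigmoid defined above makes its two pieces explicit: the sigmoid up to $\beta$, and the tangent line of the sigmoid at $\beta$ beyond it, so the output is strictly positive and unbounded above. The function name and the value $\beta=2$ in the example are our own choices.

```python
import numpy as np


def leaky_sigmoid(x, beta):
    """Sigmoid for x <= beta, continued linearly with slope sigmoid'(beta) for x > beta."""
    x = np.asarray(x, dtype=float)
    s_beta = 1.0 / (1.0 + np.exp(-beta))                  # sigmoid(beta)
    slope = np.exp(-beta) / (1.0 + np.exp(-beta)) ** 2    # sigmoid'(beta) = s_beta * (1 - s_beta)
    # Only the x <= beta branch uses this value, so evaluate the sigmoid at min(x, beta).
    sigmoid_part = 1.0 / (1.0 + np.exp(-np.minimum(x, beta)))
    linear_part = slope * (x - beta) + s_beta
    return np.where(x <= beta, sigmoid_part, linear_part)


print(leaky_sigmoid(np.array([-5.0, 0.0, 2.0, 10.0]), beta=2.0))
```

Matching both the value and the slope at $\beta$ keeps the activation continuously differentiable, which avoids a kink in the gradients seen by back-propagation.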

We now assess the quality of Algorithm 3. First, we use the utility function $U(X_{T})=\ln(X_{T})$ and the cost function $\lambda_{0}F(\sigma_{t}^{2})=\lambda_{0}(\sigma_{t}-\sigma_{0})^{2}$, with initial wealth $x_{0}=5$; in this case the analytical solution is available for comparison. Figure 10a compares the learned value functions with the true values for a range of $\lambda_{0}$: the learned values are very accurate, with errors of order $10^{-5}$. The loss function $L_{2}$ during training is presented in Figure 10b. Unlike in the training of regular deep neural networks, the loss is not monotonically decreasing. The minimizer (the $\sigma$-generator) dominates the competition at the beginning, and the loss decreases rapidly; the maximizer (the $\alpha$-generator) then catches up, the loss increases for a while, and it finally converges to the true value.

In the second example, we use the utility function $U(X_{T})=3X_{T}^{\frac{1}{4}}$ and the cost function $F(\sigma_{t}^{2})=(\sigma_{t}^{2})^{2}$. We set $\lambda_{0}=10$ and estimate the value function for a range of $x_{0}$. Since the true values are not available for power utility, we compare the GAN estimates with the PDE estimates in Figure 11a. The loss function $L_{1}$ for $x_{0}=6$ during training is presented in Figure 11b.

Despite these promising results, a limitation of GANs, shared with deep neural networks in general, is the sensitivity of training to the chosen hyper-parameters. On difficult problems, tuning the hyper-parameters of the GAN to make training succeed can require considerable effort. A standard strategy for stabilizing training is to design the model carefully, either by adopting a suitable architecture (Radford et al., 2015) or by selecting an objective function that is easy to optimize (Salimans et al., 2016). In spite of this caveat, GANs can be considered a viable contender to the more classical Monte Carlo methods of subsection 7.5 for robust portfolio allocation involving multiple risky assets, and deserve further investigation.

8 Conclusion

In this paper, we interpreted a robust portfolio optimization problem as a two-player zero-sum stochastic differential game. We proved that the value function satisfies the Dynamic Programming Principle and is the unique viscosity solution of a Hamilton–Jacobi–Bellman–Isaacs equation. We compared the performance of robust and non-robust portfolios using both Monte Carlo simulation and empirical market data. Under market shocks, our robust mechanism can prevent large losses. With a properly chosen $\lambda_{0}$, the robust portfolios have a higher expected utility than the non-robust ones. In addition to the finite difference method, we provided control randomization and GAN algorithms to estimate the value function. These two methods enrich the set of quantitative techniques for solving robust portfolio optimization problems, and both demonstrated high accuracy in our numerical results.

Acknowledgements

The Centre for Quantitative Finance and Investment Strategies has been supported by BNP Paribas. Ivan Guo has been partially supported by the Australian Research Council Discovery Project DP170101227.

Appendix A Appendices

A.1 Proof of Proposition 2

Proof.

First, define $w(t,x)\coloneqq\sup_{\alpha\in\mathcal{A}}\mathbb{E}^{t,x}\left[U(X_{T}^{\alpha,{\scriptscriptstyle\Sigma}})\right]$, $(t,x)\in[0,T]\times\mathbb{R}$. All the assumptions on $\alpha,U,X_{t}$ hold for $w(t,x)$, except that the covariance $\Sigma$ over $u\in[t,T]$ is assumed to be a fixed, known process in $\mathcal{B}$. An argument in Pham (2009, p. 52) shows that, when the utility function $U(\cdot)$ is continuous, increasing and concave on $\mathbb{R}$, $w(t,\cdot)$ is also increasing and concave in $x$ for all $t\in[0,T]$.

For any fixed $\Sigma\in\mathcal{B}$ , we define a function $q(t,x)$ by

[TABLE]

Then $q(t,x)$ is also concave in $x$ for $t\in[0,T]$ . We define

[TABLE]

Together with Assumption 2, this implies that $L$ is convex in $\Sigma_{t}$ and concave in $\alpha_{t}$. By Zeidler (2013, Theorem 49.A), there exists a saddle point $(\alpha_{t}^{*},\Sigma_{t}^{*})\in A\times B$ such that

[TABLE]

We know from Pham (2009, Chapter 4.3) that $q(t,x)$ is a viscosity solution of the HJB equation

[TABLE]

Then $q^{*}(t,x)\coloneqq\sup_{\alpha\in\mathcal{A}}\mathbb{E}^{t,x}\left[U(X_{T}^{\alpha,{\scriptscriptstyle\Sigma^{*}}})+\lambda_{0}\int_{t}^{T}F(\Sigma_{s}^{*})ds\right]$ is a viscosity solution of the PDE

[TABLE]

which is equivalent to

[TABLE]

due to the saddle point property (34). Using arguments similar to the ones in Pham (2009, Chapter 4), the function $\inf_{\Sigma\in\mathcal{B}}\mathbb{E}^{t,x}\left[U(X_{T}^{\alpha^{*},{\scriptscriptstyle\Sigma}})+\lambda_{0}\int_{t}^{T}F(\Sigma_{s})ds\right]$ is the unique viscosity solution of the HJB equation (35). Therefore we have

[TABLE]

Writing $J(t,x,\alpha,\Sigma)=\mathbb{E}^{t,x}\left[U(X_{T}^{\alpha,{\scriptscriptstyle\Sigma}})+\lambda_{0}\int_{t}^{T}F(\Sigma_{s})ds\right]$, the inequality

[TABLE]

implies

[TABLE]

From Proposition 1, we have $\underline{u}(t,x)\leq\underline{v}(t,x)\leq\bar{v}(t,x)\leq\bar{u}(t,x)$. Combining this with $\bar{u}(t,x)=\underline{u}(t,x)$, we obtain the required equalities

[TABLE]

∎

A.2 Proof of Proposition 3

Proof.

Let $X_{T}^{{\scriptscriptstyle\Sigma,\Gamma}}$ and $\bar{X}_{T}^{{\scriptscriptstyle\Sigma,\Gamma}}$ be the solutions of the SDE (2) with initial states $(t,x)$ and $(t,\bar{x})$ respectively; both are controlled by an arbitrary pair of admissible control and strategy processes $(\Sigma,\Gamma)$. From Assumption 1, we have

[TABLE]

We have

[TABLE]

By the Cauchy-Schwarz inequality,

[TABLE]

It is straightforward to check that there exist constants $C$ , $m_{1},m_{2}$ and $\beta_{0}$ such that

[TABLE]

By the classical inequality $\mathbb{E}^{t,x}\Bigl[\max_{t\leq s\leq T}\bigl|X_{s}^{{\scriptscriptstyle\Sigma,\Gamma}}\bigr|^{2m}\Bigr]\leq C_{T}(1+x^{2m})$ (e.g., Pham (2009, Theorem 1.3.15)), for arbitrary control and strategy processes $\Gamma,\Sigma$, we have

[TABLE]

where $C_{T},m$ are constants, and $\Phi$ is a polynomial function.

Next, for the two bounded quantities $\mathbb{E}^{t,x}\Bigl[\lambda_{0}\int_{t}^{T}F(\Sigma_{s})ds+U(X_{T}^{{\scriptscriptstyle\Gamma,\Sigma}})\Bigr]$ and $\mathbb{E}^{t,\bar{x}}\Bigl[\lambda_{0}\int_{t}^{T}F(\Sigma_{s})ds+U(\bar{X}_{T}^{{\scriptscriptstyle\Gamma,\Sigma}})\Bigr]$, we have

[TABLE]

Under Assumptions 1 and 3, $\bar{v}(t,x)$ is bounded. Then we can write the difference between the two value functions as:

[TABLE]

Together with inequality (40), this shows that the value function $\bar{v}(t,x)$ is locally Lipschitz continuous in $x$. ∎

A.3 Proof of Theorem 1

Proof.

We use localization techniques. Let $B_{k}=\{x\in\mathbb{R}:x^{2}<k^{2}\}$, and let $\phi_{k}(x)$ be a function such that $\phi_{k}(x)=1$ on $B_{k}$ and $\phi_{k}(x)=0$ outside $B_{k}$. We then define a new process

[TABLE]

starting from an initial condition $(t,x)\in[0,T]\times\mathbb{R}$. Let $U^{k}(x)=\phi_{k+2}(x)U(x)$; we then define the truncated value function by

[TABLE]

In the above setting, the drift and volatility functions in the SDE (45) are bounded, and the utility function in (46) is bounded and Lipschitz continuous. Since all assumptions of Fleming and Souganidis (1989) are satisfied, the localized value function $\bar{v}^{k}$ defined in (46) satisfies the dynamic programming principle: for $t\leq t+\theta\leq T$ ,

[TABLE]

In this proof, $X_{t+\theta}^{k,{\scriptscriptstyle\Gamma,\Sigma}}$ and $X_{t+\theta}^{{\scriptscriptstyle\Gamma,\Sigma}}$ are the solutions of SDE (45) and SDE (2) respectively, both starting from $(t,x)$ , controlled by processes $\Gamma,\Sigma$ for the time $u\in[t,t+\theta]$ .

As $k\rightarrow\infty$, $\bar{v}^{k}(t,x)$ defined in (46) converges to $\bar{v}(t,x)$ defined in (7), so it remains to prove that the right-hand side of (47) converges to the right-hand side of (11).

Note that if $X_{s}^{k}$ is in $\overline{B_{k+1}}$ , then $X_{u}^{k}$ is in $\overline{B_{k+1}}$ $\forall u\in[s,T]$ almost surely. Define $\tau_{k}$ to be the first exit time of $X_{t}^{k}$ from $B_{k}$ . Thus, for $(t,x)\in[0,T]\times\mathbb{R}$ , we have

[TABLE]

If $\tau_{k}>T$, the term (48) is zero. For the term (49), for an arbitrary pair $(\bar{\Gamma},\bar{\Sigma})$, we have

[TABLE]

It remains to show that the upper bound (50) converges to zero as $k\rightarrow\infty$.

Let $X_{T}^{k,{\scriptscriptstyle\Gamma,\Sigma}}$ be the solution of SDE (45) starting from $(t+\theta,X_{t+\theta}^{k,{\scriptscriptstyle\bar{\Gamma},\bar{\Sigma}}})$, and $X_{T}^{{\scriptscriptstyle\Gamma,\Sigma}}$ the solution of (2) starting from $(t+\theta,X_{t+\theta}^{{\scriptscriptstyle\bar{\Gamma},\bar{\Sigma}}})$; both are controlled by $\Gamma,\Sigma$ for $u\in[t+\theta,T]$.

Using arguments in equations (41) and (42),

[TABLE]

For arbitrary controls $(\Gamma,\Sigma)$ over $u\in[t+\theta,T]$, it is easy to see that

[TABLE]

where $C,K_{T},C_{T},m_{1},m_{2},m$ are constants. Then there exists a polynomial $\Phi$ such that

[TABLE]

and the Markov inequality yields

[TABLE]

where $C_{T}$ is a constant independent of $k$ . Therefore we have

[TABLE]

where $K(\bigl{|}x\bigr{|})$ is a polynomial function in terms of $x$ .

As $k\rightarrow\infty$ , the term (49) goes to zero as well, therefore

[TABLE]

as the left and right hand sides of (47) converge to the left and right hand sides of equation (11) respectively. ∎

A.4 Proof of Corollary 1

Proof.

Let $X_{s}^{{\scriptscriptstyle\Gamma,\Sigma}}$ be the solution of the SDE (2) starting from $x$ at time $t$ , controlled by $\Gamma,\Sigma$ for time $u\in[t,s]$ . By the Dynamic Programming Principle and inequality (40), for $t<s<T,$

[TABLE]

For arbitrary control and strategy processes $(\hat{\Sigma},\hat{\Gamma})$ over $u\in[t,s]$, we have

[TABLE]

Referring to (40), there exist a polynomial function $\Phi$ and constants $C,C_{T},m_{1},m_{2}$ such that, for an arbitrary pair $(\Gamma,\Sigma)$ over $u\in[s,T]$,

[TABLE]

We know

[TABLE]

Let $\eta=\max\left\{\bigl|F(\Sigma_{u})\bigr|:\Sigma_{u}\in B\right\}$. Then

[TABLE]

Hence $\bar{v}(t,x)$ is Hölder continuous in $t\in[0,T]$ . ∎

A.5 Proof of Theorem 2

Proof.

We again use the localized processes $X_{t}^{k}$, $U^{k}$ and $\bar{v}^{k}$ from the proof of Theorem 1 (Appendix A.3). The HJBI equation associated with SDE (45) is

[TABLE]

where

[TABLE]

All the assumptions in Fleming and Souganidis (1989) are satisfied, so $\bar{v}^{k}(t,x)$ defined in (46) is a viscosity solution of the HJBI equation (56).

Now we introduce another value function

[TABLE]

In the first case where $x\in B_{k+1}$ , we have $\left(X_{T}^{k}\right)^{2}<\left(k+2\right)^{2}$ almost surely. Therefore

[TABLE]

Then, on $[0,T]\times B_{k+1}$, $\tilde{v}^{k}(t,x)$ is a viscosity solution of

[TABLE]

Since the drift and diffusion of $X_{t}^{k}$ vanish outside $B_{k+1}$, we have $X_{T}^{k,{\scriptscriptstyle\Gamma,\Sigma}}=x$ for $x\in\mathbb{R}\backslash B_{k+1}$, and

[TABLE]

It is easy to check that, on $[0,T]\times(\mathbb{R}\backslash B_{k+1})$, $\tilde{v}^{k}(t,x)$ is also a viscosity solution of the HJBI equation (58) with $\phi_{k}(x)=0$. Combining the two cases, we have

[TABLE]

and $\tilde{v}^{k}(t,x)$ is a viscosity solution of (58).

Since $H^{k}\rightarrow H$ as $k\rightarrow\infty$, it suffices to prove that $\tilde{v}^{k}\rightarrow\bar{v}$ as $k\rightarrow\infty$ in order to show that $\bar{v}$ is a viscosity solution of equation (12). We prove this convergence as follows. First, we have

[TABLE]

For any arbitrary pair of control and strategy processes $(\Gamma,\Sigma)$ , we have

[TABLE]

Using Assumption 1, we can write

[TABLE]

Applying the Cauchy-Schwarz inequality to the upper bound (60), with arguments similar to those used for (52), we obtain

[TABLE]

Hence

[TABLE]

where $\Phi(\left|x\right|)$ is a polynomial function independent of $k$ . Since $(\Gamma,\Sigma)$ are arbitrary, combining (59), (60) and (62), we deduce that

[TABLE]

So $\tilde{v}^{k}$ converges to $\bar{v}$ as $k\rightarrow\infty$ . Thus $\bar{v}$ is a viscosity solution of the HJBI equation (12). ∎

A.6 Explicit solution of equation (20)

For completeness, we express the real positive root of equation (20) explicitly.

Let $c=\dfrac{(\mu-r)^{2}}{2\lambda_{0}}$. The discriminant of equation (20), $\Delta=-256c^{3}-27\sigma_{0}^{4}c^{2}$, is negative, which means that the equation has two distinct real roots and a pair of complex conjugate roots. It is easy to check that one real root is positive and the other negative; the positive root is

[TABLE]
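The discriminant above coincides with that of the quartic $\sigma^{4}-\sigma_{0}\sigma^{3}-c=0$, and we read equation (20) as this quartic in the robust volatility. That reading is an assumption on our part: a quartic with $+\sigma_{0}\sigma^{3}$ has the same discriminant. Under it, the sketch below finds the positive root numerically with `numpy.roots` instead of the closed form, and checks the root structure stated in the text. The function name `robust_sigma` and the parameter values in the example are hypothetical.

```python
import numpy as np


def robust_sigma(mu, r, lambda0, sigma0):
    """Positive root of sigma^4 - sigma0 * sigma^3 - c = 0, c = (mu - r)^2 / (2 lambda0).

    Assumes this quartic is equation (20); numpy.roots (eigenvalues of the
    companion matrix) replaces the closed-form expression.
    """
    c = (mu - r) ** 2 / (2.0 * lambda0)
    disc = -256.0 * c**3 - 27.0 * sigma0**4 * c**2        # discriminant quoted above, < 0 for c > 0
    roots = np.roots([1.0, -sigma0, 0.0, 0.0, -c])
    real = np.sort(roots[np.abs(roots.imag) < 1e-12].real)  # Delta < 0: exactly two real roots
    return real[real > 0][0], real, disc, c


sigma_star, real_roots, disc, c = robust_sigma(mu=0.035, r=0.015, lambda0=10.0, sigma0=0.2)
print(sigma_star, real_roots, disc)
```

Since $\sigma^{3}(\sigma-\sigma_{0})=c>0$ at the positive root, that root always lies above $\sigma_{0}$, and it approaches $\sigma_{0}$ as $\lambda_{0}\to\infty$.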

Bibliography

Bachouch, A., C. Huré, N. Langrené, and H. Pham (2018). Deep neural networks algorithms for stochastic control problems on finite horizon: numerical applications. arXiv preprint arXiv:1812.05916.
Balata, A. and J. Palczewski (2017). Regress-later Monte Carlo for optimal control of Markov processes. arXiv preprint arXiv:1712.09705.
Balata, A. and J. Palczewski (2018). Regress-later Monte Carlo for optimal inventory control with applications in energy. arXiv preprint arXiv:1703.06461.
Baltas, I., A. Xepapadeas, and A. N. Yannacopoulos (2019). Robust control of parabolic stochastic partial differential equations under model uncertainty. European Journal of Control 46, 1–13.
Bel Hadj Ayed, A., G. Loeper, and F. Abergel (2017). Forecasting trends with asset prices. Quantitative Finance 17(3), 369–382.
Ben-Tal, A. and A. Nemirovski (1998). Robust convex optimization. Mathematics of Operations Research 23(4), 769–805.
Beutner, E., A. Pelsser, and J. Schweizer (2013). Fast convergence of regress-later estimates in least squares Monte Carlo.
Black, F. and M. Scholes (1973). The pricing of options and corporate liabilities. Journal of Political Economy 81(3), 637–654.
