Understanding Distributional Ambiguity via Non-robust Chance Constraint

Qi Wu; Shumin Ma; Cheuk Hang Leung; Wei Liu; Nanbo Peng

arXiv:1906.01981 · math.OC · September 22, 2020


TL;DR

This paper links distributionally robust optimization to chance constraints, providing a business-oriented interpretation of ambiguity sets and demonstrating advantages over traditional DRO in heavy-tailed scenarios.

Contribution

It introduces a non-robust interpretation of DRO via chance constraints and extends the analysis to heavy-tailed distributions, enhancing practical understanding and application.

Findings

01

DRO is asymptotically equivalent to mean-deviation problems.

02

Adding boundedness constraints improves chance-constrained formulations.

03

Heavy-tail distributions like Student t and lognormal are effectively handled.


Tables

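Table 1. The two $\phi$-divergences used in this paper. The KL divergence applies to light-tail distributions, while the Cressie-Read divergence is compatible with heavy-tail distributions.

	Kullback-Leibler	Cressie-Read
$\phi(t)$	$t\log(t) - t + 1$	$\frac{1 - \theta + \theta t - t^{\theta}}{\theta(1 - \theta)}$, $\theta \neq 0, 1$
$\phi^{*}(s)$	$e^{s} - 1$	$\frac{(1 - s(1 - \theta))^{\frac{\theta}{\theta - 1}}}{\theta} - \frac{1}{\theta}$ for $1 - s(1 - \theta) > 0$ (i.e., $s < \frac{1}{1 - \theta}$ if $\theta < 1$, $s > \frac{1}{1 - \theta}$ if $\theta > 1$); for $\theta > 1$ and $1 - s(1 - \theta) \le 0$, $\phi^{*}(s) = -\frac{1}{\theta}$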

Table 2. Relative errors of the $4^{th}$-order and $2^{nd}$-order reformulations w.r.t. the optimal value of problem (3). Ambiguity sets are defined by the KL divergence centered at a 6-dimensional exponential distribution.

$\rho$	Optimal value	Relative error, $4^{th}$ order	Relative error, $2^{nd}$ order
0.01	0.1887	0.0002%	0.1172%
0.02	0.1841	0.0038%	0.2397%
0.03	0.1807	0.0128%	0.3659%
0.04	0.1778	0.0274%	0.4951%
0.05	0.1753	0.0479%	0.6270%
0.06	0.1730	0.0748%	0.7613%
0.07	0.1710	0.1082%	0.8979%
0.08	0.1691	0.1483%	1.037%
0.09	0.1673	0.1951%	1.778%

Table 3. The equivalent ambiguity radius $\rho$ of the DRO problem for the four asset classes: (3(a)) Equity, (3(b)) US Treasury, (3(c)) Currency, and (3(d)) Commodity. The loss threshold $\delta$ is taken as the negative of the 3% empirical quantile of the daily simple return series of each asset class, and thus differs across assets. Within each asset class, we compare portfolio performance by the choice of center distribution (multivariate Student $t$ or normal) under both the $4^{th}$-order and $2^{nd}$-order reformulations of the DRO problem. The percentage in parentheses accompanying each equivalent ambiguity radius $\rho$ is the annualized return of the corresponding optimal portfolio. Bold numbers emphasize the better portfolio return at a given pair $(\epsilon,\delta)$ under a given solution framework of the DRO problem.


Table 3(a). Equity: $\delta$ = 3.35%.

	$4^{th}$ order, Student $t$	$4^{th}$ order, Normal	$2^{nd}$ order, Student $t$	$2^{nd}$ order, Normal
$\epsilon$ = 2%	3.5e-4 (30.7%)	1.2e-4 (15.3%)	6.1e-4 (30.7%)	1.2e-4 (15.3%)
$\epsilon$ = 5%	3.4e-4 (39.2%)	1.2e-4 (19.8%)	6.1e-4 (39.2%)	1.2e-4 (19.8%)

Table 3(b). US Treasury: $\delta$ = 6.58%.

	$4^{th}$ order, Student $t$	$4^{th}$ order, Normal	$2^{nd}$ order, Student $t$	$2^{nd}$ order, Normal
$\epsilon$ = 2%	2e-6 (-1.1%)	2.8e-14 (-2.6%)	9.5e-6 (-1.1%)	2.8e-14 (-2.6%)
$\epsilon$ = 5%	2e-6 (0.7%)	2.8e-14 (-2.6%)	4.8e-6 (0.7%)	2.8e-14 (-2.6%)

Table 3(c). Currency: $\delta$ = 1.40%.

	$4^{th}$ order, Student $t$	$4^{th}$ order, Normal	$2^{nd}$ order, Student $t$	$2^{nd}$ order, Normal
$\epsilon$ = 2%	2.6e-4 (2.3%)	6.1e-5 (3.6%)	3.1e-4 (2.3%)	6.1e-5 (3.6%)
$\epsilon$ = 5%	1.5e-4 (4.4%)	3.1e-5 (5.0%)	3.1e-4 (4.4%)	3.1e-5 (5.0%)

Table 3(d). Commodity: $\delta$ = 4.4%.

	$4^{th}$ order, Student $t$	$4^{th}$ order, Normal	$2^{nd}$ order, Student $t$	$2^{nd}$ order, Normal
$\epsilon$ = 2%	9.6e-5 (17.3%)	3.7e-9 (4.6%)	1.5e-4 (17.3%)	3.7e-9 (4.6%)
$\epsilon$ = 5%	6.5e-5 (22.7%)	1.9e-9 (4.6%)	7.6e-4 (22.6%)	1.9e-9 (4.6%)

Table 4. Statistics of the three return series constructed by the 5-minute/hourly/half-day/daily rebalanced allocation strategies solved by the DRO problem, the nominal problem, and the Mean-Variance problem, respectively.

	DRO	Nominal	Mean-Variance
5-minute rebalancing
Mean (×1e-4)	3.68	3.24	0.77
Variance (×1e-6)	56.5	199	3.48
Skewness	0.66	0.17	0.61
Hourly rebalancing
Mean (×1e-4)	3.19	2.67	0.59
Variance (×1e-6)	56.3	200	3.52
Skewness	0.64	0.13	0.53
Half-day rebalancing
Mean (×1e-4)	2.7	2.3	0.48
Variance (×1e-6)	55.9	198	3.47
Skewness	0.62	0.09	0.43
Daily rebalancing
Mean (×1e-4)	1.69	1.0	0.24
Variance (×1e-6)	55.6	190	3.43
Skewness	0.66	0.054	0.52

Equations

D(\mathbb{Q}\,\|\,\mathbb{P}) := \int \phi\left(\frac{d\mathbb{Q}}{d\mathbb{P}}\right) d\mathbb{P} = \mathbb{E}_{\mathbb{P}}\left[\phi\left(\frac{d\mathbb{Q}}{d\mathbb{P}}\right)\right] := \mathbb{E}_{\mathbb{P}}[\phi(L)].

\max_{\mathbf{x}\in\mathcal{X}} \mathbb{E}_{\mathbb{P}}[f(\mathbf{x},\mathbf{r})].

\max_{\mathbf{x}\in\mathcal{X}} \min_{\mathbb{P}\in\mathcal{U}} \mathbb{E}_{\mathbb{P}}[f(\mathbf{x},\mathbf{r})].

\max_{\mathbf{x}\in\mathcal{X}} \mathbb{E}_{\mathbb{P}_{0}}[f(\mathbf{x},\mathbf{r})] \quad \text{s.t.} \quad Pr_{\sim\mathbb{P}_{0}}(\mathbf{x}^{T}\mathbf{r}\leq-\delta)\leq\epsilon.

\min_{\mathbb{P}\in\mathcal{U}} \mathbb{E}_{\mathbb{P}}[f(\mathbf{x},\mathbf{r})].

\sup_{\eta_{1}\in\mathbb{R},\,\eta_{2}\geq 0} \left\{-\frac{1}{\eta_{2}} \sup_{L} \left\{\mathbb{E}_{\mathbb{P}_{0}}[-\eta_{2}(f(\mathbf{x},\mathbf{r})+\eta_{1})L-\phi(L)]\right\} - \eta_{1} - \frac{\rho}{\eta_{2}}\right\}
= \sup_{\eta_{1}\in\mathbb{R},\,\eta_{2}\geq 0} \left\{-\frac{1}{\eta_{2}} \mathbb{E}_{\mathbb{P}_{0}}[\phi^{*}(-\eta_{2}(f(\mathbf{x},\mathbf{r})+\eta_{1}))] - \eta_{1} - \frac{\rho}{\eta_{2}}\right\}.

\mathcal{D}_{\eta_{2},\phi,\mathbb{P}_{0}}(f(\mathbf{x},\mathbf{r}) \mid \mathbb{E}_{\mathbb{P}_{0}}[f(\mathbf{x},\mathbf{r})]) :=

\mathbb{E}_{\mathbb{P}_{0}}[f(\mathbf{x},\mathbf{r})] - \inf_{\eta_{2}\geq 0}\left\{\frac{\rho}{\eta_{2}} + \mathcal{D}_{\eta_{2},\phi,\mathbb{P}_{0}}(f(\mathbf{x},\mathbf{r}) \mid \mathbb{E}_{\mathbb{P}_{0}}[f(\mathbf{x},\mathbf{r})])\right\}.

\mathcal{D}_{\eta_{2},\phi,\mathbb{P}_{0}}(f(\mathbf{x},\mathbf{r}) \mid \mathbb{E}_{\mathbb{P}_{0}}[f(\mathbf{x},\mathbf{r})]) = \sum_{k=1}^{K-1} b_{k}\,\mathbb{E}_{\mathbb{P}_{0}}[(X+\eta_{1}^{*})^{k+1}]\,\eta_{2}^{k} + o(\eta_{2}^{K-1}),

\min_{\eta_{1}} \sum_{k=1}^{K-1} b_{k}\,\mathbb{E}_{\mathbb{P}_{0}}[(X+\eta_{1})^{k+1}]\,\eta_{2}^{k}.

\mathcal{D}_{\eta_{2},\phi,\mathbb{P}_{0}}(f(\mathbf{x},\mathbf{r}) \mid \mathbb{E}_{\mathbb{P}_{0}}[f(\mathbf{x},\mathbf{r})]) = \sum_{k=1}^{3} b_{k}\,\mathbb{E}_{\mathbb{P}_{0}}[(X+\eta_{1}^{*})^{k+1}]\,\eta_{2}^{k} + o(\eta_{2}^{3}),

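\sum_{k=1}^{3} (k+1)\, b_{k}\, \eta_{2}^{k}\, \eta_{1}^{k} + 12\, b_{3}\, \eta_{2}^{3}\, \mathbb{E}_{\mathbb{P}_{0}}[X^{2}]\, \eta_{1} + 4\, b_{3}\, \eta_{2}^{3}\, \mathbb{E}_{\mathbb{P}_{0}}[X^{3}] + 3\, b_{2}\, \eta_{2}^{2}\, \mathbb{E}_{\mathbb{P}_{0}}[X^{2}] = 0.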

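\min_{\mathbb{P}\in\mathcal{U}} \mathbb{E}_{\mathbb{P}}[f(\mathbf{x},\mathbf{r})] \approx \mathbb{E}_{\mathbb{P}_{0}}[f(\mathbf{x},\mathbf{r})] - \inf_{\eta_{2}\geq 0}\left\{\frac{\rho}{\eta_{2}} + \frac{\eta_{2}}{2\phi^{(2)}(1)}\mathbb{V}_{\mathbb{P}_{0}}[f(\mathbf{x},\mathbf{r})]\right\}
= \mathbb{E}_{\mathbb{P}_{0}}[f(\mathbf{x},\mathbf{r})] - \sqrt{\frac{2\rho\,\mathbb{V}_{\mathbb{P}_{0}}[f(\mathbf{x},\mathbf{r})]}{\phi^{(2)}(1)}}.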

\inf_{\eta_{2}\geq 0}\left\{\frac{\rho}{\eta_{2}} + \frac{\eta_{2}}{2\phi^{(2)}(1)}\mathbb{V}_{\mathbb{P}_{0}}[f(\mathbf{x},\mathbf{r})]\right\} = \sqrt{\frac{2\rho\,\mathbb{V}_{\mathbb{P}_{0}}[f(\mathbf{x},\mathbf{r})]}{\phi^{(2)}(1)}},

\max_{\mathbf{x}\in\mathcal{X}}\left\{\mathbb{E}_{\mathbb{P}_{0}}[f(\mathbf{x},\mathbf{r})] - \sqrt{\frac{2\rho\,\mathbb{V}_{\mathbb{P}_{0}}[f(\mathbf{x},\mathbf{r})]}{\phi^{(2)}(1)}}\right\}.

\mathrm{V}_{\epsilon}(\mathbf{x}) := \inf\{\gamma\in\mathbb{R} : Pr_{\sim\mathbb{P}_{0}}\{-\mathbf{x}^{T}\mathbf{r}\geq\gamma\}\leq\epsilon\}.

Pr_{\sim\mathbb{P}_{0}}\{-\mathbf{x}^{T}\mathbf{r}\geq\delta\}\leq\epsilon \Leftrightarrow \mathrm{V}_{\epsilon}(\mathbf{x})\leq\delta.

\max_{\mathbf{x}\in\mathcal{X}} \mathbf{x}^{T}\mathbf{\mu} \quad \text{s.t.} \quad \mathrm{V}_{\epsilon}(\mathbf{x})\leq\delta.

\mathrm{V}_{\epsilon}(\mathbf{x}) = \kappa(\epsilon)\sqrt{\mathbf{x}^{T}\Sigma\mathbf{x}} - \mathbf{x}^{T}\mathbf{\mu},

\max_{\mathbf{x}\in\mathcal{X}} \mathbb{E}_{\mathbb{P}_{0}}[f(\mathbf{x},\mathbf{r})] \quad \text{s.t.} \quad \kappa(\epsilon)\sqrt{\mathbf{x}^{T}\Sigma\mathbf{x}} - \mathbf{x}^{T}\mathbf{\mu}\leq\delta.

\mathbf{x}^{*} = \frac{\Sigma^{-1}(\mathbf{\mu}-\lambda^{*}\mathbf{e})}{B-\lambda^{*}A}, \quad v^{*} = \lambda^{*},

\tilde{\mathbf{x}}^{*} = \frac{\Sigma^{-1}[(1+\tilde{\lambda})\mathbf{\mu}-\tilde{\theta}\mathbf{e}]}{(1+\tilde{\lambda})B-\tilde{\theta}A}, \quad \tilde{v}^{*} = \tilde{\lambda}\delta+\tilde{\theta},

\min_{\mathbf{x}\in\mathcal{X}} \mathbf{x}^{T}\Sigma\mathbf{x} \quad \text{s.t.} \quad \mathbf{x}^{T}\mathbf{\mu}\geq r_{target}.


Taxonomy

Topics: Risk and Portfolio Optimization · Fuzzy Systems and Optimization · Multi-Criteria Decision Making

Full text

Understanding Distributional Ambiguity via Non-robust Chance Constraint

Shumin Ma, City University of Hong Kong
Cheuk Hang Leung, City University of Hong Kong
Qi Wu, City University of Hong Kong
Wei Liu, Tencent
Nanbo Peng, JD Digits

(2020)

Abstract.

This paper provides a non-robust interpretation of the distributionally robust optimization (DRO) problem by relating the distributional uncertainties to the chance probabilities. Our analysis allows a decision-maker to interpret the size of the ambiguity set, which often lacks business meaning, through the chance parameters constraining the objective function. We first show that, for general $\phi$-divergences, a DRO problem is asymptotically equivalent to a class of mean-deviation problems. These mean-deviation problems are not subject to uncertain distributions, and the ambiguity radius in the original DRO problem now plays the role of controlling the risk preference of the decision-maker. We then demonstrate that a DRO problem can be cast as a chance-constrained optimization (CCO) problem when a boundedness constraint is added to the decision variables. Without the boundedness constraint, the CCO problem is shown to perform uniformly better than the DRO problem, irrespective of the radius of the ambiguity set, the choice of the divergence measure, or the tail heaviness of the center distribution. Thanks to our high-order expansion result, a notable feature of our analysis is that it applies to divergence measures that accommodate heavy-tail distributions well, such as the Student $t$-distribution and the lognormal distribution, besides the widely used Kullback-Leibler (KL) divergence, which requires the distribution of the objective function to be exponentially bounded. Using the portfolio selection problem as an example, our comprehensive tests on multivariate heavy-tail datasets, both synthetic and real-world, show that this business-interpretation approach is indeed useful and insightful.

Distributionally robust optimization, Chance constraint, KL divergence, $\phi$-divergence, Heavy-tail distributions, Portfolio selection

Journal year: 2020; copyright: acmlicensed; conference: ACM International Conference on AI in Finance, October 15–16, 2020, New York, NY, USA; booktitle: ACM International Conference on AI in Finance (ICAIF '20), October 15–16, 2020, New York, NY, USA; price: 15.00; DOI: 10.1145/3383455.3422522; ISBN: 978-1-4503-7584-9/20/10; submission ID: 18; CCS: Applied computing → Multi-criterion optimization and decision-making

1. Introduction

Stochastic optimization is widely used in machine learning to optimize expected performance or loss, e.g., the mean squared error in regression or the expected discounted return in reinforcement learning (Thomas and Learned-Miller, 2019). A sound machine learning model demands reliable estimates of the data-generating distribution. However, uncertainty about the data distribution can arise in many ways: limited observations in the stationary case, a time-varying law in the non-stationary case, or a law subject to policy intervention through treatment effects. In robust statistics, formulating a decision-making problem as a DRO problem is a remedy for distributional uncertainty in the data (Chen and Paschalidis, 2018).

A typical DRO formulation adds an extra layer of optimization over a set of possible distributions, called the ambiguity set, and optimizes the decision variables under the worst-case distribution. There are mainly three ways in the literature to define the ambiguity set. The first is the geometric approach, which allows the parameters of the chosen distribution to vary within certain geometric regions (Kim et al., 2014; Zhu et al., 2009; Zhu and Fukushima, 2009) such as boxes, ellipsoids, and polyhedra. The second, the moment-based approach, constructs the ambiguity set by collecting distributions that satisfy the same moment constraints (Delage and Ye, 2010; Scarf, 1957; Chen et al., 2011; Zymler et al., 2013). The last, the statistical distance approach, uses divergence measures or distance functions between two probability distributions to define the ambiguity set as a ball of distributions (Namkoong and Duchi, 2017; Abadeh et al., 2015; Esfahani and Kuhn, 2018; Chen and Paschalidis, 2018). The radius of the ball is referred to as the ambiguity radius. Among the three, the moment-based and statistical distance approaches address law uncertainty. In contrast, the geometric approach only addresses uncertainty in the parameters of an a priori fixed distribution, not in its functional form: it does not help if the true distribution turns out to be lognormal while you assume it is normal and only fine-tune its mean and variance. However, the cost of moving from parameter uncertainty to law uncertainty is a loss of interpretability of the ambiguity set, because the parameters characterizing it are not business quantities.

This paper provides a solution to this business-interpretation problem. In business applications, a decision-maker would have a hard time relating, e.g., a KL-ball radius of $0.01$ to product sales, taxi demand, or portfolio returns: the radius $0.01$ is not tied to any measure of the business objective. An unavoidable difficulty for her is deciding the size of the ambiguity set. Our idea is straightforward. We translate the impact of the ambiguity radius, which lacks business meaning, into the impact of the chance parameters constraining the objective function, which allows a decision-maker to state her preferences directly in terms of the business objective. Taking asset allocation as an example, our solution can tell a portfolio manager that setting the ambiguity radius to $0.01$ is equivalent to requiring that the probability of her portfolio return falling below $-13\%$ be no higher than $2\%$. In this way, the geometry of the ambiguity set, namely its radius, is connected directly to her granular preference over the objective: the amount of risk she can tolerate.

This paper makes two primary technical contributions. First, our analysis applies to heavy-tail distributions (e.g., via the Cressie-Read divergence) (Glasserman and Xu, 2014), besides the usual light-tail cases handled with the KL divergence. Heavy-tail distributions, e.g., the lognormal distribution and the Student $t$-distribution, are ubiquitous in business and finance datasets. A DRO problem with an ambiguity set defined by the KL divergence is, however, solvable only when the distribution of the objective function is exponentially bounded (Hu and Hong, 2013), which excludes heavy-tail distributions. Our analysis extends to the general $\phi$-divergence family, including the KL divergence, Burg entropy, $\chi^{2}$-distance, Hellinger distance, and Cressie-Read divergence (Namkoong and Duchi, 2017; Hashimoto et al., 2018). Second, we establish two connections between a DRO problem and a CCO problem. The first is that when a boundedness constraint is added to the decision variables, a DRO problem can be cast as a CCO problem without distributional uncertainty. The second is that, without the boundedness constraint, the CCO problem is shown to perform uniformly better than the DRO problem, irrespective of the radius of the ambiguity set, the choice of the divergence measure, or the tail heaviness of the center distribution.

The rest of the paper is organized as follows. Section 2 provides background and the motivation for the proposed optimization problems. Section 3 gives the theoretical analysis of the DRO problem and the CCO problem. Section 4 establishes the connection between the DRO problem and the CCO problem under an explicit formulation of the portfolio selection problem. Section 5 presents numerical experiments, and Section 6 summarizes our findings from both synthetic and empirical data. Due to page limits, all proofs are omitted from the main body; they are available upon request.

2. Problem setup

2.1. Notations

Let $\mathbf{r}\in\mathbb{R}^{n}$, an $n$-dimensional real-valued random vector, be the vector of asset returns, and suppose its joint probability distribution is $\mathbb{P}$. Let $\mathbb{P}_{0}$ be the nominal probability distribution of $\mathbf{r}$. Let $\mathbf{x}\in\mathbb{R}^{n}$ be the asset allocation strategy, and $\mathbf{e}\in\mathbb{R}^{n}$ be the vector with all entries equal to $1$. Denote by $f(\mathbf{x},\mathbf{r})$ the utility function associated with $\mathbf{x}$ and $\mathbf{r}$, assumed concave in $\mathbf{x}$. We assume that $\mathbf{x}$ lies in a convex set $\mathcal{X}$ and that $\mathbb{P}$ belongs to an ambiguity set $\mathcal{U}$. The expectation and variance of a random variable under $\mathbb{P}$ are denoted by $\mathbb{E}_{\mathbb{P}}[\cdot]$ and $\mathbb{V}_{\mathbb{P}}[\cdot]$, respectively.

Definition 2.1.

($\phi$-divergence) Assume that $\phi(t)$ is convex for $t\geq 0$ and that $\phi(1)=0$. Then the $\phi$-divergence $D(\mathbb{Q}||\mathbb{P})$ between distributions $\mathbb{P}$ and $\mathbb{Q}$ is defined as:

[TABLE]

The quantity $L$ in Eq. (1) is called the Radon-Nikodym derivative (or likelihood ratio); it satisfies $L\geq 0$ almost surely and $\mathbb{E}_{\mathbb{P}}\left[L\right]=1$. For the Radon-Nikodym derivative $L$ to exist, $\mathbb{Q}$ must be absolutely continuous w.r.t. $\mathbb{P}$. Given the function $\phi$ of a specific $\phi$-divergence, its conjugate $\phi^{*}$ is defined as $\phi^{*}(s):=\sup_{t\geq 0}\{st-\phi(t)\}$. Table 1 lists the two divergences used in this paper. Our interpretation of the ambiguity radius, however, applies to all $\phi$-divergences, including Burg entropy, $J$-divergence, $\chi^{2}$-distance, modified $\chi^{2}$-distance, and Hellinger distance. (For more information about the $\phi$-divergence family, see (Ben-Tal et al., 2013).)
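The conjugates in Table 1 can be checked numerically. The sketch below is a minimal illustration with hypothetical helper names (`phi_kl`, `phi_cr`, `conjugate_numeric`, `conjugate_cr`): it approximates $\phi^{*}(s)=\sup_{t\ge 0}\{st-\phi(t)\}$ by a grid search over $t\in[0,50]$ and compares the result with the closed forms. It assumes $\theta>0$, $\theta\neq 1$ for Cressie-Read and treats the region $1-s(1-\theta)\le 0$ separately: there the supremum sits at $t=0$ when $\theta>1$ and is unbounded when $0<\theta<1$.

```python
import math
from typing import Callable

def phi_kl(t: float) -> float:
    # KL generator phi(t) = t log t - t + 1, extended by continuity to phi(0) = 1
    return t * math.log(t) - t + 1.0 if t > 0.0 else 1.0

def phi_cr(t: float, theta: float) -> float:
    # Cressie-Read generator from Table 1 (theta not in {0, 1})
    return (1.0 - theta + theta * t - t ** theta) / (theta * (1.0 - theta))

def conjugate_numeric(phi: Callable[[float], float], s: float,
                      t_max: float = 50.0, n: int = 100_001) -> float:
    # Grid approximation of phi*(s) = sup_{t >= 0} {s t - phi(t)}, restricted to [0, t_max]
    step = t_max / (n - 1)
    return max(s * (i * step) - phi(i * step) for i in range(n))

def conjugate_kl(s: float) -> float:
    return math.exp(s) - 1.0

def conjugate_cr(s: float, theta: float) -> float:
    # Closed form for theta > 0, theta != 1
    base = 1.0 - s * (1.0 - theta)
    if base <= 0.0:
        # theta > 1: optimum at t = 0, value -phi(0) = -1/theta
        # 0 < theta < 1: s t - phi(t) is unbounded above
        return -1.0 / theta if theta > 1.0 else math.inf
    return base ** (theta / (theta - 1.0)) / theta - 1.0 / theta

# Usage: compare the grid search with the closed forms
for s in (-1.0, 0.0, 0.5):
    print(f"KL  s={s:+.1f}: grid {conjugate_numeric(phi_kl, s):.6f}, closed form {conjugate_kl(s):.6f}")
for s in (-2.0, -0.5, 1.0):
    grid = conjugate_numeric(lambda t: phi_cr(t, 2.0), s)
    print(f"CR2 s={s:+.1f}: grid {grid:.6f}, closed form {conjugate_cr(s, 2.0):.6f}")
```

A grid search is crude but makes no assumption about where the supremum is attained, which is exactly what the closed forms need to be checked against.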

2.2. Motivation

The goal is to maximize the expected utility over a set of admissible allocation strategies $\mathcal{X}$ , namely,

[TABLE]

We introduce the ambiguity set $\mathcal{U}$ centered at the nominal distribution $\mathbb{P}_{0}$ (also called the center distribution in what follows) and controlled by the radius parameter $\rho>0$, that is, $\mathcal{U}:=\{\mathbb{P}:D(\mathbb{P}||\mathbb{P}_{0})\leq\rho\}$. The distributionally robust counterpart of problem (2) is then:

[TABLE]

For a decision-maker, the ambiguity radius $\rho$ is critical. One cannot set it too large, since the optimal utility decreases in $\rho$; if it is too small, however, one loses the robust protection. Choosing its magnitude in a financial context is thus a trade-off. In the literature, (Pardo, 2005) characterizes the $\phi$-divergence $D(\mathbb{P}||\mathbb{P}_{0})$ between the true distribution $\mathbb{P}$ and the nominal distribution $\mathbb{P}_{0}$. Assuming that $\mathbb{P}$ and $\mathbb{P}_{0}$ belong to the same parameterized distribution family with parameter dimension $d$, and that $\phi$ is twice continuously differentiable in a neighborhood of 1 with $\phi^{(2)}(1)>0$, the normalized estimated $\phi$-divergence $\frac{2N}{\phi^{(2)}(1)}D(\mathbb{P}||\mathbb{P}_{0})$ asymptotically (i.e., as the sample size $N\rightarrow\infty$) follows a $\chi^{2}_{d}$-distribution. This result relates the ambiguity radius $\rho$ to the confidence level at which the true distribution $\mathbb{P}$ falls within the ambiguity set. (Blanchet et al., 2018) provides one methodology, under Markowitz's mean-variance portfolio selection framework, to select the ambiguity radius $\rho$ as the smallest radius such that the true asset allocation strategy is included with a given confidence level.
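Under the same-family assumption just stated, the asymptotic $\chi^{2}_{d}$ result suggests one way to turn a confidence level $1-\alpha$ into a radius: $\rho=\phi^{(2)}(1)\,\chi^{2}_{d,1-\alpha}/(2N)$, where $\chi^{2}_{d,1-\alpha}$ is the $(1-\alpha)$-quantile of $\chi^{2}_{d}$. The standard-library sketch below (function names are hypothetical) computes that quantile by bisection on the series form of the regularized lower incomplete gamma function. It inherits the parametric-family assumption that the next paragraph questions.

```python
import math

def chi2_cdf(x: float, d: int) -> float:
    """Chi-square CDF with d degrees of freedom: P(d/2, x/2) via its power series."""
    if x <= 0.0:
        return 0.0
    a, y = d / 2.0, x / 2.0
    term, total, n = 1.0, 1.0, 0
    while term > 1e-17 * total:
        n += 1
        term *= y / (a + n)  # ratio of consecutive terms y^n / Gamma(a + n + 1)
        total += term
    return math.exp(a * math.log(y) - y - math.lgamma(a + 1.0)) * total

def chi2_quantile(p: float, d: int) -> float:
    # Bisection on the monotone CDF; the upper bracket is doubled until it covers p
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, d) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, d) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def radius_from_confidence(n_samples: int, d: int, confidence: float,
                           phi2_at_1: float = 1.0) -> float:
    # 2N/phi''(1) * D ~ chi2_d  =>  P(D <= rho) ≈ confidence when rho = phi''(1) q / (2N)
    return phi2_at_1 * chi2_quantile(confidence, d) / (2.0 * n_samples)

# Usage: KL ball (phi''(1) = 1), 2 estimated parameters, 1000 observations, 95% confidence
print(radius_from_confidence(n_samples=1000, d=2, confidence=0.95))
```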

However, in financial practice with real data, the assumption that the true distribution is in the same parameterized family as the center distribution is too strong. A wrong guess of the nominal distribution may render the confidence-level interpretation of the ambiguity radius $\rho$ meaningless. Since the DRO approach is believed to provide robust protection against distributional uncertainty, we are motivated to connect this robust protection to the protection provided by traditional risk measures. In particular, the heavy-tail distributions we are concerned with suggest protection against tail probabilities; optimization under such constraints is known as CCO. Specifically, we define the CCO problem as:

[TABLE]

Here, $\delta>0$ characterizes a typical investor's loss threshold and $\epsilon>0$ the loss probability. Problem (4) shares the same objective function as problem (2). The expectation is taken under the nominal distribution $\mathbb{P}_{0}$ and is not subject to any distributional robustness (the term "non-robust" in the title originates from here). Compared to problem (2), the new component is the chance constraint with parameters ($\delta$, $\epsilon$) characterizing an investor's tolerance to losses.

We build a performance-based interpretation of the ambiguity radius $\rho$ through the parameters of the chance-constrained problem. Specifically, if for some ambiguity radius $\rho$ and chance constraint parameters ($\delta$, $\epsilon$) problems (3) and (4) achieve the same optimal value, we say that the robust protection under the ambiguity radius $\rho$ is similar to a tail probability protection. We also examine how the choice of the allocation strategy set $\mathcal{X}$ and the tail heaviness of the nominal distribution $\mathbb{P}_{0}$ affect the interpretation of the ambiguity radius $\rho$, given that $\mathcal{X}$ and $\mathbb{P}_{0}$ are the model settings shared by problems (3) and (4).

3. Analysis of DRO and CCO problems

This section is devoted to the theoretical analysis of problems (3) and (4). We show that, for general $\phi$-divergences, problem (3) can be reformulated as a class of mean-deviation problems in which the investor's risk preference parameter is controlled by the ambiguity radius $\rho$. In addition, we provide an approximation framework to solve problem (4).

3.1. Reformulation of the DRO problem (3)

Consider the inner optimization problem in problem (3):

[TABLE]

The Lagrangian dual to problem (5) is:

[TABLE]

The last equality follows directly from the definition of the conjugate function of $\phi$, with the supremum over $L$ taken pointwise inside the expectation. The difficulty in solving the dual problem lies in the term $\mathbb{E}_{\mathbb{P}_{0}}[\phi^{*}(-\eta_{2}(f(\mathbf{x},\mathbf{r})+\eta_{1}))]$. We follow the idea in (Gotoh et al., 2018) and express optimization (5) in terms of a regular measure of deviation, with results summarized in Theorem 3.1.
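For the KL divergence, $\phi^{*}(s)=e^{s}-1$ and the maximization over $\eta_{1}$ in the dual can be done in closed form: the first-order condition gives $\eta_{1}=\frac{1}{\eta_{2}}\log\mathbb{E}_{\mathbb{P}_{0}}[e^{-\eta_{2}f}]$, leaving the one-dimensional dual $\sup_{\eta_{2}>0}\{-\frac{1}{\eta_{2}}\log\mathbb{E}_{\mathbb{P}_{0}}[e^{-\eta_{2}f}]-\frac{\rho}{\eta_{2}}\}$. The sketch below checks, on a hypothetical four-point nominal distribution, that this dual value matches the primal worst case, which for a KL ball is attained by exponentially tilting $\mathbb{P}_{0}$. The names `kl_dual` and `kl_primal` and all data are illustrative.

```python
import math

def kl_dual(f, p, rho):
    """sup_{eta2 > 0} { -(1/eta2) log E_P0[exp(-eta2 f)] - rho / eta2 } for a discrete P0."""
    m = min(f)  # shift so every exponent is <= 0 (log-sum-exp stabilization)

    def objective(log_eta):
        eta = math.exp(log_eta)
        lse = math.log(sum(pi * math.exp(-eta * (fi - m)) for fi, pi in zip(f, p))) - eta * m
        return -lse / eta - rho / eta

    lo, hi = -20.0, 20.0
    for _ in range(300):  # ternary search: the objective is unimodal in log(eta2)
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))

def kl_primal(f, p, rho):
    """min E_Q[f] s.t. KL(Q || P0) <= rho, via the tilt Q proportional to P0 * exp(-eta f)."""
    m = min(f)

    def tilt(eta):
        w = [pi * math.exp(-eta * (fi - m)) for fi, pi in zip(f, p)]
        z = sum(w)
        q = [wi / z for wi in w]
        kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)
        return q, kl

    lo, hi = 0.0, 1.0
    while tilt(hi)[1] < rho:  # KL grows with eta; assumes rho < -log P0(argmin f)
        hi *= 2.0
    for _ in range(200):      # bisection for the eta with KL(Q_eta || P0) = rho
        mid = 0.5 * (lo + hi)
        if tilt(mid)[1] < rho:
            lo = mid
        else:
            hi = mid
    q, _ = tilt(0.5 * (lo + hi))
    return sum(qi * fi for qi, fi in zip(q, f))

f = [1.0, 2.0, 3.0, 4.0]  # hypothetical utilities f(x, r_i) in four scenarios
p = [0.25] * 4            # hypothetical nominal probabilities
print(kl_dual(f, p, 0.1), kl_primal(f, p, 0.1))
```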

Theorem 3.1.

Let $\phi$ be a closed proper convex function and $\phi^{*}$ its conjugate function. Suppose that strong duality holds (as it does under mild conditions). Define the regular measure of deviation

[TABLE]

Then optimization (5) is equivalent to:

[TABLE]

Furthermore, the quantity $\mathcal{D}_{\eta_{2},\phi,\mathbb{P}_{0}}(f(\mathbf{x},\mathbf{r})|\mathbb{E}_{\mathbb{P}_{0}}[f(\mathbf{x},\mathbf{r})])$ can be expanded as a series of terms, the coefficients of which can be computed under the nominal distribution $\mathbb{P}_{0}$ . By doing so, we can reformulate the DRO problem (3) as a single-layer maximization problem.
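Writing $f$ for $f(\mathbf{x},\mathbf{r})$, one way to see how Theorem 3.1 follows from the dual is to carry out the supremum over $\eta_{1}$ first. The deviation measure below is our own reconstruction from the dual, not a quotation of the paper, whose notation may differ. For KL it reduces to an entropic deviation that is nonnegative by Jensen's inequality.

```latex
% f is shorthand for f(x, r); the supremum over eta_1 is taken first
\begin{aligned}
\sup_{\eta_1\in\mathbb{R},\,\eta_2\ge 0}\Big\{-\tfrac{1}{\eta_2}\mathbb{E}_{\mathbb{P}_0}\big[\phi^{*}(-\eta_2(f+\eta_1))\big]-\eta_1-\tfrac{\rho}{\eta_2}\Big\}
&= \mathbb{E}_{\mathbb{P}_0}[f]-\inf_{\eta_2\ge 0}\Big\{\tfrac{\rho}{\eta_2}+\mathcal{D}_{\eta_2,\phi,\mathbb{P}_0}\big(f\mid\mathbb{E}_{\mathbb{P}_0}[f]\big)\Big\},\\
\text{with}\quad \mathcal{D}_{\eta_2,\phi,\mathbb{P}_0}\big(f\mid\mathbb{E}_{\mathbb{P}_0}[f]\big)
&= \mathbb{E}_{\mathbb{P}_0}[f]+\inf_{\eta_1\in\mathbb{R}}\Big\{\eta_1+\tfrac{1}{\eta_2}\mathbb{E}_{\mathbb{P}_0}\big[\phi^{*}(-\eta_2(f+\eta_1))\big]\Big\},\\
\text{KL } (\phi^{*}(s)=e^{s}-1):\quad \mathcal{D}_{\eta_2,\phi,\mathbb{P}_0}\big(f\mid\mathbb{E}_{\mathbb{P}_0}[f]\big)
&= \mathbb{E}_{\mathbb{P}_0}[f]+\tfrac{1}{\eta_2}\log\mathbb{E}_{\mathbb{P}_0}\big[e^{-\eta_2 f}\big]\;\ge\;0 .
\end{aligned}
```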

Lemma 3.2.

Suppose that $K$ is an even number, $\phi\in\mathcal{C}^{K+1}$ is a convex function which satisfies $\phi(1)=\phi^{(1)}(1)=0$ and $\phi^{(2)}(1)>0$ . Assume that $\mathbb{E}_{\mathbb{P}_{0}}[X^{k}]<\infty$ for $k\leq K$ and $X$ is defined as $X:=f(\mathbf{x},\mathbf{r})-\mathbb{E}_{\mathbb{P}_{0}}[f(\mathbf{x},\mathbf{r})]$ . Then

[TABLE]

where $b_{k}=\frac{(-1)^{k+1}z^{(k)}(0)}{(k+1)!}$ , and $\eta_{1}^{*}$ is the optimal solution to

[TABLE]

Specifically, $z(\cdot)$ is a function satisfying $z(0)=1$ , $z^{(1)}(\cdot)=\frac{1}{\phi^{(2)}(z(\cdot))}$ and $z^{(k)}(\cdot)$ can be obtained recursively for $k\geq 2$ .
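The recursion for $z^{(k)}(0)$ can be made explicit for generators with $\phi^{(2)}(t)=t^{\theta-2}$. This covers the Cressie-Read family of Table 1 and, at $\theta=1$, the KL divergence, whose $\phi^{(2)}(t)=1/t$. Then $z'=z^{2-\theta}$, and induction gives $z^{(k)}(0)=\prod_{j=1}^{k-1}\big(j(2-\theta)-(j-1)\big)$. The sketch below (our derivation; function names are hypothetical) turns this into exact rational coefficients $b_k$ using Python's `fractions` module and reproduces the values quoted in Corollary 3.3.

```python
from fractions import Fraction
from math import factorial

def z_derivatives_at_zero(theta, k_max):
    """z^{(k)}(0), k = 1..k_max, for z' = 1/phi''(z), z(0) = 1 and phi''(t) = t^(theta - 2)."""
    a = 2 - Fraction(theta)      # the ODE becomes z' = z^a
    derivs, c = [], Fraction(1)  # z^{(k)} = c_k z^{k a - (k - 1)}, with c_1 = 1
    for k in range(1, k_max + 1):
        derivs.append(c)
        c *= k * a - (k - 1)     # c_{k+1} = c_k (k a - (k - 1))
    return derivs

def b_coefficients(theta, K):
    """b_k = (-1)^{k+1} z^{(k)}(0) / (k+1)! for k = 1..K-1, as in Lemma 3.2."""
    z = z_derivatives_at_zero(theta, K - 1)
    return [(-1) ** (k + 1) * z[k - 1] / factorial(k + 1) for k in range(1, K)]

print(b_coefficients(1, 4))               # KL (theta = 1): b_1, b_2, b_3
print(b_coefficients(Fraction(5, 2), 4))  # Cressie-Read with theta = 5/2
```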

Note that the above expansion applies to general utility functions $f(\mathbf{x},\mathbf{r})$ that are concave in $\mathbf{x}$. More importantly, most $\phi$-divergences (KL divergence, Cressie-Read divergence, Burg entropy, $J$-divergence, $\chi^{2}$-distance, modified $\chi^{2}$-distance, and Hellinger distance) satisfy the smoothness conditions. Taking the KL and Cressie-Read divergences as examples, for $K=4$ we can solve the terms in Eq. (6) explicitly, as shown in the following corollary.

Corollary 3.3.

Consider $K=4$ . We have the $4^{th}$ order expansion of $\mathcal{D}_{\eta_{2},\phi,\mathbb{P}_{0}}(f(\mathbf{x},\mathbf{r})|\mathbb{E}_{\mathbb{P}_{0}}[f(\mathbf{x},\mathbf{r})])$ :

[TABLE]

with $\eta_{1}^{*}$ being the real root of the cubic equation

[TABLE]

For KL divergence, the coefficients are $b_{1}=1/2$ , $b_{2}=-1/6$ , $b_{3}=1/24$ ; for Cressie-Read divergence with $\theta>2$ , the coefficients are $b_{1}=1/2$ , $b_{2}=(\theta-2)/6$ , $b_{3}=(\theta-2)(2\theta-3)/24$ .
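Corollary 3.3 can be evaluated directly from the first four moments of the centered variable $X$. The sketch below expands $g(\eta_1)=\sum_{k=1}^{3}b_k\mathbb{E}[(X+\eta_1)^{k+1}]\eta_2^k$ using $\mathbb{E}[X]=0$. It then finds the real root of the cubic $g'(\eta_1)=0$ by bisection, assuming, as the corollary states, a single real root and $b_3>0$. Finally, it compares the result with the exact KL value for a normal $X$, where $\frac{1}{\eta_2}\log\mathbb{E}[e^{-\eta_2 X}]=\eta_2\sigma^2/2$. Function names and inputs are illustrative.

```python
def g(eta1, eta2, b, m2, m3, m4):
    """sum_{k=1}^{3} b_k E[(X + eta1)^{k+1}] eta2^k for a centered X with E[X^j] = m_j."""
    e2 = m2 + eta1 ** 2
    e3 = m3 + 3 * eta1 * m2 + eta1 ** 3
    e4 = m4 + 4 * eta1 * m3 + 6 * eta1 ** 2 * m2 + eta1 ** 4
    return b[0] * e2 * eta2 + b[1] * e3 * eta2 ** 2 + b[2] * e4 * eta2 ** 3

def g_prime(eta1, eta2, b, m2, m3):
    """dg/d eta1: a cubic in eta1 with leading coefficient 4 b_3 eta2^3."""
    return (4 * b[2] * eta2 ** 3 * eta1 ** 3 + 3 * b[1] * eta2 ** 2 * eta1 ** 2
            + (2 * b[0] * eta2 + 12 * b[2] * eta2 ** 3 * m2) * eta1
            + 4 * b[2] * eta2 ** 3 * m3 + 3 * b[1] * eta2 ** 2 * m2)

def deviation_4th_order(eta2, b, m2, m3, m4):
    """Returns (eta1*, 4th-order deviation); assumes b_3 > 0 and a single real root."""
    lo, hi = -1.0, 1.0
    while g_prime(lo, eta2, b, m2, m3) > 0:  # widen the bracket until g' changes sign
        lo *= 2.0
    while g_prime(hi, eta2, b, m2, m3) < 0:
        hi *= 2.0
    for _ in range(200):                      # bisection on the cubic
        mid = 0.5 * (lo + hi)
        if g_prime(mid, eta2, b, m2, m3) < 0:
            lo = mid
        else:
            hi = mid
    eta1_star = 0.5 * (lo + hi)
    return eta1_star, g(eta1_star, eta2, b, m2, m3, m4)

b_kl = (0.5, -1.0 / 6.0, 1.0 / 24.0)  # KL coefficients from Corollary 3.3
sigma2 = 1.0                           # X ~ N(0, 1): m2 = 1, m3 = 0, m4 = 3
eta1_star, dev4 = deviation_4th_order(0.1, b_kl, sigma2, 0.0, 3.0 * sigma2 ** 2)
print(eta1_star, dev4, 0.1 * sigma2 / 2)  # exact KL deviation for a normal X: eta2 * sigma^2 / 2
```

Bisection is used instead of the cubic formula because it only needs a sign change, which $b_3>0$ guarantees for a wide enough bracket.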

(Gotoh et al., 2018) gives a similar expansion of $\mathcal{D}_{\eta_{2},\phi,\mathbb{P}_{0}}(f(\mathbf{x},\mathbf{r})|\mathbb{E}_{\mathbb{P}_{0}}[f(\mathbf{x},\mathbf{r})])$ in their Proposition 3.5. The main difference between our expansion in Eq. (7) and theirs lies in the calculation of $\eta_{1}^{*}$: in Eq. (7), $\eta_{1}^{*}$ is solved directly from the polynomial equation, while in (Gotoh et al., 2018), $\eta_{1}^{*}$ is approximated as a function of $\eta_{2}$.

In what follows, we take $K=2$, i.e., we keep the $2^{nd}$-order expansion of $\mathcal{D}_{\eta_{2},\phi,\mathbb{P}_{0}}(\mathbf{x}^{T}\mathbf{r}|\mathbf{x}^{T}\mathbf{\mu})$ and ignore higher-order terms, which gives

[TABLE]

The last equality comes as a result of

[TABLE]

and the minimum is achieved at $\eta_{2}=\sqrt{\frac{2\rho\phi^{(2)}(1)}{\mathbb{V}_{\mathbb{P}_{0}}[f(\mathbf{x},\mathbf{r})]}}$. Hence, when $\rho$ is small, the optimal Lagrange multiplier $\eta_{2}$ is also small and the expansion in (6) is accurate. Taking $\max_{\mathbf{x}\in\mathcal{X}}$ on both sides, we obtain the $2^{nd}$-order reformulation of problem (3) in Theorem 3.4.
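The minimization over $\eta_{2}$ is a one-dimensional AM-GM step: $\frac{\rho}{\eta_2}+\frac{\eta_2 V}{2\phi^{(2)}(1)}$ is minimized where the two terms are equal. The sketch below, with a hypothetical radius and variance and the KL value $\phi^{(2)}(1)=1$, compares the stated minimizer and minimum $\sqrt{2\rho V/\phi^{(2)}(1)}$ with a brute-force search over a log-spaced grid of $\eta_2$.

```python
import math

def penalty(eta2, rho, variance, phi2_at_1=1.0):
    # rho / eta2 + eta2 V / (2 phi''(1)): the second-order objective minimized over eta2
    return rho / eta2 + eta2 * variance / (2.0 * phi2_at_1)

def closed_form(rho, variance, phi2_at_1=1.0):
    # Minimizer and minimum value stated in the text
    eta2_star = math.sqrt(2.0 * rho * phi2_at_1 / variance)
    return eta2_star, math.sqrt(2.0 * rho * variance / phi2_at_1)

rho, variance = 0.01, 0.04  # hypothetical radius and variance of f(x, r) under P0
eta2_star, value = closed_form(rho, variance)
grid_min = min(penalty(10 ** (k / 1000.0), rho, variance) for k in range(-4000, 4001))
print(eta2_star, value, grid_min)
```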

Theorem 3.4.

Suppose that $\phi$ is convex, twice continuously differentiable, and that $\phi(1)=\phi^{(1)}(1)=0$ and $\phi^{(2)}(1)>0$ . The DRO problem in problem (3) is asymptotically equivalent to a mean-deviation problem:

[TABLE]

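Theorem 3.4 shows that the ambiguity radius $\rho$ controls the investor's risk preference: in the mean-deviation problem, $\rho$ sets the weight $\sqrt{2\rho/\phi^{(2)}(1)}$ on the standard deviation $\sqrt{\mathbb{V}_{\mathbb{P}_{0}}[f(\mathbf{x},\mathbf{r})]}$.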

3.2. Reformulation of the CCO problem (4)

Notice that the chance constraint in problem (4) has the same form as the definition of Value-at-Risk (VaR), a risk measure that focuses on the probability of losses. This motivates us to rewrite the chance constraint in problem (4) in terms of VaR. The VaR is defined as the minimal level $\gamma$ such that the probability that the portfolio loss $-\mathbf{x}^{T}\mathbf{r}$ exceeds $\gamma$ is at most $\epsilon$:

[TABLE]

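The chance constraint in problem (4), $Pr_{\sim\mathbb{P}_{0}}\{-\mathbf{x}^{T}\mathbf{r}\geq\delta\}\leq\epsilon$, states that $\delta$ belongs to the set $\{\gamma\in\mathbb{R}:Pr_{\sim\mathbb{P}_{0}}\{-\mathbf{x}^{T}\mathbf{r}\geq\gamma\}\leq\epsilon\}$, so $\mathrm{V}_{\epsilon}(\mathbf{x})\leq\delta$. Conversely, when the loss $-\mathbf{x}^{T}\mathbf{r}$ has a continuous distribution, the infimum defining $\mathrm{V}_{\epsilon}(\mathbf{x})$ is attained, and $\mathrm{V}_{\epsilon}(\mathbf{x})\leq\delta$ implies the chance constraint. That is, the chance constraint can be rewritten in terms of $\mathrm{V}_{\epsilon}(\mathbf{x})$: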

[TABLE]

Hence, given $\mathbb{E}_{\mathbb{P}_{0}}[\mathbf{x}^{T}\mathbf{r}]=\mathbf{x}^{T}\mathbf{\mu}$ , problem (4) can be reformulated as

[TABLE]
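The VaR reformulation can be illustrated on a finite sample, where $\mathrm{V}_{\epsilon}(\mathbf{x})$ is the $(\lfloor\epsilon N\rfloor+1)$-th largest sampled loss. The sketch below uses hypothetical helper names and made-up losses. It also shows why continuity matters: for an empirical distribution the infimum is not attained, so at $\delta=\mathrm{V}_{\epsilon}(\mathbf{x})$ the chance constraint fails even though $\mathrm{V}_{\epsilon}(\mathbf{x})\le\delta$, while for $\delta$ strictly above or below $\mathrm{V}_{\epsilon}(\mathbf{x})$ the two conditions agree.

```python
import math

def empirical_var(losses, eps):
    """inf{gamma : (1/N) #{i : loss_i >= gamma} <= eps} for a finite sample and 0 < eps < 1."""
    ordered = sorted(losses, reverse=True)
    k = math.floor(eps * len(ordered) + 1e-12)  # at most k losses may lie at or above gamma
    return ordered[k]                           # the (k+1)-th largest loss

def chance_constraint_holds(losses, delta, eps):
    """Empirical version of Pr{-x^T r >= delta} <= eps."""
    return sum(1 for loss in losses if loss >= delta) / len(losses) <= eps

losses = [float(i) for i in range(1, 101)]  # hypothetical sampled portfolio losses -x^T r
var_5 = empirical_var(losses, 0.05)
for delta in (94.0, var_5, 96.0):
    print(delta, chance_constraint_holds(losses, delta, 0.05), var_5 <= delta)
```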

If $\mathbb{P}_{0}$ is normal, then the VaR can be expressed as

[TABLE]

where $\kappa(\epsilon)=-\Phi^{-1}(\epsilon)$ and $\Phi^{-1}(\cdot)$ is the inverse of the cumulative distribution function of the standard normal distribution. If $\mathbb{P}_{0}$ belongs to the general elliptical family, (Lesniewski et al., 2016) gives an asymptotic expansion of $\mathrm{V}_{\epsilon}(\mathbf{x})$, which takes the form $\kappa(\epsilon)\sqrt{\mathbf{x}^{T}\Sigma\mathbf{x}}-\mathbf{x}^{T}\mathbf{\mu}$ asymptotically as $\epsilon\rightarrow 0$. For example, if $\mathbb{P}_{0}$ is a Student $t$-distribution with $\nu$ degrees of freedom, then $\kappa(\epsilon)=D\epsilon^{-\frac{1}{\nu}}$, where $D=\left(\frac{c_{n}\pi^{\frac{n-1}{2}}\Gamma(\frac{\nu+1}{2})}{\nu\Gamma(\frac{\nu+n}{2})}\right)^{\frac{1}{\nu}}$, $c_{n}=\frac{\Gamma(\frac{\nu+n}{2})}{\Gamma(\frac{\nu}{2})}\nu^{\frac{\nu}{2}}\pi^{-\frac{n}{2}}$, and $\Gamma(\cdot)$ is the gamma function. For distributions other than elliptical ones, $\sqrt{\frac{1-\epsilon}{\epsilon}}\sqrt{\mathbf{x}^{T}\Sigma\mathbf{x}}-\mathbf{x}^{T}\mathbf{\mu}$ is a valid, conservative approximation of $\mathrm{V}_{\epsilon}(\mathbf{x})$: it is the worst-case VaR over all distributions with mean $\mathbf{\mu}$ and covariance $\Sigma$ (Ghaoui et al., 2003; Bonami and Lejeune, 2009). Altogether, these give the following approximation of problem (4):

[TABLE]
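The three choices of $\kappa(\epsilon)$ above can be compared directly. The sketch below (hypothetical helper names, made-up portfolio data) implements the normal $\kappa(\epsilon)=-\Phi^{-1}(\epsilon)$ via `statistics.NormalDist`, the Student $t$ constant $D$ exactly as written above, and the distribution-free $\sqrt{(1-\epsilon)/\epsilon}$. Two consistency checks are built in. First, $D$ does not depend on $n$ (the $c_n$ and $\Gamma(\frac{\nu+n}{2})$ factors cancel), as expected because $\mathbf{x}^{T}\mathbf{r}$ is univariate Student $t$ with the same $\nu$. Second, for $\nu=1$ (Cauchy) the formula gives $\kappa(\epsilon)=1/(\pi\epsilon)$, the leading term of the exact Cauchy quantile $\cot(\pi\epsilon)$.

```python
import math
from statistics import NormalDist

def kappa_normal(eps):
    # kappa(eps) = -Phi^{-1}(eps)
    return -NormalDist().inv_cdf(eps)

def kappa_student_t(eps, nu, n):
    """Asymptotic (eps -> 0) kappa(eps) = D eps^{-1/nu}, with D and c_n as given in the text."""
    c_n = math.gamma((nu + n) / 2) / math.gamma(nu / 2) * nu ** (nu / 2) * math.pi ** (-n / 2)
    D = (c_n * math.pi ** ((n - 1) / 2) * math.gamma((nu + 1) / 2)
         / (nu * math.gamma((nu + n) / 2))) ** (1 / nu)
    return D * eps ** (-1 / nu)

def kappa_distribution_free(eps):
    return math.sqrt((1 - eps) / eps)

def value_at_risk(kappa, x, mu, sigma):
    """kappa * sqrt(x^T Sigma x) - x^T mu, with plain Python lists."""
    n = len(x)
    quad = sum(x[i] * sigma[i][j] * x[j] for i in range(n) for j in range(n))
    return kappa * math.sqrt(quad) - sum(xi * mi for xi, mi in zip(x, mu))

x, mu = [0.5, 0.5], [0.01, 0.02]    # hypothetical weights and mean returns
Sigma = [[0.04, 0.0], [0.0, 0.09]]  # hypothetical covariance matrix
for eps in (0.02, 0.05):
    for name, k in (("normal", kappa_normal(eps)),
                    ("t, nu=4", kappa_student_t(eps, nu=4, n=2)),
                    ("distribution-free", kappa_distribution_free(eps))):
        print(eps, name, round(k, 4), round(value_at_risk(k, x, mu, Sigma), 4))
```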

By the following lemma, problem (9) is a convex optimization problem when $\kappa(\epsilon)>0$. With $f(\mathbf{x},\mathbf{r})=\mathbf{x}^{T}\mathbf{r}$ and a feasible set $\mathcal{X}$ described by linear (or second-order cone) constraints, problem (9) is a second-order cone program (SOCP) and can be solved efficiently.

Lemma 3.5.

Suppose $a>0$ . Then the function $a\sqrt{\mathbf{x}^{T}\Sigma\mathbf{x}}-\mathbf{x}^{T}\mathbf{\mu}$ is a convex function of $\mathbf{x}$ .
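A short proof sketch of Lemma 3.5 (our own, using only the triangle inequality) assumes a factorization $\Sigma=LL^{T}$, which exists for any covariance matrix (for instance, a Cholesky factor when $\Sigma$ is positive definite). The same factorization exhibits the constraint of problem (9) as a second-order cone constraint.

```latex
\begin{aligned}
g(\mathbf{x}) &:= a\sqrt{\mathbf{x}^{T}\Sigma\mathbf{x}}-\mathbf{x}^{T}\mathbf{\mu}
  = a\,\|L^{T}\mathbf{x}\|_{2}-\mathbf{\mu}^{T}\mathbf{x},\\
g(\theta\mathbf{x}+(1-\theta)\mathbf{y})
  &\leq a\bigl(\theta\|L^{T}\mathbf{x}\|_{2}+(1-\theta)\|L^{T}\mathbf{y}\|_{2}\bigr)
     -\mathbf{\mu}^{T}\bigl(\theta\mathbf{x}+(1-\theta)\mathbf{y}\bigr)
   = \theta g(\mathbf{x})+(1-\theta)g(\mathbf{y}),
   \qquad \theta\in[0,1],\\
\kappa(\epsilon)\sqrt{\mathbf{x}^{T}\Sigma\mathbf{x}}-\mathbf{x}^{T}\mathbf{\mu}\leq\delta
  &\iff \bigl\|\kappa(\epsilon)L^{T}\mathbf{x}\bigr\|_{2}\leq\delta+\mathbf{\mu}^{T}\mathbf{x}
  \qquad (\kappa(\epsilon)>0).
\end{aligned}
```

The inequality step uses the triangle inequality and homogeneity of the Euclidean norm together with $a>0$ (the argument also goes through for $a=0$). The last line is a standard second-order cone constraint, which is why problem (9) fits the SOCP framework.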

4. Explicit formulations of portfolio selection

In this section, we derive explicit formulations for the portfolio selection problem with $f(\mathbf{x},\mathbf{r})=\mathbf{x}^{T}\mathbf{r}$, so that it only remains to specify the set $\mathcal{X}$. We begin with the simplest unbounded set, $\mathcal{X}:=\left\{\mathbf{x}\in\mathbb{R}^{n}\mid\mathbf{x}^{T}\mathbf{e}=1\right\}$, where $\mathbf{e}$ is the all-ones vector, so that the portfolio weights sum to one. We denote the optimal solution and optimal value of problem (8) by $\mathbf{x}^{*}$ and $v^{*}$, and those of problem (9) by $\tilde{\mathbf{x}}^{*}$ and $\tilde{v}^{*}$, respectively. Throughout the rest of the paper, $\mathbf{\mu}$ and $\Sigma$ denote the mean $\mathbb{E}_{\mathbb{P}_{0}}[\mathbf{r}]$ and the covariance matrix of $\mathbf{r}$ under $\mathbb{P}_{0}$. Consequently, $\mathbb{E}_{\mathbb{P}_{0}}[\mathbf{x}^{T}\mathbf{r}]=\mathbf{x}^{T}\mathbf{\mu}$, $\mathbb{V}_{\mathbb{P}_{0}}[\mathbf{x}^{T}\mathbf{r}]=\mathbf{x}^{T}\Sigma\mathbf{x}$, $v^{*}=\mathbf{x}^{*T}\mathbf{\mu}-\sqrt{\frac{2\rho\mathbf{x}^{*T}\Sigma\mathbf{x}^{*}}{\phi^{(2)}(1)}}$, and $\tilde{v}^{*}=\tilde{\mathbf{x}}^{*T}\mathbf{\mu}$.

Recall that for a convex optimization problem any local optimum is also a global optimum, and the KKT conditions are sufficient for optimality. This motivates us to characterize the optimal solution $\mathbf{x}^{*}$ of problem (8) and the optimal solution $\tilde{\mathbf{x}}^{*}$ of problem (9) through their KKT conditions. The results for $(\mathbf{x}^{*},v^{*})$ and $(\tilde{\mathbf{x}}^{*},\tilde{v}^{*})$ are summarized in Theorem 4.1 and Theorem 4.2, respectively.

Theorem 4.1.

Suppose $\phi^{(2)}(1)>0$. Define $A:=\mathbf{e}^{T}\Sigma^{-1}\mathbf{e}$, $B:=\mu^{T}\Sigma^{-1}\mathbf{e}$, and $C:=\mu^{T}\Sigma^{-1}\mu$. Then problem (8) with $f(\mathbf{x},\mathbf{r})=\mathbf{x}^{T}\mathbf{r}$ and the feasibility set $\mathcal{X}=\left\{\mathbf{x}\in\mathbb{R}^{n}\mid\mathbf{x}^{T}\mathbf{e}=1\right\}$ has an optimal solution when $\rho>\phi^{(2)}(1)(C-{B^{2}}/{A})/2$, and its optimal solution $\mathbf{x}^{*}$ and optimal value $v^{*}$ are:

[TABLE]

where $\lambda^{*}=\frac{B}{A}-\frac{\sqrt{B^{2}-A\left(C-2\rho/{\phi^{(2)}(1)}\right)}}{A}$ .
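The closed form can be sanity-checked numerically. The sketch below is our own reading of the KKT conditions rather than a transcription of the theorem's table: assuming $\Sigma$ is positive definite, stationarity gives $\mathbf{x}^{*}=\Sigma^{-1}(\mathbf{\mu}-\lambda^{*}\mathbf{e})/(B-\lambda^{*}A)$, and because the objective of problem (8) is positively homogeneous and $\mathbf{e}^{T}\mathbf{x}^{*}=1$, Euler's identity suggests $v^{*}=\lambda^{*}$. The three uncorrelated assets, the KL divergence ($\phi^{(2)}(1)=1$), and $\rho=0.05$ are hypothetical choices that satisfy the existence condition.

```python
import math

def solve(M, b):
    """Solve M y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [[float(v) for v in row] + [float(bi)] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * y[k] for k in range(r + 1, n))
        y[r] = (aug[r][n] - s) / aug[r][r]
    return y

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def dro_objective(x, mu, Sigma, rho, phi2=1.0):
    """Objective of problem (8): x^T mu - sqrt(2 rho x^T Sigma x / phi''(1))."""
    return dot(x, mu) - math.sqrt(2.0 * rho * dot(x, matvec(Sigma, x)) / phi2)

def dro_budget_solution(mu, Sigma, rho, phi2=1.0):
    """KKT solution of problem (8) on {x : sum(x) = 1}; assumes Sigma positive definite."""
    e = [1.0] * len(mu)
    z, y = solve(Sigma, e), solve(Sigma, mu)        # Sigma^{-1} e and Sigma^{-1} mu
    A, B, C = dot(e, z), dot(mu, z), dot(mu, y)
    c2 = 2.0 * rho / phi2
    if c2 <= C - B * B / A:                         # existence condition of Theorem 4.1
        raise ValueError("rho too small: no optimal solution")
    root = math.sqrt(B * B - A * (C - c2))
    lam = B / A - root / A                          # lambda* of Theorem 4.1
    x = [(yi - lam * zi) / root for yi, zi in zip(y, z)]   # B - lam * A equals root
    return x, lam

# Hypothetical data: three uncorrelated assets, KL divergence (phi''(1) = 1).
mu = [0.05, 0.08, 0.12]
Sigma = [[0.04, 0.0, 0.0], [0.0, 0.09, 0.0], [0.0, 0.0, 0.16]]
rho = 0.05
x_star, lam_star = dro_budget_solution(mu, Sigma, rho)
v_star = dro_objective(x_star, mu, Sigma, rho)
print(sum(x_star), v_star, lam_star)   # weights sum to 1; v_star agrees with lam_star

# Moving along directions that keep sum(x) = 1 should never beat v_star.
for d in ([1, -1, 0], [0, 1, -1], [1, 0, -1]):
    for t in (-0.1, 0.1):
        x = [xi + t * di for xi, di in zip(x_star, d)]
        assert dro_objective(x, mu, Sigma, rho) <= v_star
```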

Theorem 4.2.

Suppose $\kappa(\epsilon)>0$, and let $A$, $B$, and $C$ be defined as in Theorem 4.1. Problem (9) with $f(\mathbf{x},\mathbf{r})=\mathbf{x}^{T}\mathbf{r}$ and the feasibility set $\mathcal{X}=\left\{\mathbf{x}\in\mathbb{R}^{n}\mid\mathbf{x}^{T}\mathbf{e}=1\right\}$ has an optimal solution when $(\epsilon,\delta)$ satisfies $C-B^{2}/{A}<(\kappa(\epsilon))^{2}<\delta^{2}A+2\delta B+C$ and $B+\delta A>0$, and its optimal solution $\tilde{\mathbf{x}}^{*}$ and optimal value $\tilde{v}^{*}$ are:

[TABLE]

where $\tilde{\lambda}=\frac{\sqrt{AC-B^{2}}}{A\kappa(\epsilon)^{2}-AC+B^{2}}(\frac{\kappa(\epsilon)(B+A\delta)}{\sqrt{A\delta^{2}+2B\delta+C-\kappa(\epsilon)^{2}}}+\sqrt{AC-B^{2}})$ , and $\tilde{\theta}=\frac{(C+\delta B)(\tilde{\lambda}+1)-\tilde{\lambda}\kappa(\epsilon)^{2}}{B+\delta A}$ .

Furthermore, $\tilde{v}^{*}\geq v^{*}$, i.e., problem (9) always outperforms problem (8).

In Theorem 4.2, we first identify sufficient conditions on $(\epsilon,\delta)$ under which problem (9) has an optimal solution. The comparison between $\tilde{v}^{*}$ and $v^{*}$ then shows that the CCO reformulation performs uniformly better than the DRO reformulation. This outperformance of problem (9) over problem (8) is less obvious than it may appear. At first glance it seems immediate, because for any fixed allocation $\mathbf{x}$ the objective of problem (9), $\mathbf{x}^{T}\mathbf{\mu}$, is never smaller than that of problem (8), $\mathbf{x}^{T}\mathbf{\mu}-\sqrt{\frac{2\rho\mathbf{x}^{T}\Sigma\mathbf{x}}{\phi^{(2)}(1)}}$. However, the two problems have different feasible sets, since problem (9) carries the additional chance constraint. We are therefore not comparing the two objectives at the same allocation $\mathbf{x}$, but at their respective optimal allocations, namely $\mathbf{x}^{*T}\mathbf{\mu}-\sqrt{\frac{2\rho\mathbf{x}^{*T}\Sigma\mathbf{x}^{*}}{\phi^{(2)}(1)}}$ versus $\tilde{\mathbf{x}}^{*T}\mathbf{\mu}$.
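The inequality $\tilde{v}^{*}\geq v^{*}$ on the budget set can be checked numerically. The sketch below is our own construction, assuming $\Sigma$ is positive definite. It solves problem (9) through the mean-variance frontier identity $\sigma(m)^{2}=(Am^{2}-2Bm+C)/(AC-B^{2})$, taking the largest mean $m$ whose minimum-variance portfolio satisfies $\kappa(\epsilon)\sigma(m)-m\leq\delta$, and evaluates $v^{*}$ at the Theorem 4.1 solution. One way to see why the inequality holds for all admissible parameters (a short argument, not taken from the paper) is that $\tilde{v}^{*}$ exceeds the global minimum-variance mean $B/A$, while $v^{*}$ lies below it. The asset data, $\epsilon=5\%$ with a normal $\kappa(\epsilon)$, $\delta=0.25$, and $\rho=0.05$ are hypothetical.

```python
import math
from statistics import NormalDist

def solve(M, b):
    """Solve M y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [[float(v) for v in row] + [float(bi)] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * y[k] for k in range(r + 1, n))
        y[r] = (aug[r][n] - s) / aug[r][r]
    return y

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def moments(mu, Sigma):
    """Return Sigma^{-1} e, Sigma^{-1} mu and the scalars A, B, C of Theorem 4.1."""
    e = [1.0] * len(mu)
    z, y = solve(Sigma, e), solve(Sigma, mu)
    return z, y, dot(e, z), dot(mu, z), dot(mu, y)

def dro_value(mu, Sigma, rho, phi2=1.0):
    """v* of problem (8) on the budget set, evaluated at the Theorem 4.1 solution."""
    z, y, A, B, C = moments(mu, Sigma)
    root = math.sqrt(B * B - A * (C - 2.0 * rho / phi2))
    lam = B / A - root / A
    x = [(yi - lam * zi) / root for yi, zi in zip(y, z)]
    return dot(x, mu) - math.sqrt(2.0 * rho * dot(x, matvec(Sigma, x)) / phi2)

def cco_solution(mu, Sigma, kappa, delta):
    """Problem (9) on the budget set: largest frontier mean m with kappa*sigma(m) - m <= delta."""
    z, y, A, B, C = moments(mu, Sigma)
    D = A * C - B * B
    k2 = kappa * kappa
    if not (C - B * B / A < k2 < A * delta ** 2 + 2 * B * delta + C and B + delta * A > 0):
        raise ValueError("(eps, delta) violates the conditions of Theorem 4.2")
    # kappa^2 (A m^2 - 2 B m + C) / D = (delta + m)^2 is a quadratic in m; take its larger root.
    disc = D * (A * delta ** 2 + 2 * B * delta + C - k2)
    m = (k2 * B + D * delta + kappa * math.sqrt(disc)) / (k2 * A - D)
    # Minimum-variance portfolio with mean m: it sums to 1 and has x^T mu = m.
    x = [((C - B * m) * zi + (A * m - B) * yi) / D for zi, yi in zip(z, y)]
    return x, m

mu = [0.05, 0.08, 0.12]                          # hypothetical, uncorrelated assets
Sigma = [[0.04, 0.0, 0.0], [0.0, 0.09, 0.0], [0.0, 0.0, 0.16]]
kappa = -NormalDist().inv_cdf(0.05)              # normal P0, eps = 5%
delta = 0.25
x_tilde, v_tilde = cco_solution(mu, Sigma, kappa, delta)
v_star = dro_value(mu, Sigma, rho=0.05)
print(v_tilde, v_star)                           # v_tilde is the larger of the two
```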

For more complex sets $\mathcal{X}$, such as sets with bounded weights, we resort to numerical experiments to interpret the ambiguity radius $\rho$ through the chance-constraint parameters.

5. Experiments

Sections 5.1 and 5.2 use synthetic data to test the accuracy of the reformulations of the DRO problem (3) and to examine how the tail heaviness of the nominal distribution $\mathbb{P}_{0}$ affects the interpretation of $\rho$. Section 5.3 develops a more detailed understanding of the ambiguity radius $\rho$ from the empirical daily returns of four asset classes, and Section 5.4 uses intraday 5-minute stock returns to test the value of robust protection in a real portfolio selection problem.

5.1. Reformulation accuracy of problem (3)

In this section, we numerically test the accuracy of the $2^{nd}$ order and $4^{th}$ order reformulations with respect to the original robust problem (3). We use the KL divergence, under which problem (3) can be solved exactly, and take the exact optimal value as the benchmark for both reformulations.

Under the KL divergence, the ambiguity set is centered at a six-dimensional multivariate exponential distribution whose components have mean 0.2, standard deviation 0.2, skewness 2, and excess kurtosis 6; the components are i.i.d., so that the effect of the tail is isolated. Table 2 records the relative errors of the two reformulations (the $3^{rd}$ and $4^{th}$ columns) with respect to the exact optimal value (the $2^{nd}$ column). The higher-order improvement is particularly notable when the data exhibit a heavier tail, and it grows with the size of the ambiguity set (i.e., larger $\rho$): in this example, the error is reduced by a factor of about 10. For the Cressie-Read divergence, whose results we omit from the table due to the page limit, the center distribution $\mathbb{P}_{0}$ is a multivariate $t$-distribution and the improvement is even larger: when $\rho=0.78$ and the optimal value is $-0.2787$, the relative error is $1.53\%$ for the $4^{th}$ order reformulation versus $56.54\%$ for the $2^{nd}$ order reformulation, a roughly 37-fold improvement. When $\rho$ is small, however, the $2^{nd}$ order reformulation is already accurate enough to solve problem (3).
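To illustrate why the $2^{nd}$ order reformulation loses accuracy for skewed data as $\rho$ grows, the sketch below takes a single fixed portfolio return $X$ and evaluates the exact KL worst-case mean $\inf\{\mathbb{E}_{Q}[X]:\mathrm{KL}(Q\,\|\,\mathbb{P}_{0})\leq\rho\}$ through its standard one-dimensional dual $\sup_{\alpha>0}\{-\alpha\log\mathbb{E}_{\mathbb{P}_{0}}[e^{-X/\alpha}]-\alpha\rho\}$, then compares it with the $2^{nd}$ order expression $\mathbb{E}[X]-\sqrt{2\rho\mathbb{V}[X]/\phi^{(2)}(1)}$, where $\phi^{(2)}(1)=1$ for KL. The marginals (an exponential and a normal, both with mean and standard deviation 0.2) are illustrative; this is not the paper's $4^{th}$ order reformulation and does not reproduce Table 2.

```python
import math

def kl_worst_case_mean(neg_cgf, rho, lo=1e-6, hi=1e6, iters=300):
    """Exact inf{E_Q[X] : KL(Q || P) <= rho} through the one-dimensional dual
    sup_{alpha > 0} -alpha * log E_P[exp(-X / alpha)] - alpha * rho.
    neg_cgf(s) must return log E_P[exp(-s * X)] for s > 0."""
    def dual(alpha):
        return -alpha * neg_cgf(1.0 / alpha) - alpha * rho

    # The dual is concave in alpha, hence unimodal in log(alpha): ternary search.
    a, b = math.log(lo), math.log(hi)
    for _ in range(iters):
        m1, m2 = a + (b - a) / 3, b - (b - a) / 3
        if dual(math.exp(m1)) < dual(math.exp(m2)):
            a = m1
        else:
            b = m2
    return dual(math.exp(0.5 * (a + b)))

def second_order(mean, var, rho, phi2=1.0):
    """2nd order reformulation: E[X] - sqrt(2 * rho * Var[X] / phi''(1))."""
    return mean - math.sqrt(2.0 * rho * var / phi2)

# Illustrative marginals with mean 0.2 and standard deviation 0.2.
m, sd, rate = 0.2, 0.2, 5.0

def exp_cgf(s):
    """log E[exp(-s X)] for X ~ Exponential(rate)."""
    return -math.log1p(s / rate)

def norm_cgf(s):
    """log E[exp(-s X)] for X ~ N(m, sd^2)."""
    return -s * m + 0.5 * (s * sd) ** 2

for rho in (0.01, 0.1):
    approx = second_order(m, sd ** 2, rho)
    exact_exp = kl_worst_case_mean(exp_cgf, rho)
    exact_norm = kl_worst_case_mean(norm_cgf, rho)
    print(f"rho={rho}: normal gap {exact_norm - approx:.2e}, "
          f"exponential relative error {(exact_exp - approx) / exact_exp:.2%}")
```

For a normal $X$ the dual can be solved in closed form and coincides with the $2^{nd}$ order expression, so the gap printed for the normal case is only rounding noise; for the skewed exponential the relative error grows with $\rho$.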

5.2. Interpretation of $\rho$ under distributions with different tail heavinesses

This experiment shows that the tail heaviness of the nominal distribution $\mathbb{P}_{0}$ does affect the interpretation of the ambiguity radius $\rho$. We consider three distributions for 5 assets: the multivariate normal, lognormal, and Student $t_{3}$ distributions. The allocation weights are bounded below by -1, and the ambiguity radius is fixed at $\rho=0.27$. Figure 1 plots the equivalent pairs $(\epsilon,\delta)$, i.e., the chance-constraint parameters whose impact on the optimal value matches that of $\rho$. It shows, first, that a single ambiguity radius $\rho$ corresponds to a whole set of pairs $(\epsilon,\delta)$. Second, tail heaviness changes this correspondence: distributions with heavier tails result in a larger loss threshold $\delta$ for a given loss probability $\epsilon$.

5.3. Empirical studies with daily asset returns

To see the financial interpretation of the ambiguity radius $\rho$ more clearly, we conduct experiments on empirical data. We extract the past 40 years of daily simple returns for four major asset classes: equity indexes (DAX, FTSE, HSI, NASDAQ, NIKKEI225, SP500), US Treasuries (2-year, 10-year, 30-year), currencies (AUD, CHF, EUR, GBP, JPY), and commodities (crude oil, silver, gold). For the DRO problem, we use the Cressie-Read divergence instead of the KL divergence, since all series exhibit heavy tails. For the CCO problem, we set the negative daily return threshold $-\delta$ to the $3\%$ empirical quantile of the daily simple return series of each asset class, so that the threshold can differ across assets. We set the chance level $\epsilon$ to $2\%$ and $5\%$, which (after rounding) mimic the frequencies of quarterly (4 out of 252 trading days) and monthly (12 out of 252) events, so that investors can relate $\epsilon$ to how rare the event is. The portfolio weights are bounded below by -1. We fit the data with both multivariate $t$- and normal distributions as the center $\mathbb{P}_{0}$ of the ambiguity set $\mathcal{U}$, and we test both the $4^{th}$ order and $2^{nd}$ order reformulations of the DRO problem.
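The threshold and chance-level choices above amount to simple arithmetic. The sketch below illustrates them on a hypothetical, evenly spaced return series (not the paper's data), using Python's `statistics.quantiles` to take the $3\%$ empirical quantile as $-\delta$ and rounding the event frequencies 4/252 and 12/252 to the chance levels $2\%$ and $5\%$. The paper does not state its quantile-interpolation rule; the `"inclusive"` method here is an assumption.

```python
from statistics import quantiles

def loss_threshold(returns, level=0.03):
    """delta such that -delta is the empirical `level` quantile of the returns.
    Assumes `level` is a whole percentage, e.g. 0.03 for 3%."""
    cuts = quantiles(returns, n=100, method="inclusive")   # 99 percentile cut points
    return -cuts[round(level * 100) - 1]

# Hypothetical daily simple returns from -5% to +5% in steps of 0.1% (101 values).
returns = [i / 1000 - 0.05 for i in range(101)]
delta = loss_threshold(returns)        # the 3rd percentile is about -0.047

# Chance levels as rounded event frequencies over 252 trading days per year.
eps_quarterly = round(4 / 252, 2)      # 4 events a year
eps_monthly = round(12 / 252, 2)       # 12 events a year
print(delta, eps_quarterly, eps_monthly)
```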

Tables 3(a)-3(d) report, for the four asset classes respectively, the equivalent ambiguity radius $\rho$ of the DRO problem together with the corresponding annualized optimal portfolio return, for a given pair of CCO parameters ($\epsilon$, $\delta$). Take Table 3(a) as an example. It has 2 rows and 4 columns, i.e., 8 entries in total. Each row corresponds to a choice of $\epsilon$, and each column to a choice of the DRO reformulation and of the center distribution $\mathbb{P}_{0}$. In each entry, the upper number is the equivalent ambiguity radius $\rho$, and the lower number in parentheses is the corresponding annualized optimal portfolio return. With the other parameters fixed, we compare the optimal portfolio returns under the multivariate Student $t$-distribution and the normal distribution, and mark the entry with the larger return in bold.

Table 3 shows that relating the size parameter $\rho$ of the ambiguity set in the DRO problem to the CCO chance parameters makes $\rho$ tangible; without this relation, even its order of magnitude is hard to guess. In our tests, $\rho$ ranges from $10^{-14}$ to $10^{-4}$, depending on the asset class and the investor's tolerance level. Moreover, the heavy-tailed nature of financial data calls for divergence measures that admit heavy-tailed distributions (e.g., the Cressie-Read divergence) if one takes the robust approach to portfolio optimization. Ambiguity sets constructed with the KL divergence, by contrast, require the objective function to be exponentially bounded, which excludes important heavy-tailed distributions used ubiquitously for financial asset returns, such as the Student $t$-distribution. Of the 16 comparisons in Table 3, 12 (marked in bold) favor fitting the data with a multivariate $t$-distribution as $\mathbb{P}_{0}$.

5.4. High frequency empirical setting

We collect the intraday 5-minute returns of 15 stocks (ticker codes: 00001, 00005, 00016, 00027, 00388, 00688, 00700, 00883, 00939, 00941, 01299, 01398, 01928, 02318, 03988), selected from the 50 Hang Seng Index constituent stocks based on market capitalization and daily turnover. The data span Dec 1, 2014 to Dec 1, 2017 and consist of roughly 39,390 observations; the first and last half hours of each trading day are excluded.

The first experiment illustrates how the equivalent ambiguity radius $\rho$ evolves as more empirical data become available. As in the previous experiment, we use the Cressie-Read divergence, and we set the loss probability $\epsilon=3\%$ and $\delta=0.28\%$ (so that $-\delta$ is the $3\%$ empirical quantile of the return series over 100 trading days). The asset allocation weights are bounded below by $-1$. We apply the $4^{th}$ order reformulation to solve the DRO problem and test both the multivariate $t$- and normal distributions as the nominal distribution $\mathbb{P}_{0}$. We first compute the equivalent ambiguity radius $\rho$ from the 5-minute return series of the first 6 consecutive trading days, and then add one trading day at a time, recomputing the equivalent $\rho$ at each step. Figure 2 plots the resulting series of equivalent $\rho$ against the number of trading days used in each computation.

Figure 2 shows, first, that to achieve the same level of tail probability protection, the equivalent ambiguity radius $\rho$ decreases and converges as more data become available. This is expected: more data carry more information and therefore leave less uncertainty about the underlying distribution. Second, consistent with Figure 1, even with the same empirical data set the tail-heaviness assumption on the center distribution affects the interpretation of the ambiguity radius $\rho$: robust portfolio optimization centered at a heavy-tailed distribution requires a larger ambiguity radius to achieve the same tail probability level.

We next fit the returns of each stock to a univariate Student $t$-distribution to verify that high-frequency financial returns are indeed heavy-tailed. The fitted degrees-of-freedom parameter $\nu$, which quantifies tail heaviness (a smaller $\nu$ means heavier tails), ranges from 2.36 to 3.81 across the 15 stocks. Figure 3 shows the fits for 4 stocks, with the fitted $\nu$ in each panel title. These results suggest that a Student $t$ nominal distribution for the returns is a reasonable assumption.

The last experiment focuses on the value of robust protection in portfolio optimization. In practice, distributionally robust portfolio optimization is needed to protect investors from uncertainty arising both from limited historical data and from future distributional changes, and a trader must rebalance the portfolio frequently to accommodate fluctuations in the distribution. As we demonstrate, robust protection improves portfolio performance, especially compared with portfolios selected either under the nominal distribution (namely, problem (2)) or under the classical Mean Variance framework. The Mean Variance model we use is:

[TABLE]

We split the 3-year data set into two consecutive parts. On the first two years of data, we fit a 15-dimensional Student $t$-distribution and obtain the equivalent ambiguity radius $\rho$ = 2.4e-4 and optimal return 0.35e-4 for the chance-constraint parameters $(\epsilon,\delta)$ = $(3\%,39\text{e-4})$. We then use the last year as a test set and backtest three asset allocation strategies, solved by the DRO problem, the nominal optimization problem, and the Mean Variance problem, respectively. For the DRO problem we fix $\rho$ = 2.4e-4, and for the Mean Variance problem we fix $r_{target}$ = 0.35e-4. Under each optimization framework, the allocation is not held constant over the testing period: we rebalance the portfolio every 5 minutes, hour, half day, or day. At each rebalancing, we solve for the optimal allocation using the preceding 4 months of trading data and apply it to the next 5 minutes, hour, half day, or day. Table 4 summarizes the statistics of the return series for the different strategies and rebalancing frequencies.
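The rolling rebalancing scheme described above is a walk-forward backtest. The sketch below assumes per-period return vectors and counts the look-back window and holding period in periods (for example, 5-minute bars), whereas the paper specifies the window in calendar months. The `optimizer` argument is a hypothetical placeholder that would wrap the DRO, nominal, or Mean Variance solver, and the equal-weight stand-in is for illustration only.

```python
from typing import Callable, List, Sequence

Vector = List[float]

def walk_forward(returns: Sequence[Vector],
                 optimizer: Callable[[Sequence[Vector]], Vector],
                 lookback: int, hold: int) -> List[float]:
    """Re-solve the allocation every `hold` periods on the preceding `lookback`
    periods, then apply it out of sample to the next `hold` periods."""
    realized = []
    t = lookback
    while t < len(returns):
        weights = optimizer(returns[t - lookback:t])     # fit on the past window only
        for r in returns[t:t + hold]:                    # apply to the next block
            realized.append(sum(w * ri for w, ri in zip(weights, r)))
        t += hold
    return realized

def equal_weight(window: Sequence[Vector]) -> Vector:
    """Hypothetical placeholder optimizer: ignores the data and splits evenly."""
    n = len(window[0])
    return [1.0 / n] * n

# Usage: 10 periods of 2-asset returns, 4-period look-back, rebalance every 3 periods.
rets = [[0.01 * k, 0.03 * k] for k in range(10)]
pnl = walk_forward(rets, equal_weight, lookback=4, hold=3)
print(len(pnl), pnl)
```

Passing the optimizer as a callable keeps the backtest loop identical across the three frameworks, so differences in the realized return series come only from the allocation rule.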

Table 4 shows that the dynamic allocation strategy under the robust framework always outperforms both the strategy without robust protection and the classical Mean Variance strategy, by a factor of up to 7 depending on the rebalancing frequency. The DRO strategy also keeps a medium level of volatility, being neither too aggressive nor so conservative that it earns low returns. Moreover, the DRO strategy has the highest skewness, which highlights its tilt toward gains rather than losses. Last but not least, although the outperformance of the DRO strategy is consistent across rebalancing frequencies, an investor benefits from more frequent rebalancing under every portfolio selection framework, with returns far more than doubled.

6. Conclusions

We studied the ambiguity radius of DRO problems whose distributional ambiguity set is defined by a $\phi$-divergence. We showed that, for general $\phi$-divergences, a DRO problem is asymptotically equivalent to a mean-deviation problem in which the risk preference parameter is controlled by the ambiguity radius. Using a portfolio selection example, we demonstrated that, when the investment strategy is bounded, the ambiguity radius can be cast as a chance constraint in a deterministic optimization with the same objective. Over the set of unbounded investment strategies, by contrast, the chance-constrained deterministic optimization consistently performs better than the DRO problem. Through extensive experiments with both synthetic and empirical data, we concluded that, to achieve the same level of tail probability protection, a DRO problem centered at a heavy-tailed distribution requires a larger ambiguity set.

Acknowledgments

Qi WU acknowledges the GRF support from the Hong Kong Research Grants Council under 14211316 and 14206117.

References

Soroosh Shafieezadeh Abadeh, Peyman Mohajerin Esfahani, and Daniel Kuhn. 2015. Distributionally robust logistic regression. In Advances in Neural Information Processing Systems. 1576–1584.
Aharon Ben-Tal, Dick Den Hertog, Anja De Waegenaere, Bertrand Melenberg, and Gijs Rennen. 2013. Robust solutions of optimization problems affected by uncertain probabilities. Management Science 59, 2 (2013), 341–357.
Jose Blanchet, Lin Chen, and Xun Yu Zhou. 2018. Distributionally robust mean-variance portfolio selection with Wasserstein distances. arXiv preprint arXiv:1802.04885 (2018).
Pierre Bonami and Miguel A. Lejeune. 2009. An exact solution approach for portfolio optimization problems under stochastic and integer constraints. Operations Research 57, 3 (2009), 650–670.
Li Chen, Simai He, and Shuzhong Zhang. 2011. Tight bounds for some risk measures, with applications to robust portfolio selection. Operations Research 59, 4 (2011), 847–865.
Ruidi Chen and Ioannis Ch. Paschalidis. 2018. A robust learning approach for regression models based on distributionally robust optimization. The Journal of Machine Learning Research 19, 1 (2018), 517–564.
Erick Delage and Yinyu Ye. 2010. Distributionally robust optimization under moment uncertainty with application to data-driven problems. Operations Research 58, 3 (2010), 595–612.
