Incorporating statistical model error into the calculation of acceptability prices of contingent claims

Martin Glanzer; Georg Ch. Pflug; Alois Pichler

arXiv:1703.05709·q-fin.PR·January 31, 2019·Math. Program.


TL;DR

This paper develops a method to incorporate statistical model error into acceptability pricing of contingent claims by using distributionally robust optimization within a confidence set of models, linking data quality to pricing robustness.

Contribution

It introduces a novel approach to account for model uncertainty in acceptability prices using a nonparametric neighborhood and dual problem formulation.

Findings

01. Distributionally robust acceptability prices are derived.

02. A large deviations result for the nested distance is proved.

03. Pricing robustness relates to data quality.

Abstract

The determination of acceptability prices of contingent claims requires the choice of a stochastic model for the underlying asset price dynamics. Given this model, optimal bid and ask prices can be found by stochastic optimization. However, the model for the underlying asset price process is typically based on data and found by a statistical estimation procedure. We define a confidence set of possible estimated models by a nonparametric neighborhood of a baseline model. This neighborhood serves as the ambiguity set for a multistage stochastic optimization problem under model uncertainty. We obtain distributionally robust solutions of the acceptability pricing problem and derive the dual problem formulation. Moreover, we prove a general large deviations result for the nested distance, which allows us to relate the bid and ask prices under model ambiguity to the quality of the observed data.

Equations (179)

$$\mathcal{A}(Y)=\inf\{\mathbb{E}[Y\,Z]:Z\in\mathcal{Z}\},$$

$$|\mathcal{A}(Y_{1})-\mathcal{A}(Y_{2})|\leq K_{1}\|Y_{1}-Y_{2}\|_{p}.$$

$$\mathcal{Z}=\{Z\in L_{1}(\Omega):0\leq Z\leq 1/\alpha\text{ and }\mathbb{E}(Z)=1\}.$$

$$\|Y\|_{p}=\sum_{i=1}^{m}\|Y^{(i)}\|_{p},$$

$$(\mathrm{P})\quad\pi_{a}(\mathcal{A}_{1},\dots,\mathcal{A}_{T})=\min_{x}\ \dots$$

$$\mathcal{A}_{t}(x_{t-1}^{\top}S_{t}-x_{t}^{\top}S_{t}-C_{t})\geq 0$$

$$\mathcal{A}_{T}(x_{T-1}^{\top}S_{T}-C_{T})\geq 0,$$

$$(\mathrm{P}^{\prime})\quad\pi_{b}(\mathcal{A}_{1},\dots,\mathcal{A}_{T})=\max_{x}\ \dots$$

$$\mathcal{A}_{t}(x_{t}^{\top}S_{t}-x_{t-1}^{\top}S_{t}+C_{t})\geq 0$$

$$\mathcal{A}_{T}(-x_{T-1}^{\top}S_{T}+C_{T})\geq 0,$$

$$|v^{\beta}-v^{*}|\leq 2\bar{\beta}\cdot\|S_{0}\|_{1}$$

$$v^{-|\beta|}$$

$$\mathbb{E}[(x_{t-1}-x_{t})^{\top}S_{t}Z_{t}]$$

$$=\mathbb{E}[(a_{t-1}-a_{t})^{\top}S_{t}Z_{t}]=2|\beta_{t}|\sum_{i=1}^{m}\mathbb{E}[S_{t}^{(i)}Z_{t}]$$

$$\geq 2|\beta_{t}|\cdot\Bigl(\inf\sum_{i=1}^{m}S_{t}^{(i)}\Bigr)\cdot\mathbb{E}[Z_{t}]\geq 2|\beta_{t}|$$

$$0\leq v^{|\beta|}-v^{-|\beta|}\leq x_{0}^{\top}S_{0}-x_{0}^{*\top}S_{0}=a_{0}^{\top}S_{0}=2\bar{\beta}\sum_{i}S_{0}^{(i)}=2\bar{\beta}\cdot\|S_{0}\|_{1},$$

$$\mathbb{E}[((x_{t-1}-x_{t})^{\top}S_{t}-C_{t})Z_{t}]\geq 0\quad\text{for all }Z_{t}\in\mathcal{Z}_{t},$$

$$\mathcal{A}_{t,n}(Y)=\min\{\mathbb{E}[Y\cdot Z_{t,i}]:1\leq i\leq n\}.$$

$$\mathcal{A}_{t,n}(Y)\downarrow\mathcal{A}_{t}(Y),$$

$$v_{n}^{*}\uparrow v^{*}.$$

$$Y_{t}(x)=\begin{cases}(x_{t-1}-x_{t})^{\top}S_{t}-C_{t}&\text{for }1\leq t<T\\ x_{T-1}^{\top}S_{T}-C_{T}&\text{for }t=T.\end{cases}$$

$$\tilde{S}_{t}$$

$$\tilde{C}_{t}$$

$$\tilde{x}_{t}^{*}$$

$$\|S_{t}-\tilde{S}_{t}\|_{p}$$

$$\|C_{t}-\tilde{C}_{t}\|_{p}$$

$$\|x_{t}^{*}-\tilde{x}_{t}^{*}\|_{\infty}$$

$$\tilde{Y}_{t}(x)=\begin{cases}(x_{t-1}-x_{t})^{\top}\tilde{S}_{t}-\tilde{C}_{t}&\text{for }1\leq t<T\\ x_{T-1}^{\top}\tilde{S}_{T}-\tilde{C}_{T}&\text{for }t=T.\end{cases}$$

$$|\mathcal{A}_{t}(\tilde{Y}_{t}(\tilde{x}_{t}^{*}))$$

$$\leq K_{1}\|\tilde{Y}_{t}(\tilde{x}_{t}^{*})-Y_{t}(x_{t}^{*})\|_{p}$$

$$\leq K_{1}\bigl[\|\tilde{x}_{t}^{*}-x_{t}^{*}\|_{\infty}\|\tilde{S}_{t}\|_{p}+\|x_{t}^{*}\|_{\infty}\|\tilde{S}_{t}-S_{t}\|_{p}+\|\tilde{C}_{t}-C_{t}\|_{p}\bigr]$$

$$\leq K_{1}[\delta K_{3}+\delta K_{2}+\delta]=\eta\,[2\|S_{0}\|_{1}]^{-1}.$$

$$v^{*}\leq\tilde{v}^{*}+\eta,$$

$$\tilde{v}^{*}<\tilde{v}_{n}^{*}+\eta.$$

$$\tilde{v}_{n}^{*}\leq v_{n}^{*}+\eta.$$

$$v^{*}\leq v_{n}^{*}+3\eta,$$


Taxonomy

Topics: Auction Theory and Applications · Advanced Statistical Process Monitoring · Stochastic processes and financial applications

Full text



1 Introduction

The no-arbitrage paradigm is the cornerstone of mathematical finance. The fundamental work of Harrison, Kreps and Pliska [12, 13, 14, 21] and Delbaen and Schachermayer [6], to mention some of the most important contributions, paved the way for a sound theory for the pricing of contingent claims. In a general market model, the exclusion of arbitrage opportunities leads to intervals of fair prices.

Typically, the resulting no-arbitrage price bounds are too wide to provide practically meaningful information. (For example, the superreplication price for a plain vanilla call option in exponential Lévy models is given by the spot price of the underlying asset, see Cont and Tankov [4, Prop. 10.2], which is a trivial upper bound for the call option price.) In practice, market-makers wish to have a framework for controlling the acceptable risk when setting their spreads. Pioneering contributions to incorporate risk in the pricing procedure for contingent claims were made by Carr, Geman and Madan [3] as well as Föllmer and Leukert [8, 9], with subsequent generalizations made, e.g., by Nakano [24] and Rudloff [42]. The pricing framework of the present paper is in this spirit: by specifying acceptability functionals, an agent may control her shortfall risk in a rather intuitive manner. In particular, varying the parameter $\alpha$ of the Average-Value-at-Risk ($\operatorname{\mathbb{A}\mathbb{V}@R}_{\alpha}$) allows for a whole range of prices between the extreme cases of hedging with probability one (the traditional approach) and hedging w.r.t. expectation.

Nowadays, there is great awareness of the epistemic uncertainty inherent in setting up a stochastic model for a given problem. For single-stage and two-stage situations, there is a plethora of available literature on different approaches to account for model ambiguity (see the lists contained in [31, pp. 232–233] or [45, p. 2]). Recently, balls w.r.t. the Kantorovich-Wasserstein distance around an estimated model have gained a lot of popularity (e.g., [23, 11, 10, 46, 25, 7]), although they were originally proposed by Pflug and Wozabal [34] in 2007. However, the literature on nonparametric ambiguity sets for multistage problems is still extremely sparse. Analui and Pflug [1] were the first to study balls w.r.t. the multistage generalization of the Kantorovich-Wasserstein distance, named the nested distance (its definition can be found in the Appendix), for incorporating model uncertainty into multistage decision making. It is the aim of this article to further explore this rather uncharted territory. The classic mathematical finance problem of contingent claim pricing is a very well-suited instance for doing so. In fact, while in the traditional pointwise hedging setup only the null sets of the stochastic model for the dynamics of the underlying asset price process influence the resulting price of a contingent claim, the full specification of the model affects the claim price when acceptability is introduced. Thus, model dependency is even stronger in the latter case, which is the topic of this paper.

Stochastic optimization offers a natural framework to deal with the problems of mathematical finance. Application of the fundamental work of Rockafellar and Wets [35, 36, 38, 39, 37, 40, 41] on conjugate duality and stochastic programming has led to a stream of literature on those topics. King [17] originally formulated the problem of contingent claim pricing as a stochastic program. Extensions of this approach have been made, amongst others, by King, Pennanen and their coauthors [18, 28, 19, 17, 20, 26, 27], Kallio and Ziemba [16] or Dahl [5]. The stochastic programming approach naturally allows for incorporating features and constraints of real-world markets, and it makes it possible to obtain numerical results efficiently by applying the powerful toolkit of available algorithms for convex optimization problems.

The main contribution of this article is the link between statistical model error and the pricing of contingent claims, where the pricing methodology allows for a controlled hedging shortfall. The setup is inspired by practically very relevant aspects of decision making under both aleatoric and epistemic uncertainty. Given the stochastic model from which future evolutions are drawn, agents are willing to accept a certain degree of risk in their decisions. However, it may be dangerously misleading to neglect the fact that it is impossible to detect the true model without error. Thus, a distributionally robust framework, which takes the limitations of nonparametric statistical estimation into account, is required. In the statistical terminology, balls w.r.t. the nested distance may be seen as confidence regions: by considering all models whose nested distance to the estimated baseline model does not exceed some threshold, it is ensured that the true model is covered with a certain probability and hence the decision is robust w.r.t. the statistical model estimation error. In particular, we prove a large deviations theorem for the nested distance, based on which we show that a scenario tree can be constructed out of data such that it converges (in terms of the nested distance) to the true model in probability at an exponential rate. Thus, distributionally robust claim prices w.r.t. nested distance balls as ambiguity sets include a hedge under the true model with arbitrarily high probability, depending on the available data. In other words, we provide a framework that allows for setting up bid and ask prices for a contingent claim which result from finding hedging strategies with truly calculated risks, since the important factor of model uncertainty is not neglected.

This paper is organized as follows. In Section 2 we introduce our framework for acceptability pricing, i.e., we replace the traditional almost sure super-/subreplication requirement by the weaker constraint of an acceptable hedge. The acceptability condition is formulated w.r.t. one given probability model. This lowers the ask price and increases the bid price such that the bid-ask spread may be tightened or even closed. Section 3 contains the main results of this article. We weaken the assumption of one single probability model by assuming that a collection of models is plausible. In particular, we define the distributionally robust acceptability pricing problem and derive the dual problem formulation under rather general assumptions on the ambiguity set. The effect of the introduction of acceptability and ambiguity into the classical pricing methodology is nicely mirrored by the dual formulations. Moreover, we give a strong statistical motivation for using nested distance balls as ambiguity sets by proving a large deviations theorem for the nested distance. Section 4 contains illustrative examples to visualize the effect of acceptability and model ambiguity on contingent claim prices. In Section 5 we discuss the algorithmic solution of the $\operatorname{\mathbb{A}\mathbb{V}@R}$-acceptability pricing problem w.r.t. nested distance balls as ambiguity sets. In particular, we exploit the duality results of Section 3 and the special stagewise structure of the nested distance by a sequential linear programming algorithm which yields approximate solutions to the originally semi-infinite non-convex problem. In this way, we go beyond the current state-of-the-art computational methods for multistage stochastic optimization problems under nonparametric model ambiguity. Finally, we summarize our results in Section 6.

2 Acceptability pricing

2.1 Acceptability functionals

The terminology introduced in this section follows the book of Pflug and Römisch [33]. A detailed discussion of acceptability functionals and their properties can be found therein. Intuitively speaking, an acceptability functional $\mathcal{A}$ maps a stochastic position $Y\in L_{p}(\Omega)$, $1<p<\infty$, defined on a probability space $(\Omega,\mathcal{F},\mathbb{P})$, to the real numbers extended by $-\infty$ in such a way that higher values of the position correspond to higher values of the functional, i.e., a ‘higher degree of acceptance’. In particular, the defining properties of an acceptability functional are translation equivariance ($\mathcal{A}(Y+c)=\mathcal{A}(Y)+c$ for any $c\in\mathbb{R}$), concavity, monotonicity ($X\leq Y\textnormal{ a.s. }\Longrightarrow\mathcal{A}(X)\leq\mathcal{A}(Y)$), and positive homogeneity. We assume all acceptability functionals to be version independent, i.e., $\mathcal{A}(Y)$ depends only on the distribution of the random variable $Y$. For version independent acceptability functionals, upper semi-continuity follows from concavity (see Jouini, Schachermayer and Touzi [15]).

The following proposition is well-known. It follows directly from the Fenchel-Moreau-Rockafellar Theorem (see [35, Th. 5] and [33, Th. 2.31]).

Proposition 1.

An acceptability functional $\mathcal{A}$ which fulfills the above conditions has a dual representation of the form

$$\mathcal{A}(Y)=\inf\{\mathbb{E}[Y\,Z]:Z\in\mathcal{Z}\},$$

where $\mathcal{Z}$ is a closed convex subset of $L_{q}(\Omega)$, with $1/p+1/q=1$. We call $\mathcal{Z}$ the superdifferential of $\mathcal{A}$. Monotonicity and translation equivariance imply that all $Z\in\mathcal{Z}$ are nonnegative densities.

Assumption A1. There exists some constant $K_{1}\in\mathbb{R}$ such that $\|Z\|_{q}\leq K_{1}$ for all $Z\in\mathcal{Z}$.

This assumption implies that $\mathcal{A}$ is Lipschitz on $L_{p}$:

$$|\mathcal{A}(Y_{1})-\mathcal{A}(Y_{2})|\leq K_{1}\|Y_{1}-Y_{2}\|_{p}.$$

A prominent example of such an acceptability functional is the Average Value-at-Risk, $\operatorname{\mathbb{A}\mathbb{V}@R}_{\alpha}$, whose superdifferential is given by

$$\mathcal{Z}=\{Z\in L_{1}(\Omega):0\leq Z\leq 1/\alpha\text{ and }\mathbb{E}(Z)=1\}.$$

The extreme cases are represented by the essential infimum ($\operatorname{\mathbb{A}\mathbb{V}@R}_{0}(Y):=\lim_{\alpha\downarrow 0}\operatorname{\mathbb{A}\mathbb{V}@R}_{\alpha}(Y)=\operatorname{essinf}(Y)$) and the expectation ($\alpha=1$). Their superdifferentials are given by the set of all probability densities and just the function identically $1$, respectively. Strictly speaking, Assumption A1 is not respected by $\operatorname{\mathbb{A}\mathbb{V}@R}_{0}$. However, all our results on $\operatorname{\mathbb{A}\mathbb{V}@R}$-acceptability pricing also hold true for $\operatorname{\mathbb{A}\mathbb{V}@R}_{0}$; in fact, this is the special case which is well treated in the literature.

Other common names for the $\operatorname{\mathbb{A}\mathbb{V}@R}$ are Conditional-Value-at-Risk, Tail-Value-at-Risk, or Expected Shortfall. The subtleties between these terminologies are, e.g., addressed in Sarykalin et al. [43]. All our computational studies in Section 4 and Section 5 will be based on some $\operatorname{\mathbb{A}\mathbb{V}@R}_{\alpha}$ , while our theoretical results are general.
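For a position with finitely many outcomes, the dual representation of the Average Value-at-Risk can be evaluated directly. The following is a minimal sketch, not taken from the paper: the function name `avar` and the four-point position are assumptions made for illustration. It puts the maximal density $1/\alpha$ on the worst outcomes until probability mass $\alpha$ is used up, which attains the infimum over the superdifferential in the discrete case.

```python
def avar(values, probs, alpha):
    """Lower-tail AV@R_alpha of a discrete position Y (acceptability version).

    Evaluates inf{E[Y Z] : 0 <= Z <= 1/alpha, E[Z] = 1} by putting the
    maximal density 1/alpha on the worst outcomes until probability mass
    alpha is used up.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    remaining = alpha  # probability mass still to be covered
    total = 0.0        # accumulates the tail contribution to E[Y Z] * alpha
    for y, p in sorted(zip(values, probs)):  # worst outcomes first
        mass = min(p, remaining)
        total += y * mass
        remaining -= mass
        if remaining <= 1e-15:
            break
    return total / alpha


# Hypothetical four-point position with equal probabilities.
Y = [-2.0, 0.0, 1.0, 3.0]
P = [0.25, 0.25, 0.25, 0.25]
print(avar(Y, P, 1.0))   # alpha = 1 gives the expectation
print(avar(Y, P, 0.5))   # average of the worst half of the outcomes
print(avar(Y, P, 0.05))  # small alpha approaches the essential infimum
```

Decreasing $\alpha$ moves the value from the expectation towards the worst outcome, which mirrors the two extreme cases described above.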

2.2 Acceptable replications

Let us now introduce the notion of acceptability in the pricing procedure for contingent claims.

As usual in mathematical finance, we consider a market model as a filtered probability space $(\Omega,\mathcal{F},\mathbb{P})$ , where the filtration is given by the increasing sequence of sigma-algebras $\mathcal{F}=(\mathcal{F}_{0},\mathcal{F}_{1},\dots,\mathcal{F}_{T})$ with $\mathcal{F}_{0}=\{\emptyset,\Omega\}$ . The liquidly traded basic asset prices are given by a discrete-time $\mathbb{R}_{+}^{m}$ -valued stochastic process $S=(S_{0},\dots,S_{T})$ , where $S_{t}=(S_{t}^{(1)},S_{t}^{(2)},\dots,S_{t}^{(m)})$ . We assume the filtration to be generated by the asset price process.

One asset, denoted by $S^{(1)}$ , serves as numéraire (a risk-less bond, say). We assume w.l.o.g. that $S_{t}^{(1)}=1$ a.s. If not, we may replace $(S_{t}^{(1)},S_{t}^{(2)},\dots,S_{t}^{(m)})$ by $(1,S_{t}^{(2)}/S_{t}^{(1)},\dots,S_{t}^{(m)}/S_{t}^{(1)})$ .
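The normalization by the numéraire is a componentwise division. A short sketch, with the helper name `discount_by_numeraire` and the price vector chosen purely for illustration:

```python
def discount_by_numeraire(prices):
    """Express all asset prices in units of the first asset (the numeraire)."""
    numeraire = prices[0]
    if numeraire <= 0:
        raise ValueError("the numeraire price must be positive")
    # The numeraire itself becomes 1; every other price is divided by it.
    return [1.0] + [s / numeraire for s in prices[1:]]


# Hypothetical prices (bond, stock A, stock B) at some time t.
S_t = [1.05, 105.0, 42.0]
normalized = discount_by_numeraire(S_t)
print(normalized)
```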

A contingent claim $C$ consists of an $\mathcal{F}$ -adapted series of cash flows $C=(C_{1},\ldots,C_{T})$ measured in units of the numéraire. The fact that the payoff $C_{t}$ is contingent on the respective state of the market up to time $t$ is reflected by the condition that $C$ is adapted to the filtration $\mathcal{F}$ , for which we write $C\lhd\mathcal{F}$ . A trading strategy $x=(x_{0},\ldots,x_{T-1})$ is an $\mathcal{F}$ -adapted $\mathbb{R}^{m}$ -valued process with $x\lhd\mathcal{F}$ .

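To be more precise, let

$$\mathcal{L}^{m}_{p}:=\mathbb{R}^{m}\times L_{p}^{m}(\Omega,\mathcal{F}_{1})\times\dots\times L_{p}^{m}(\Omega,\mathcal{F}_{T}),$$

$$\mathcal{L}^{m}_{\infty}:=\mathbb{R}^{m}\times L_{\infty}^{m}(\Omega,\mathcal{F}_{1})\times\dots\times L_{\infty}^{m}(\Omega,\mathcal{F}_{T-1}),$$

and

$$\mathcal{L}^{1}_{q}:=L_{q}(\Omega,\mathcal{F}_{1})\times\dots\times L_{q}(\Omega,\mathcal{F}_{T}).$$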

We assume that $S\in\mathcal{L}^{m}_{p}$ , $x\in\mathcal{L}_{\infty}^{m}$ and $C\in\mathcal{L}_{p}^{1}$ . The norm in $L^{m}_{p}$ is given by

$$\|Y\|_{p}=\sum_{i=1}^{m}\|Y^{(i)}\|_{p},$$

and similarly for $L_{\infty}^{m}\,$ . Notice that $x_{0}$ and $S_{0}$ are deterministic vectors.

Assumption A2. We assume that all claims are Lipschitz-continuous functions of the underlying asset price process $S$ .

Definition 1.

Consider a contingent claim $C$ and fix acceptability functionals $\mathcal{A}_{t}$ , for all $t=1,\ldots,T$ . We assume that all functionals $\mathcal{A}$ have a representation given by Proposition 1. Then the acceptable prices are given by the optimal values of the following stochastic optimization programs:

i)

the acceptable ask price of $C$ is defined as

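$$\begin{aligned}(\mathrm{P})\qquad\pi_{a}(\mathcal{A}_{1},\dots,\mathcal{A}_{T})=\min_{x}\ &x_{0}^{\top}S_{0}\\ \text{s.t. }\ &\mathcal{A}_{t}(x_{t-1}^{\top}S_{t}-x_{t}^{\top}S_{t}-C_{t})\geq 0,&&\text{(2a)}\\ &\mathcal{A}_{T}(x_{T-1}^{\top}S_{T}-C_{T})\geq 0,&&\text{(2b)}\end{aligned}$$

ii)

the acceptable bid price of $C$ is defined as

$$\begin{aligned}(\mathrm{P}^{\prime})\qquad\pi_{b}(\mathcal{A}_{1},\dots,\mathcal{A}_{T})=\max_{x}\ &x_{0}^{\top}S_{0}\\ \text{s.t. }\ &\mathcal{A}_{t}(x_{t}^{\top}S_{t}-x_{t-1}^{\top}S_{t}+C_{t})\geq 0,&&\text{(3a)}\\ &\mathcal{A}_{T}(-x_{T-1}^{\top}S_{T}+C_{T})\geq 0,&&\text{(3b)}\end{aligned}$$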

where the optimization runs over all trading strategies $x\in\mathcal{L}_{\infty}^{m}$ for the liquidly traded assets. The constraints in (2a) and (3a) are formulated for all $t=1,\ldots,T-1$ .

To interpret Definition 1, the acceptable ask price is given by the minimal initial capital required to acceptably superhedge the cash-flows $C_{t}$ , which have to be paid out by the seller. On the other hand, the acceptable bid price corresponds to the maximal amount of money that can initially be borrowed from the market to buy the claim, such that by receiving the payments $C_{t}$ and always rebalancing one’s portfolio in an acceptable way, one ends up with an acceptable position at maturity.
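To make the ask price problem concrete, here is a hedged sketch for the simplest case: $T=1$, a unit bond and a single stock on finitely many scenarios, with $\operatorname{\mathbb{A}\mathbb{V}@R}_{\alpha}$ as acceptability functional. All names (`avar`, `ask_price`) and the three-scenario market are illustrative assumptions, and the ternary search is a stand-in for a proper solver, not the algorithm of Section 5. The sketch uses translation equivariance: for a fixed stock position $x^{(2)}$, the cheapest acceptable bond position is $-\operatorname{\mathbb{A}\mathbb{V}@R}_{\alpha}(x^{(2)}S_{1}^{(2)}-C_{1})$, so the price is the minimum of a convex function of one variable.

```python
def avar(values, probs, alpha):
    """Lower-tail AV@R_alpha of a discrete position (acceptability version)."""
    remaining, total = alpha, 0.0
    for y, p in sorted(zip(values, probs)):  # worst outcomes first
        mass = min(p, remaining)
        total += y * mass
        remaining -= mass
        if remaining <= 1e-15:
            break
    return total / alpha


def ask_price(s0, s1, probs, claim, alpha, lo=-10.0, hi=10.0, iters=200):
    """Acceptable ask price for T = 1 with a unit bond and one stock.

    Minimizes x2 * s0 - AV@R(x2 * S_1 - C) over the stock position x2;
    the search bracket [lo, hi] is an assumption of this sketch.
    """
    def cost(x2):
        position = [x2 * s - c for s, c in zip(s1, claim)]
        return x2 * s0 - avar(position, probs, alpha)

    for _ in range(iters):  # ternary search on a convex function
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if cost(m1) <= cost(m2):
            hi = m2
        else:
            lo = m1
    return cost(0.5 * (lo + hi))


# Hypothetical market: stock at 100 moves to 80, 100 or 120; call with strike 100.
S1 = [80.0, 100.0, 120.0]
PROBS = [1.0 / 3.0] * 3
CALL = [0.0, 0.0, 20.0]
for a in (1.0, 0.8, 0.01):
    print(a, ask_price(100.0, S1, PROBS, CALL, a))
```

In this toy market the price moves from the expected payoff ($\alpha=1$) up to the superreplication price as $\alpha$ becomes small.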

In what follows we will mainly consider the ask price problem $(\rm{P})$ and its variants. The bid price problem $(\rm{P}^{\prime})$ is its mirror image and all assertions and proofs for the problem $(\rm{P})$ can be rewritten literally for problem $(\rm{P}^{\prime})$ .

Let $(\rm{P}^{\beta})$ for $\beta=(\beta_{1},\dots,\beta_{T})$ be the problem $(\textnormal{P})$ , where the conditions (2a) and (2b) are replaced by $\mathcal{A}_{t}(\cdot)\geq\beta_{t}$ .

Assumption A3. The optima are attained and all solutions $x$ to the problems $(\rm{P}^{\beta})$ , for $\beta$ in a neighborhood of 0, are uniformly bounded, i.e., $\exists K_{2}\in\mathbb{R}\textnormal{ s.t. }\forall x\colon\|x\|_{\infty}\leq K_{2}$ .

We show the following auxiliary result for the problems $(\rm{P}^{\beta})$ .

Lemma 1.

Let $v^{\beta}$ be the optimal value of $(\rm{P}^{\beta})$ and $v^{*}$ be the optimal value of $(\rm{P})$. Then, for $\beta$ in a neighborhood of $0$,

$$|v^{\beta}-v^{*}|\leq 2\bar{\beta}\cdot\|S_{0}\|_{1},$$

where $\bar{\beta}=\sum_{t}|\beta_{t}|$ .

Proof.

If $v^{\beta}$ is the optimal value of $(\rm{P}^{\beta})$ , then by inclusion of the feasible sets

$$v^{-|\beta|}\leq v^{\beta}\leq v^{|\beta|}\quad\text{and}\quad v^{-|\beta|}\leq v^{*}\leq v^{|\beta|}.$$

We have to bound $v^{|\beta|}-v^{-|\beta|}$ . Let $x_{t}^{*}$ be the solution of $(\rm{P}^{-|\beta|})$ . $x_{t}^{*}$ is not necessarily feasible for $(\rm{P}^{|\beta|})$ . We modify $x_{t}^{*}$ in order to get feasibility for $(\rm{P}^{|\beta|})$ . Let $a_{t},t=1,\dots,T-1\,$ , be the vector with identical components $2\sum_{s=t+1}^{T}|\beta_{s}|$ and let $x_{t}=x_{t}^{*}+a_{t}$ . Then

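$$\begin{aligned}\mathbb{E}[(x_{t-1}-x_{t})^{\top}S_{t}Z_{t}]-\mathbb{E}[(x_{t-1}^{*}-x_{t}^{*})^{\top}S_{t}Z_{t}]&=\mathbb{E}[(a_{t-1}-a_{t})^{\top}S_{t}Z_{t}]=2|\beta_{t}|\sum_{i=1}^{m}\mathbb{E}[S_{t}^{(i)}Z_{t}]\\ &\geq 2|\beta_{t}|\cdot\Bigl(\inf\sum_{i=1}^{m}S_{t}^{(i)}\Bigr)\cdot\mathbb{E}[Z_{t}]\geq 2|\beta_{t}|,\end{aligned}$$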

since $\sum_{i}S_{t}^{(i)}\geq S_{t}^{(1)}=1$ and $\mathbb{E}[Z_{t}]=1$. By $\mathbb{E}[(x_{t-1}^{*}-x_{t}^{*})^{\top}S_{t}Z_{t}]\geq-|\beta_{t}|$, one gets that $\mathbb{E}[(x_{t-1}-x_{t})^{\top}S_{t}Z_{t}]\geq|\beta_{t}|$, i.e., $x_{t}$ is feasible for $(\rm{P}^{|\beta|})$. Notice that $a_{0}$ has all components equal to $2\sum_{t}|\beta_{t}|=2\bar{\beta}$. Now

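$$0\leq v^{|\beta|}-v^{-|\beta|}\leq x_{0}^{\top}S_{0}-x_{0}^{*\top}S_{0}=a_{0}^{\top}S_{0}=2\bar{\beta}\sum_{i}S_{0}^{(i)}=2\bar{\beta}\cdot\|S_{0}\|_{1},$$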

which concludes the proof. ∎
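As a quick sanity check of the bound in Lemma 1, consider only the one-period case $T=1$, and let $e_{1}$ denote the first unit vector (notation introduced just for this check). Since $S_{1}^{(1)}=1$, translation equivariance gives $\mathcal{A}_{1}((x_{0}+\beta_{1}e_{1})^{\top}S_{1}-C_{1})=\mathcal{A}_{1}(x_{0}^{\top}S_{1}-C_{1})+\beta_{1}$. Hence $x_{0}$ is feasible for $(\rm{P})$ if and only if $x_{0}+\beta_{1}e_{1}$ is feasible for $(\rm{P}^{\beta})$, and because $S_{0}^{(1)}=1$ the objective changes by exactly $\beta_{1}$:

$$v^{\beta}=v^{*}+\beta_{1},\qquad|v^{\beta}-v^{*}|=|\beta_{1}|=\bar{\beta}\leq 2\bar{\beta}\cdot\|S_{0}\|_{1},$$

where the last inequality uses $\|S_{0}\|_{1}\geq S_{0}^{(1)}=1$.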

Notice that the primal program $(\rm{P})$ is semi-infinite, if the constraints are written in the extensive form

$$\mathbb{E}[((x_{t-1}-x_{t})^{\top}S_{t}-C_{t})Z_{t}]\geq 0\quad\text{for all }Z_{t}\in\mathcal{Z}_{t},$$

where $Z=(Z_{1},\ldots,Z_{T})\in\mathcal{L}^{1}_{q}$ .

Lemma 2 below demonstrates the validity of an approximation with only finitely many supergradients.

Since the $L_{p}$ spaces are separable, there exist sequences $(Z_{t,1},Z_{t,2},\dots)$ that are dense in $\mathcal{Z}_{t}$ , for each $t\,$ . Let

$$\mathcal{A}_{t,n}(Y)=\min\{\mathbb{E}[Y\cdot Z_{t,i}]:1\leq i\leq n\}.$$

Since $Z\mapsto\mathbb{E}[YZ]$ is continuous in $L_{p}\,$ , for every $Y$ in $L_{p}(\Omega,\mathcal{F}_{t})$ it holds that

$$\mathcal{A}_{t,n}(Y)\downarrow\mathcal{A}_{t}(Y),$$

as $n\to\infty$ .
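The monotone approximation by finitely many supergradients can be illustrated on a finite probability space. The sketch below is an assumption-laden toy, not the construction used in the proofs: instead of a dense sequence it enumerates the extreme points of the $\operatorname{\mathbb{A}\mathbb{V}@R}_{0.5}$ superdifferential on four equally likely atoms (density $1/\alpha$ on two atoms, zero elsewhere), so the limit is reached after finitely many steps. The name `finite_acceptability` is hypothetical.

```python
from itertools import combinations


def finite_acceptability(values, probs, densities):
    """A_n(Y): minimum of E[Y * Z_i] over the finitely many listed densities."""
    return min(
        sum(y * z * p for y, z, p in zip(values, dens, probs))
        for dens in densities
    )


# Hypothetical position on four equally likely atoms.
Y = [1.0, 3.0, -2.0, 0.0]
P = [0.25] * 4
ALPHA = 0.5

# Extreme points of {0 <= Z <= 1/alpha, E[Z] = 1}: density 1/alpha on two atoms.
extreme = [
    [1.0 / ALPHA if i in pair else 0.0 for i in range(4)]
    for pair in combinations(range(4), 2)
]

# A_1(Y), A_2(Y), ..., using the first n densities only.
approx = [finite_acceptability(Y, P, extreme[:n]) for n in range(1, len(extreme) + 1)]
print(approx)
```

The printed sequence is nonincreasing and ends at the average of the two worst outcomes, i.e., at $\operatorname{\mathbb{A}\mathbb{V}@R}_{0.5}(Y)$.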

Lemma 2.

Let $v^{*}$ be the optimal value of the basic problem $(\rm{P})$ and let $v^{*}_{n}$ be the optimal value of the similar optimization problem $(\rm{P}_{n})$ , where $\mathcal{A}_{t}$ are replaced by $\mathcal{A}_{t,n}$ . Then

$$v_{n}^{*}\uparrow v^{*}.$$

Proof.

Suppose the contrary, that is $\sup_{n}v_{n}^{*}\leq v^{*}-3\eta<v^{*}$ for some $\eta>0$ . Introduce the notation

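$$Y_{t}(x)=\begin{cases}(x_{t-1}-x_{t})^{\top}S_{t}-C_{t}&\text{for }1\leq t<T\\ x_{T-1}^{\top}S_{T}-C_{T}&\text{for }t=T.\end{cases}$$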

By Assumption A1 and since $x\in\mathcal{L}_{\infty}^{m}$ , it holds that $x\mapsto\mathcal{A}_{t}(Y_{t}(x))$ and $x\mapsto x_{0}^{\top}S_{0}$ are Lipschitz. Choose $0<\delta=\eta\left[2\|S_{0}\|_{1}K_{1}(K_{2}+K_{3}+1)\right]^{-1}$ with $K_{3}\geq\|S_{t}\|_{p}$ for all $t$ . Let $x_{t}^{*}$ be the solution of $(\rm{P})$ . We may find finite sub-sigma-algebras $\tilde{\mathcal{F}}_{t}\subseteq\mathcal{F}_{t}$ such that with

[TABLE]

we have that

[TABLE]

Denote by $(\tilde{\rm{P}})$ the variant of the problem $(\rm{P})$ , where the processes $(S_{t})$ and $(C_{t})$ are replaced by $(\tilde{S}_{t})$ and $(\tilde{C}_{t})$ . Similarly as before introduce the notation

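$$\tilde{Y}_{t}(x)=\begin{cases}(x_{t-1}-x_{t})^{\top}\tilde{S}_{t}-\tilde{C}_{t}&\text{for }1\leq t<T\\ x_{T-1}^{\top}\tilde{S}_{T}-\tilde{C}_{T}&\text{for }t=T.\end{cases}$$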

Notice that

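$$\begin{aligned}|\mathcal{A}_{t}(\tilde{Y}_{t}(\tilde{x}_{t}^{*}))-\mathcal{A}_{t}(Y_{t}(x_{t}^{*}))|&\leq K_{1}\|\tilde{Y}_{t}(\tilde{x}_{t}^{*})-Y_{t}(x_{t}^{*})\|_{p}\\ &\leq K_{1}\bigl[\|\tilde{x}_{t}^{*}-x_{t}^{*}\|_{\infty}\|\tilde{S}_{t}\|_{p}+\|x_{t}^{*}\|_{\infty}\|\tilde{S}_{t}-S_{t}\|_{p}+\|\tilde{C}_{t}-C_{t}\|_{p}\bigr]\\ &\leq K_{1}[\delta K_{3}+\delta K_{2}+\delta]=\eta\,[2\|S_{0}\|_{1}]^{-1}.\end{aligned}$$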

By Lemma 1 we may conclude that

$$v^{*}\leq\tilde{v}^{*}+\eta,\tag{7}$$

where $\tilde{v}^{*}$ is the optimal value of $(\tilde{\rm{P}})$ . Let $(\tilde{\rm{P}}_{n})$ be the variant of problem $(\tilde{\rm{P}})$ , where all $\mathcal{A}_{t}$ are replaced by $\mathcal{A}_{t,n}$ . The optimal value of $(\tilde{\rm{P}}_{n})$ is denoted by $\tilde{v}_{n}^{*}$ . In this finite situation we may show that $\tilde{v}_{n}^{*}\uparrow\tilde{v}^{*}$ . Obviously, $\tilde{v}_{n}^{*}$ is a monotonically increasing sequence with $\tilde{v}_{n}^{*}\leq\tilde{v}^{*}$ .

It remains to demonstrate that $\lim_{n}\tilde{v}_{n}^{*}$ cannot be smaller than $\tilde{v}^{*}$. For this, let $\tilde{x}^{{n}*}$ be a solution of $(\tilde{\rm{P}}_{n})$. Because of the finiteness of the filtration $\tilde{\mathcal{F}}$, the solutions of $(\tilde{\rm{P}}_{n})$ as well as of $(\tilde{\rm{P}})$ are just vectors in some high- but finite-dimensional $\mathbb{R}^{N}$, and they are all bounded by $K_{2}$. Let $\tilde{x}^{**}$ be an accumulation point of $(\tilde{x}^{{n}*})$, i.e., we have for some subsequence that $\tilde{x}^{{n_{i}*}}\to\tilde{x}^{**}$. We show that $\tilde{x}^{**}$ satisfies the constraints of $(\tilde{\rm{P}})$.

Suppose the contrary. Then there is a $t$ such that $\mathcal{A}_{t}(\tilde{Y}_{t}(\tilde{x}^{**}))<0$. This implies that there is a $Z_{t,m}\in\{Z_{t,1},Z_{t,2},\dots\}$ such that $\mathbb{E}[\tilde{Y}_{t}(\tilde{x}^{**})\cdot Z_{t,m}]<0$. However, for $n\geq m$, by construction $\mathbb{E}[\tilde{Y}_{t}(\tilde{x}^{n*})\cdot Z_{t,m}]\geq 0$, and since $\tilde{x}^{n_{i}*}\to\tilde{x}^{**}$ componentwise, also $\mathbb{E}[\tilde{Y}_{t}(\tilde{x}^{**})\cdot Z_{t,m}]\geq 0$, a contradiction. Hence $\tilde{x}^{**}$ is feasible for $(\tilde{\rm{P}})$. Since the objective function is continuous in $\tilde{x}$, this implies that $\lim_{i}\tilde{v}_{n_{i}}^{*}=\tilde{v}^{*}$ and, by monotonicity, $\lim_{n}\tilde{v}_{n}^{*}=\tilde{v}^{*}$. We have therefore shown that we can find an index $n$ such that

$$\tilde{v}^{*}<\tilde{v}_{n}^{*}+\eta.\tag{8}$$

Let $x^{n*}$ be the solution of $(\rm{P}_{n})$ and let $\hat{x}^{n*}=\mathbb{E}[x^{n*}|\tilde{\mathcal{F}}_{t}]$. Analogously as before, one may prove that $|\mathcal{A}_{t}(\tilde{Y}_{t}(\hat{x}^{n*}))|\leq\eta\left[2\|S_{0}\|_{1}\right]^{-1}$ and hence, by Lemma 1,

$$\tilde{v}_{n}^{*}\leq v_{n}^{*}+\eta.\tag{9}$$

Putting (7), (8) and (9) together one sees that

$$v^{*}\leq v_{n}^{*}+3\eta,$$

which contradicts the assumption that $v^{*}_{n}<v^{*}-3\eta$ . ∎

We now turn to the duals of the problems $(\rm{P})$ and $(\rm{P}^{\prime})$, called $(\rm{D})$ and $(\rm{D}^{\prime})$, respectively. It turns out that, also in our general acceptability case, a martingale property appears in the dual, as is known for the case of a.s. super-/subreplication.

Theorem 1.

For all $t=1,\ldots,T$ , let $\mathcal{A}_{t}$ be acceptability functionals with corresponding superdifferentials $\mathcal{Z}_{t}$ . Then, the acceptable ask price is given by

[TABLE]

and the acceptable bid price is given by

[TABLE]

Proof.

The acceptable ask/bid price corresponds to a special case of the distributionally robust acceptable ask/bid price introduced in Definition 2 below, namely when the ambiguity set reduces to a singleton. Hence, the validity of Theorem 1 follows directly from the proof of Theorem 2. ∎

Remark 1 (Interpretation of the dual formulations).

*The objective of the dual formulations $({\rm{D}})$ and $({\rm{D}}^{\prime})$ is to maximize (minimize, resp.) the expected value of the payoffs resulting from the claim w.r.t. some feasible measure $\mathbb{Q}$. The constraints (10a) and (11a) require $\mathbb{Q}$ to be such that the underlying asset price process is a martingale w.r.t. $\mathbb{Q}$. This is well known from the traditional approach of pointwise super-/subreplication. The acceptability criterion enters the dual problems in terms of the constraints (10b) and (11b), which reduce the feasible sets by a stronger condition than the two probability measures just having the same null sets. Making the feasible sets smaller obviously lowers the ask price and increases the bid price and thus gives a tighter bid-ask spread.*
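The tightening of the bid-ask spread described in the remark can be sketched in a one-period, three-scenario market with a unit bond and one stock. The sketch rests on an assumption that is not spelled out in the text above: for $T=1$ and $\operatorname{\mathbb{A}\mathbb{V}@R}_{\alpha}$, the dual feasible set is taken to consist of all measures $\mathbb{Q}$ under which the stock is a martingale and whose density w.r.t. $\mathbb{P}$ is bounded by $1/\alpha$. The function name `dual_price_bounds` and the market data are illustrative. With three scenarios and two linear equations, the feasible measures form a line segment, and the linear objective is extremal at its endpoints.

```python
def dual_price_bounds(s1, s0, probs, claim, alpha, tol=1e-12):
    """Bid and ask price of a claim in a three-scenario, one-period model.

    Scans all Q with sum(Q) = 1, E_Q[S_1] = s0 (martingale property) and
    0 <= Q_i <= probs_i / alpha (density bound, assumed form of the dual).
    """
    # Particular solution of the two linear equations with Q_2 = 0.
    q_up = (s0 - s1[0]) / (s1[2] - s1[0])
    base = [1.0 - q_up, 0.0, q_up]
    # Direction along which both equations stay satisfied.
    direction = [s1[2] - s1[1], s1[0] - s1[2], s1[1] - s1[0]]
    lo, hi = float("-inf"), float("inf")
    for b, d, p in zip(base, direction, probs):
        cap = p / alpha
        if d > 0:
            lo, hi = max(lo, -b / d), min(hi, (cap - b) / d)
        elif d < 0:
            lo, hi = max(lo, (cap - b) / d), min(hi, -b / d)
        elif not -tol <= b <= cap + tol:
            raise ValueError("no feasible measure")
    if lo > hi + tol:
        raise ValueError("no feasible measure")
    hi = max(hi, lo)  # absorb rounding when the feasible set is a single point
    values = [
        sum(c * (b + tau * d) for c, b, d in zip(claim, base, direction))
        for tau in (lo, hi)
    ]
    return min(values), max(values)  # (bid, ask)


# Hypothetical market: stock at 100 moves to 80, 100 or 120; call with strike 100.
S1 = [80.0, 100.0, 120.0]
PROBS = [1.0 / 3.0] * 3
CALL = [0.0, 0.0, 20.0]
for a in (0.01, 0.8, 1.0):
    print(a, dual_price_bounds(S1, 100.0, PROBS, CALL, a))
```

In this toy example the interval shrinks from the no-arbitrage bounds for small $\alpha$ to a single price at $\alpha=1$.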

Proposition 2.

For fixed acceptability functionals $\mathcal{A}_{1},\ldots,\mathcal{A}_{T}$ , consider the acceptable ask price $\pi^{a}(\mathbb{P})$ as a function of the underlying model $\mathbb{P}\,$ . This function is Lipschitz.

Proof.

The assertion follows from Theorem 5 in the Appendix, considering the Lipschitz property of claims (Assumption A2) and the problem formulation resulting from Theorem 1. ∎

3 Model ambiguity and distributional robustness

Traditional stochastic programs are based on a given and fixed probability model for the uncertainties. However, already since the pioneering paper of Scarf [44] in the 1950s, it was felt that the fact that these models are based on observed data as well as the statistical error should be taken into account when making decisions. Ambiguity sets are typically either a finite collection of models or a neighborhood of a given baseline model. In what follows we study the latter case and, in particular, we use the nested distance to construct parameter-free ambiguity sets.

3.1 Acceptability pricing under model ambiguity

In Section 2.2 we defined the bid/ask price of a contingent claim as the maximal/minimal amount of capital needed in order to sub-/superhedge its payoff(s) w.r.t. an acceptability criterion. However, the result computed with this approach heavily depends on the particular choice of the probability model. This section weakens the strong dependency on the model. More specifically, acceptable bid and ask prices shall be based on an acceptability criterion that is robust w.r.t. all models contained in a certain ambiguity set.

Definition 2.

Consider a contingent claim $C$ . Then, for acceptability functionals $\mathcal{A}_{t}$ , $t=1,\ldots,T$ , and an ambiguity set $\operatorname*{{\mathcal{P}}_{\!\!\varepsilon}}$ of probability models,

i)

the distributionally robust acceptable ask price of $C$ is defined as

[TABLE]

ii)

the distributionally robust acceptable bid price is defined as

[TABLE]

where the optimization runs over all trading strategies $x\in\mathcal{L}_{\infty}^{m}$ for the liquidly traded assets. The constraints in (12a) and (13a) are formulated for all $t=1,\ldots,T-1$ and $\mathcal{A}_{t}^{\mathbb{P}}$ denotes the value of the acceptability functional when the underlying probability model is given by $\mathbb{P}$ .
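A hedged sketch of the robust ask price for $T=1$, a unit bond and one stock: the acceptability constraint has to hold under every model of the ambiguity set, so the worst-case $\operatorname{\mathbb{A}\mathbb{V}@R}_{\alpha}$ over the models replaces the single-model value. For illustration only, the ambiguity set here is a finite collection of two hypothetical models rather than a nested distance ball, and the names `avar` and `robust_ask_price` as well as the ternary search are assumptions of this sketch.

```python
def avar(values, probs, alpha):
    """Lower-tail AV@R_alpha of a discrete position (acceptability version)."""
    remaining, total = alpha, 0.0
    for y, p in sorted(zip(values, probs)):  # worst outcomes first
        mass = min(p, remaining)
        total += y * mass
        remaining -= mass
        if remaining <= 1e-15:
            break
    return total / alpha


def robust_ask_price(s0, s1, models, claim, alpha, lo=-10.0, hi=10.0, iters=200):
    """Minimal capital such that the hedge is acceptable under every model."""
    def cost(x2):
        position = [x2 * s - c for s, c in zip(s1, claim)]
        # The bond position must cover the worst AV@R over all models.
        worst = min(avar(position, probs, alpha) for probs in models)
        return x2 * s0 - worst

    for _ in range(iters):  # ternary search; cost is convex in x2
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if cost(m1) <= cost(m2):
            hi = m2
        else:
            lo = m1
    return cost(0.5 * (lo + hi))


# Hypothetical scenarios, claim, and two candidate models.
S1 = [80.0, 100.0, 120.0]
CALL = [0.0, 0.0, 20.0]
MODELS = [[1.0 / 3.0] * 3, [0.25, 0.5, 0.25]]
single = [robust_ask_price(100.0, S1, [m], CALL, 0.8) for m in MODELS]
robust = robust_ask_price(100.0, S1, MODELS, CALL, 0.8)
print(single, robust)
```

Because the robust feasible set is the intersection of the single-model feasible sets, the robust ask price can never be lower than any single-model ask price, and it never exceeds the superreplication price.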

Theorem 2.

Let $\operatorname*{{\mathcal{P}}_{\!\!\varepsilon}}$ be a convex set of probability models, which is spanned by a sequence of models $(\mathbb{P}_{1},\mathbb{P}_{2},\ldots)\,$ . Moreover, let $\operatorname*{{\mathcal{P}}_{\!\!\varepsilon}}$ be dominated by some model $\mathbb{P}_{0}$ and assume all densities w.r.t. $\mathbb{P}_{0}$ to be bounded. For $t=1,\ldots,T$ , let $\mathcal{A}_{t}$ be acceptability functionals with corresponding superdifferentials $\mathcal{Z}_{\mathcal{A}_{t}}$ . Then, the distributionally robust acceptable ask price is given by

[TABLE]

and the distributionally robust acceptable bid price is given by

[TABLE]

Proof.

Define

[TABLE]

Then, the constraints in $(\rm{PP}^{\prime})$ can be written in the form

[TABLE]

Since all densities $f_{t}$ are bounded by assumption, Lemma 2 holds true if we replace $Z_{t}\in\mathcal{Z}_{t}$ by $\mathfrak{d}_{t}\in\mathfrak{D}_{t}$ . (Footnote 7: It would be sufficient to assume $\mathcal{Z}_{\mathcal{A}_{t}}\subseteq L_{s}$ and $f_{t}\in L_{r}$ such that $\frac{1}{r}+\frac{1}{s}=\frac{1}{q}$ . However, for simplicity, we keep $\mathcal{Z}_{\mathcal{A}_{t}}\subseteq L_{q}$ and assume $f_{t}\in L_{\infty}$ .) It can easily be seen that for each $t$ there are sequences $(\mathfrak{d}_{t,1},\mathfrak{d}_{t,2},\ldots)$ which are dense in $\mathfrak{D}_{t}$ . Let us define

[TABLE]

Then, it holds that $\mathfrak{D}_{t}^{n}\subseteq\mathfrak{D}_{t}^{n+1}$ and $\bigcup_{n}\mathfrak{D}_{t}^{n}=\mathfrak{D}_{t}$ . Thus, by Lemma 2 we may approximate $(\rm{PP})$ by a problem of the form

[TABLE]

Rearranging its Lagrangian leads to the following representation of $({\rm{PP}}_{n})\,$ :

[TABLE]

where

[TABLE]

This is a finite-dimensional bilinear problem. Notice that $({\rm{PP}}_{n})$ is always feasible. (Footnote 8: This follows from the fact that a feasible solution $(x_{0},\ldots,x_{T-1})$ of $({\rm{PP}}_{n})$ can easily be constructed in a deterministic way, starting with $x_{T-1}$ .) We may thus interchange the $\inf$ and the $\sup$ . Carrying out the minimization in $x$ explicitly, the unconstrained minimax problem (16) can be written as the constrained maximization problem

[TABLE]

Introducing a new probability measure $\mathbb{Q}$ defined by the Radon-Nikodým derivative $\frac{d\mathbb{Q}}{d\mathbb{\operatorname{\mathbb{P}_{0}}}}=W_{T}^{n}$ , the problem can be rewritten in terms of $\mathbb{Q}$ in the form

[TABLE]

It remains to show that there is no duality gap in the limit, as $n\rightarrow\infty\,$ . Assume that the dual problem $(\rm{DD})$ has an optimal value $\pi_{a}^{\prime}\neq\pi_{a}\,$ . By the primal constraints in $(\rm{PP})$ , for any dual feasible solution $\mathbb{Q}$ it holds that

[TABLE]

Thus, the optimal primal value $\pi_{a}$ is greater than or equal to the optimal dual value $\pi_{a}^{\prime}\,$ . Now assume $\pi_{a}^{\prime}<\pi_{a}\,$ . Then, since $\pi_{a}^{n}\uparrow\pi_{a}$ by Lemma 2, there must exist some $n$ such that $\pi_{a}^{n}>\pi_{a}^{\prime}\,$ . Moreover, there exists some $\mathbb{Q}^{n}$ , which is dual feasible and such that $\mathbb{E}^{\mathbb{Q}^{n}}\left[\sum_{t=1}^{T}{C}_{t}\right]=\pi_{a}^{n}\,$ . This contradicts $\pi_{a}^{\prime}$ being the limit of the monotonically increasing sequence of optimal values of the approximate dual problems of the form $({\rm{DD}}_{n})$ . Hence, $\pi_{a}^{\prime}=\pi_{a}$ , i.e., there is no duality gap in the limit.

Finally, considering the structure of $\mathfrak{D}_{t}$ , the condition ${\left.\frac{d\mathbb{Q}}{d\mathbb{\operatorname{\mathbb{P}_{0}}}}\middle|\right.}_{\mathcal{F}_{t}}\in\mathfrak{D}_{t}$ means that it is of the form $Z_{t}f_{t}$ , where there exists some $\mathbb{P}\in\operatorname*{{\mathcal{P}}_{\!\!\varepsilon}}$ such that $Z_{t}\in\mathcal{Z}_{\mathcal{A}_{t}^{\mathbb{P}}}$ and ${\left.\frac{d\mathbb{P}}{d\mathbb{\operatorname{\mathbb{P}_{0}}}}\middle|\right.}_{\mathcal{F}_{t}}=f_{t}$ . This completes the derivation of the dual problem formulation $({\rm{DD}})$ . ∎
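To make the dual representation concrete, the following minimal Python sketch evaluates it in a deliberately small setting that is not taken from the paper: one period, three states, zero interest rate, and $\operatorname{\mathbb{A}\mathbb{V}@R}_{\alpha}$ acceptability, for which the dual constraint reduces to $d\mathbb{Q}/d\mathbb{P}\leq 1/\alpha$ (cf. Section 5). The function names, the numbers and the finite list of candidate models are illustrative assumptions. Since a finite list is only a subset of a convex ambiguity set, the second function yields a lower bound on the robust ask price rather than the price itself.

```python
def ask_price_one_model(states, payoff, s0, p, alpha):
    """Max of E^Q[payoff] over Q with E^Q[S_1] = s0 and q_i <= p_i / alpha.

    Assumptions of this sketch: one period, three states s1 < s2 < s3,
    zero interest rate. Returns None if no such Q exists.
    """
    s1, s2, s3 = states
    # Parametrize Q by x = q_2; the two equality constraints fix q_1 and q_3.
    a3 = (s0 - s1) / (s3 - s1)
    b3 = -(s2 - s1) / (s3 - s1)
    coeffs = [(1.0 - a3, -1.0 - b3), (0.0, 1.0), (a3, b3)]  # q_i = a_i + b_i * x
    lo, hi = 0.0, 1.0
    for (a, b), p_i in zip(coeffs, p):
        cap = p_i / alpha  # AV@R density bound dQ/dP <= 1/alpha
        if b > 0:
            lo, hi = max(lo, -a / b), min(hi, (cap - a) / b)
        elif b < 0:
            lo, hi = max(lo, (cap - a) / b), min(hi, -a / b)
        elif not 0.0 <= a <= cap:
            return None
    if lo > hi + 1e-12:
        return None

    def value(x):
        return sum((a + b * x) * c for (a, b), c in zip(coeffs, payoff))

    return max(value(lo), value(hi))  # objective is linear in x


def robust_ask_lower_bound(states, payoff, s0, models, alpha):
    """Largest single-model ask price over a finite list of candidate models."""
    prices = [ask_price_one_model(states, payoff, s0, p, alpha) for p in models]
    feasible = [v for v in prices if v is not None]
    return max(feasible) if feasible else None


states, payoff = (90.0, 100.0, 110.0), (0.0, 5.0, 15.0)  # call struck at 95
baseline, alternative = (1 / 3, 1 / 3, 1 / 3), (0.4, 0.2, 0.4)
print(ask_price_one_model(states, payoff, 100.0, baseline, 1.0))
print(robust_ask_lower_bound(states, payoff, 100.0, [baseline, alternative], 1.0))
```

In this toy instance the second value is not smaller than the first, which mirrors the statement that enlarging the set of models can only raise the ask price.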

3.2 Nested distance balls as ambiguity sets: a large deviations result

In order to find appropriate nonparametric distances for probability models used in the framework of stochastic optimization, one has to observe that a minimal requirement on such a distance is that it metricizes weak convergence and allows for convergence of empirical distributions. The Kantorovich-Wasserstein distance does metricize the weak topology on the family of probability measures having a first moment. Its multistage generalization, the nested distance, measures the distance between stochastic processes on filtered probability spaces. The Appendix contains the definition and interpretation of both the Kantorovich-Wasserstein distance and the nested distance.

Realistic probability models must be based on observed data. While for single- or vector-valued random variables with finite expectation the empirical distribution based on an i.i.d. sample converges in Kantorovich-Wasserstein distance to the underlying probability measure, the situation is more involved for stochastic processes. The simple empirical distribution for stochastic processes does not converge in nested distance (cf. Pflug and Pichler [32]), but a smoothed version involving density estimates does.

As we show here by merging the concepts of kernel density estimation and transportation distances, one may get good estimates for confidence balls and ambiguity sets under some regularity assumptions.

Let $\mathbb{P}$ be the distribution of the stochastic process $\xi=(\xi_{1},\dots,\xi_{T})$ with values $\xi_{t}\in\mathbb{R}^{m}$ . Notice that $\mathbb{P}$ is a distribution on $\mathbb{R}^{\ell}$ with $\ell=m\cdot T$ . Let $\mathbb{P}^{n}$ be the probability measure of $n$ independent samples from $\mathbb{P}$ . If $\xi^{(j)}=(\xi_{1}^{(j)},\dots,\xi_{T}^{(j)})$ , $j=1,\dots,n$ is such a sample, then the empirical distribution $\hat{\mathbb{P}}_{n}$ puts the weight $1/n$ on each of the paths $\xi^{(j)}$ . For the construction of nested ambiguity balls, the empirical distribution has to be smoothed by convolution with a kernel function $k(x)$ for $x\in\mathbb{R}^{\ell}$ . For a bandwidth $h>0$ to be specified later, let $k_{h}(x)=\frac{1}{h^{\ell}}k(x/h)$ . In what follows we will work with the kernel density estimate $\hat{f}_{n}=\hat{\mathbb{P}}_{n}*k_{h}$ , where $*$ denotes convolution.
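As a minimal sketch of the smoothing step, the following Python code builds the kernel density estimate $\hat{f}_{n}=\hat{\mathbb{P}}_{n}*k_{h}$ in the one-dimensional case $\ell=1$ . The triangular kernel, the sample values and the function names are assumptions chosen for illustration; the kernel is merely one example that vanishes outside the unit ball and is Lipschitz.

```python
def triangular_kernel(u):
    """Kernel on R: vanishes outside [-1, 1], is Lipschitz, integrates to 1."""
    return max(0.0, 1.0 - abs(u))


def smoothed_density(samples, h, kernel=triangular_kernel):
    """Return x -> (P_n * k_h)(x) with k_h(x) = k(x / h) / h (case l = 1)."""
    n = len(samples)

    def f_hat(x):
        # Each sample carries weight 1/n and is spread out by the scaled kernel.
        return sum(kernel((x - s) / h) for s in samples) / (n * h)

    return f_hat


f_hat = smoothed_density([0.9, 1.0, 1.2], h=0.5)  # hypothetical sample
grid = [i / 1000 for i in range(2501)]            # covers the support [0.4, 1.7]
mass = sum(f_hat(x) for x in grid) / 1000         # Riemann sum, close to 1
print(f_hat(1.0), mass)
```

For $\ell>1$ the same construction applies with $k_{h}(x)=h^{-\ell}k(x/h)$ and a kernel on $\mathbb{R}^{\ell}$ .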

Assumption A4.

1. The support of $\mathbb{P}$ is a set $D=D_{1}\times\dots\times D_{T}$ , where $D_{i}$ are compact sets in $\mathbb{R}^{m}$ ;

2. $\mathbb{P}$ has a Lebesgue density $f$ , which is Lipschitz on $D$ with constant $L$ ;

3. $f$ is bounded from below and from above on $D$ by $0<\underline{c}\leq f(x)\leq\overline{c}$ ;

4. the kernel function $k$ vanishes outside the unit ball and is Lipschitz with constant $L$ ;

5. the conditional probabilities $\mathbb{P}_{t}(A|x)=\mathbb{P}(\xi_{t}\in A|(\xi_{1},\dots,\xi_{t-1})=x)$ satisfy

[TABLE]

for some $\gamma_{t}>0$ . Here, $\mathsf{d}$ denotes the Wasserstein distance for probabilities on $\mathbb{R}^{m}$ .

Remark 2.

The proof of Theorem 3 below relies on the lower bound $\underline{c}$ of the density. As the denominator of the conditional density $f(x|y)=f(x,y)/f(y)$ has to be estimated by density estimation as well, the bound ensures that the denominator does not vanish. In fact, the assumption on the compact cube (point 1.) can be weakened to $D$ being a compact set; the proof, however, is then slightly more involved. For the other technical assumptions (under point 5.) we refer to Mirkov and Pflug [22].

Theorem 3 (Large deviation for the nested distance).

Under Assumption A4 there exists a constant $K>0$ such that

[TABLE]

for $n$ sufficiently large and appropriately chosen bandwidth $h$ . Here, $\operatorname{\mathsf{d\kern-0.6458ptI}}$ denotes the nested distance.

The proof of (18) is based on several steps presented as propositions below. To start with we recall two important results for density estimates $\hat{f}_{n}=\hat{\mathbb{P}}_{n}*k_{h}$ for densities $f$ on $\mathbb{R}^{\ell}$ .

Proposition 3.

Under the Lipschitz conditions for $f$ and $k$ given above, it holds that

[TABLE]

if the bandwidth is chosen as $h=\varepsilon/(2L)$ .

Proof.

See Bolley et al. [2, Prop. 3.1]. ∎

Proposition 4.

Let $f$ and $g$ be densities vanishing outside a compact set $D$ and set $\mathbb{P}^{f}(A)=\int_{A}f(x)\mathrm{d}x$ resp. $\mathbb{P}^{g}(A)=\int_{A}g(x)\mathrm{d}x\,$ . Then their Wasserstein distance $\mathsf{d}$ is bounded by

[TABLE]

Here $\Delta$ is the diameter of $D$ and $\lambda(D)$ is the Lebesgue measure of $D$ .

Proof.

Cf. [32, Prop. 4]. ∎

The next result extends the previous one to conditional densities.

Proposition 5.

Let $f$ and $g$ be bivariate densities on compact sets $\bar{D}_{1}\times\bar{D}_{2}$ bounded by $0<\underline{c}\leq f,g\leq\overline{c}<\infty$ which are sufficiently close so that $\left\|f-g\right\|_{\bar{D}_{1}\times\bar{D}_{2}}\leq\underline{c}\lambda(\bar{D}_{1}\times\bar{D}_{2})[2\Delta^{\ell}]^{-1}\,$ . Then there is a universal constant $\kappa_{1}$ , depending on the set $\bar{D}:=\bar{D}_{1}\times\bar{D}_{2}$ only, so that the conditional densities are close as well, i.e., they satisfy

[TABLE]

for all $x\in\bar{D}_{1}$ and $y\in\bar{D}_{2}$ , i.e.,

[TABLE]

Proof.

To abbreviate the notation set $\varepsilon:=\sup_{x,y}\left|f(x,y)-g(x,y)\right|$ and note that $\varepsilon\leq\underline{c}\lambda(\bar{D})[2\Delta^{\ell}]^{-1}\,$ . Consider the marginal density $f(y):=\int_{\bar{D}_{1}}f(x,y)\mathrm{d}x$ ( $g(y):=\int_{\bar{D}_{1}}g(x,y)\mathrm{d}x$ , resp.). It holds that

[TABLE]

Clearly $|f(y)|\geq\underline{c}\lambda(\bar{D}_{1})$ , where $\lambda(\bar{D}_{1})$ is the Lebesgue measure of $\bar{D}_{1}$ and therefore

[TABLE]

The elementary inequality $\frac{1}{1+x}\leq 1+2\left|x\right|$ is valid for $x\geq-\nicefrac{{1}}{{2}}$ . With (22) it follows that

[TABLE]

with $\kappa_{1}=\frac{1}{\underline{c}\lambda(\bar{D}_{1})}+\frac{2\overline{c}\Delta^{\ell}}{(\underline{c}\lambda(\bar{D}_{1}))^{2}}$ . The assertion of the proposition finally follows by exchanging the roles of the densities $f$ and $g$ . ∎
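For completeness, the elementary inequality used in the proof above can be verified by distinguishing two cases. The following short derivation is an added check and not part of the original argument.

```latex
\begin{align*}
x \ge 0:&\quad \frac{1}{1+x} \le 1 \le 1 + 2|x|,\\
-\tfrac{1}{2} \le x < 0:&\quad \frac{1}{1+x} = 1 - \frac{x}{1+x}
  = 1 + \frac{|x|}{1+x} \le 1 + 2|x|,
  \quad\text{since } 1+x \ge \tfrac{1}{2}.
\end{align*}
```

The restriction $x\geq-\nicefrac{{1}}{{2}}$ cannot be dropped: for $x=-0.9$ the left-hand side equals $10$ while the right-hand side equals $2.8$ .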

Theorem 4.

Given Assumption A4 there exists a constant $\kappa_{2}$ such that

[TABLE]

for all $\varepsilon>0$ and $n$ sufficiently large.

Proof.

It follows from (20) and (21) that

[TABLE]

for $\kappa_{3}=2\Delta\lambda(D)\kappa_{1}$ . Recall the large deviation result from [2, Th. 2.8], which is given by

[TABLE]

for some universal constant $\kappa^{\prime}$ depending on the Lipschitz constants of $f$ and $k$ only.

With (19) it follows that

[TABLE]

Setting $\kappa_{2}:=\kappa^{\prime}(2L\kappa_{3})^{-2\ell-4}$ in (23) reveals the result. ∎

Proof of Theorem 3.

The previous theorem will be applied to the conditional densities of $\xi_{t}$ given the past $\xi_{1},\dots,\xi_{t-1}$ . Thus the sets $\bar{D}_{i}$ are interpreted as $\bar{D}_{1}=D_{t}$ and $\bar{D}_{2}=D_{1}\times\dots\times D_{t-1}$ . For the probability measure $\mathbb{P}$ satisfying (17) and any other measure $\tilde{\mathbb{P}}$ satisfying $\mathsf{d}\left(\mathbb{P}_{t}\left(\cdot|x\right),\tilde{\mathbb{P}}_{t}\left(\cdot|x\right)\right)\leq\varepsilon_{t}$ at stage $t$ we have that

[TABLE]

see [31, Sec. 4.2] or [22].

We employ the results elaborated above for $\tilde{\mathbb{P}}:=\hat{\mathbb{P}}_{n}*k_{h}$ . Then

[TABLE]

We employ (23) to deduce that

[TABLE]

with $\varepsilon_{t}:=\varepsilon[T\gamma_{t}\prod_{s=t+1}^{T}(1+\gamma_{s})]^{-1}$ .

The desired large deviation result follows for $n$ sufficiently large for any $K<\min_{t\in\left\{1,\dots,T\right\}}\kappa_{2}\left[\left(T\gamma_{t}\prod_{s=t+1}^{T}(1+\gamma_{s})\right)^{2\ell+4}\right]^{-1}$ . ∎
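The stagewise tolerances $\varepsilon_{t}$ and the resulting upper limit for the constant $K$ are plain arithmetic in the Lipschitz constants $\gamma_{t}$ . The following Python sketch evaluates both expressions as stated in the proof; the function names and the numerical inputs (including the value used for $\kappa_{2}$ ) are hypothetical and serve only to illustrate how quickly the admissible rate constant shrinks with $T$ and $\ell=m\cdot T$ .

```python
from math import prod


def stage_factors(gammas):
    """Factors T * gamma_t * prod_{s>t} (1 + gamma_s) for t = 1, ..., T."""
    T = len(gammas)
    return [T * gammas[t] * prod(1.0 + g for g in gammas[t + 1:]) for t in range(T)]


def stage_tolerances(eps, gammas):
    """eps_t = eps / (T * gamma_t * prod_{s>t} (1 + gamma_s))."""
    return [eps / c for c in stage_factors(gammas)]


def rate_constant_limit(kappa2, gammas, m):
    """min_t kappa2 / factor_t^(2l + 4) with l = m * T; K must stay below it."""
    ell = m * len(gammas)
    return min(kappa2 / c ** (2 * ell + 4) for c in stage_factors(gammas))


gammas = [1.0, 1.0]                      # hypothetical Lipschitz constants, T = 2
print(stage_tolerances(1.0, gammas))     # tolerances for eps = 1
print(rate_constant_limit(65536.0, gammas, m=1))
```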

The smoothed model $\hat{\mathbb{P}}_{n}*k_{h}$ is not yet a tree, but by Theorem 6 of the Appendix one may find a finite tree process $\bar{\mathbb{P}}_{n}$ , which is arbitrarily close to it. (Footnote 9: See [31, Chap. 4] for methods to efficiently construct multistage models/scenario trees from data.) Therefore, by possibly increasing the probability bound in (18) by another constant factor, it holds true also for $\bar{\mathbb{P}}_{n}\,$ .

Remark 3.

From a statistical perspective, the results contained in this section represent a strong motivation to use nested distance balls as ambiguity sets for general stochastic optimization problems on scenario trees constructed from observed data. In particular, the distributionally robust acceptable ask price allows the seller of a claim to invest in a trading strategy which gives an acceptable superhedge of the payments to be made under the true model with arbitrarily high probability, given sufficient available data.

4 Illustrative examples

One may summarize the results of the previous sections in the following way: If the martingale measure is not unique (‘incomplete market’), then typically there is a positive bid-ask spread in the (pointwise) replication model. This spread also exists in the acceptability model. However, if the acceptability functional is the $\operatorname{\mathbb{A}\mathbb{V}@R}_{\alpha}$ , then by changing $\alpha$ we can get the complete range between the replication model ( $\alpha\rightarrow 0$ ) and the expectation model ( $\alpha=1$ ). At least in the latter case, but possibly even for some $\alpha<1\,$ , there is no bid-ask spread and thus a unique price. On the other hand, model ambiguity widens the bid-ask spread: The more models are considered, i.e., the larger the radius of the ambiguity set, the wider the bid-ask spread. For illustrative purposes, let us look at the simplest form of examples which demonstrate these effects.
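The interpolation between the replication model and the expectation model can be seen directly on a discrete distribution. The sketch below assumes the convention that $\operatorname{\mathbb{A}\mathbb{V}@R}_{\alpha}(Y)$ is the average of the lower $\alpha$ -tail of $Y$ , which is consistent with the two limiting cases named above; the function name and the sample payoff are illustrative.

```python
def avar(values, probs, alpha):
    """Average of the lower alpha-tail of a discrete distribution."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    remaining, total = alpha, 0.0
    for value, prob in sorted(zip(values, probs)):  # worst outcomes first
        mass = min(prob, remaining)  # probability mass taken from this atom
        total += mass * value
        remaining -= mass
        if remaining <= 0.0:
            break
    return total / alpha


payoff, probs = [4.0, 1.0, 3.0, 2.0], [0.25] * 4
for alpha in (0.25, 0.5, 1.0):
    print(alpha, avar(payoff, probs, alpha))
```

For $\alpha$ not exceeding the probability of the worst outcome the value equals the smallest outcome, while $\alpha=1$ gives the expectation.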

Example 1.

Consider a three-stage ternary tree, where the paths are uniformly distributed and given by the columns of the matrix

[TABLE]

Since infinitely many equivalent martingale measures can be constructed on this tree, there is a considerable bid-ask spread for the pointwise replication model, which corresponds to the $\operatorname{\mathbb{A}\mathbb{V}@R}_{\alpha}$ -acceptability pricing model with $\alpha=0$ . However, by increasing $\alpha$ for both contract sides, the bid-ask spread gets monotonically smaller. For $\alpha=1$ , there is no bid-ask spread, since all martingale measures coincide in their expectation and both buyer and seller only consider expectation in their valuation. Figure 1a visualizes this behavior for the price of a call option struck at $95\%$ : the bid price increases with $\alpha$ , while the ask price decreases. For $\alpha=1$ they coincide.

Computationally, $\operatorname{\mathbb{A}\mathbb{V}@R}$ –acceptability pricing on scenario trees boils down to solving a linear program (LP). It is thus straightforward to implement and the problem scales with the complexity of LPs.

Example 2.

In contrast, one may consider a three-stage binary tree model with uniformly distributed scenarios given by the columns of the matrix

[TABLE]

This tree can carry only one single martingale measure. In such a model, the change of acceptability levels does not change the price, since also under weakened acceptability the price is determined by a martingale measure, namely the unique one (in case $\alpha$ is small enough such that it is feasible). However, in an ambiguity situation, a bid-ask spread may appear, since there are typically many martingale measures contained in ambiguity sets. We consider nested distance balls around the baseline tree, where we keep the uniform distribution of the scenarios for simplicity, but allow the values of the process to change. (Footnote 10: This is a non-convex problem. The results in Figure 1b are based on the standard nonlinear solver of a commercial software package (MATLAB 8.5 (R2015a), The MathWorks Inc., Natick, MA, 2015), which finds (local) optima for our small instance of a problem.) The result for a call option struck at $95\%$ can be seen in Figure 1b. While there is a unique price for small radii $\varepsilon$ of the nested distance ball, an increasing bid-ask spread appears for larger values of $\varepsilon$ .

5 Algorithmic solution

The nested distance between two given scenario trees can be obtained by solving an LP. However, the distributionally robust $\operatorname{\mathbb{A}\mathbb{V}@R}$ –acceptability pricing problem w.r.t. nested distance balls as ambiguity sets results in a highly non-linear, in general non-convex problem. Therefore, we assume the tree structure to be given by the baseline model. In particular, it is assumed that different probability models within the ambiguity set differ only in terms of the transition probabilities; state values and the information structure are kept fixed.

Still, distributionally robust acceptability pricing is a semi-infinite non-convex problem. The only algorithmic approach available in the literature for similar problems is based on the idea of successive programming (cf. [31, Chap. 7.3.3]): an approximate solution is computed by starting with the baseline model only and alternately adding worst case models and finding optimal solutions. However, for typical instances of tree models this is computationally hard, as it involves the solution of a non-convex problem in each iteration step.

Hence, we tackle the dual formulation presented in Theorem 2. The structure of the nested distance enables an iterative approach. Algorithm 1 finds an approximate solution by solving a sequence of linear programs. Based on duality considerations and algorithmic exploitation of the specific stagewise transportation structure inherent to the nested distance, the algorithm approximates the solution of a semi-infinite non-convex problem by a sequence of LPs. The current state-of-the-art method, on the other hand, requires the solution of a non-convex program in each iteration step. Clearly, a sequential linear programming approach improves the performance considerably. (Footnote 11: For our implementations, the speed-up factor for a test problem was on average about 100. However, this may depend heavily on the implementation and the problem.) Moreover, our algorithm turned out to find feasible solutions in many cases where our implementation of a successive programming method fails to do so.

Let us extend the concept of the nested distance to subtrees, iteratively from the leaves to the root (’top-down’). For two scenario trees (here with identical filtration structures), define $\operatorname{\mathsf{d\kern-0.6458ptI}}_{T}(i,j)$ as the distance of the paths leading to the leaf nodes $i,j\in\mathcal{N}_{T}$ . Moreover, define

[TABLE]

for all nodes $k,l\in\mathcal{N}_{t}$ , where $0\leq t<T\,$ . Then, the nested distance between the two trees is given by $\operatorname{\mathsf{d\kern-0.6458ptI}}_{0}(1,1)\,$ . This stagewise backwards approach (cf. [31, Alg. 2.1]) is the basic idea of Algorithm 1. As we assume the tree structure to be fixed, Algorithm 1 iterates through the tree in the same top-down manner and searches for the optimal solution in each stage, while ensuring that the nested distance constraint remains satisfied. The variables are the conditional transition probabilities under $\mathbb{Q}$ , i.e., $q_{i}:=\mathbb{Q}[i|i-]$ , as well as the transportation subplans $\pi(i,j|i-,j-)$ , as defined in the Appendix. We use the notation $n-$ for the immediate predecessor of some node $n$ . As the measure $\mathbb{P}$ is in fact not needed explicitly since it is given by the transportation plan from $\hat{\mathbb{P}}\,$ , condition (4.3) in Algorithm 1 serves to ensure that it is still well-defined implicitly (note that always some node $\tilde{k}\in\mathcal{N}_{t-1}$ needs to be fixed). Condition (1) ensures that $\mathbb{Q}$ is a martingale measure, $\mathbb{Q}$ represents conditional probabilities by condition (2), condition (3) corresponds to the constraint on the measure change ( $d\mathbb{Q}/d\mathbb{P}\leq 1/\alpha$ ) resulting from the primal $\operatorname{\mathbb{A}\mathbb{V}@R}_{\alpha}$ –acceptability conditions, and (4.1) – (4.3) represent the constraint that there must be one $\mathbb{P}$ contained in the nested distance ball such that condition (3) holds.
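The backward recursion over subtrees described above can be sketched in a few lines of Python under simplifying assumptions that are not part of the paper: both trees are binary with equal depth, the process is real-valued ( $m=1$ ), and the recursion is taken in its usual form (cf. [31, Alg. 2.1]), i.e., at each pair of nodes the expected distance of the successor pairs is minimized over conditional transportation plans. With two successors per node each such transportation problem has a single free variable, so no LP solver is needed. The tree encoding and function names are hypothetical.

```python
def transport_2x2(p, q, d):
    """Min expected cost d[i][j] over couplings of (p, 1-p) and (q, 1-q)."""
    lo, hi = max(0.0, p + q - 1.0), min(p, q)  # feasible mass x on the pair (0, 0)

    def cost(x):
        return (x * d[0][0] + (p - x) * d[0][1]
                + (q - x) * d[1][0] + (1.0 - p - q + x) * d[1][1])

    return min(cost(lo), cost(hi))  # linear in x, so an endpoint is optimal


def nested_distance(a, b, path_cost=0.0):
    """Recursion from the leaves to the root for two binary trees of equal depth.

    A node is a dict {"value": float, "prob": float, "children": list}, where
    "prob" is the conditional probability of the node given its predecessor.
    """
    cost = path_cost + abs(a["value"] - b["value"])
    if not a["children"]:
        return cost  # leaves: distance of the two full paths
    d = [[nested_distance(ca, cb, cost) for cb in b["children"]]
         for ca in a["children"]]
    return transport_2x2(a["children"][0]["prob"], b["children"][0]["prob"], d)


def node(value, prob, children=()):
    return {"value": value, "prob": prob, "children": list(children)}


tree_a = node(0.0, 1.0, [node(0.0, 0.5), node(1.0, 0.5)])
tree_b = node(0.0, 1.0, [node(0.0, 0.5), node(2.0, 0.5)])
print(nested_distance(tree_a, tree_b))
```

For general branching, each call to `transport_2x2` would be replaced by a transportation LP over the successor sets.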

The algorithm optimizes the variables stagewise top-down. The optimal solution at stage $t+1$ depends on the values of the variables for all stages up to stage $t$ , which result from the previous iteration step. Therefore, the algorithm iterates as long as there is further improvement possible at some stage, given updated variable values for the earlier stages of the tree. Otherwise, it terminates and the optimal solution of our approximate problem is found.

Algorithm 1 Acceptability pricing under model ambiguity.

Start with some feasible model $\mathbb{P}$ in the nested distance ball around $\hat{\mathbb{P}}$ . Initialize $\pi_{\textnormal{old}}$ by assigning the optimal transportation plan between $\mathbb{P}$ and $\hat{\mathbb{P}}$ and initialize ’oldprice’.

Iteration
    [newprice, $\pi_{\textnormal{new}}$ ] $\leftarrow$ GetPrice( $\pi_{\textnormal{old}}$ )
    if (oldprice == newprice) then
        return oldprice
    else
        oldprice $\leftarrow$ newprice, $\pi_{\textnormal{old}}\leftarrow\pi_{\textnormal{new}}$
        Iterate
    end if
EndIteration

function GetPrice( $\tilde{\pi}$ )
    for $t$ from $T$ to $1$ do solve

[TABLE]

    end for
    price $\leftarrow\mathbb{E}^{\mathbb{Q}}[\sum_{t=1}^{T}C_{t}]$ , construct transportation plan $\pi(\cdot,\cdot)$ from subplans $\pi(\cdot,\cdot|\cdot,\cdot)$
    return [price, $\pi$ ]
end function
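The outer loop of Algorithm 1 is a fixed-point iteration on the pair (price, transportation plan). The Python sketch below reproduces only this control flow. The stagewise LPs inside GetPrice are not reproduced; `toy_get_price` is a hypothetical stand-in with a known fixed point, and the exact comparison of prices is replaced by a tolerance, which is an assumption of this sketch and seems advisable in floating-point arithmetic.

```python
def iterate_until_stable(get_price, plan, old_price, tol=1e-9, max_iter=1000):
    """Repeat stagewise sweeps until the price no longer changes.

    get_price(plan) must return (price, new_plan); it stands in for GetPrice.
    """
    for _ in range(max_iter):
        new_price, new_plan = get_price(plan)
        if abs(new_price - old_price) <= tol:
            return old_price
        old_price, plan = new_price, new_plan
    raise RuntimeError("no stable price within max_iter sweeps")


def toy_get_price(plan):
    """Hypothetical contraction with fixed point 2.0; no pricing content."""
    new_plan = 0.5 * (plan + 2.0)
    return new_plan, new_plan


price = iterate_until_stable(toy_get_price, plan=0.0, old_price=float("-inf"))
print(price)
```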

Example 3.

Consider the price of a plain vanilla call option struck at 95, in the Black-Scholes model with parameters $S_{0}=100,r=0.01,\sigma=0.2,T=1$ . Applying optimal quantization techniques (see, e.g., [31, Chap. 4] for an overview) to discretize the lognormal distribution, we construct a scenario tree with 500 nodes. While there exists a unique martingale measure (and thus a unique option price) in the Black-Scholes model, the discrete approximation allows for several martingale measures (and thus a positive bid-ask spread). Figure 2 visualizes the bid-ask spread as a function of the $\operatorname{\mathbb{A}\mathbb{V}@R}$ –acceptability level $\alpha$ and the radius $\varepsilon$ of the nested distance ball used as model ambiguity set. For $\alpha\rightarrow 1$ and $\varepsilon=0$ , the spread closes and the resulting price approximates the true Black-Scholes price up to 4 digits. For illustrative purposes, the spread between the bid and the ask price surface is shown from two perspectives.
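The reference value against which the tree price is compared for $\alpha\rightarrow 1$ and $\varepsilon=0$ can be reproduced with the closed-form Black-Scholes formula. The following Python sketch uses the parameters of Example 3; the function names are illustrative and only the standard library is used.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def black_scholes_call(s0, strike, r, sigma, maturity):
    """Black-Scholes price of a European call without dividends."""
    d1 = (log(s0 / strike) + (r + 0.5 * sigma ** 2) * maturity) / (sigma * sqrt(maturity))
    d2 = d1 - sigma * sqrt(maturity)
    return s0 * norm_cdf(d1) - strike * exp(-r * maturity) * norm_cdf(d2)


# Parameters of Example 3: S_0 = 100, strike 95, r = 0.01, sigma = 0.2, T = 1.
print(black_scholes_call(100.0, 95.0, 0.01, 0.2, 1.0))
```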

6 Conclusion

In this paper we extended the usual methods for contingent claim pricing in two directions. First, we replaced the replication constraint by a more realistic acceptability constraint. By doing so, the claim price does explicitly depend on the stochastic model for the price dynamics of the underlying (and not just on its null sets). If the model is based on observed data, then the calculation of the claim price can be seen as a statistical estimate. Therefore, as a second extension, we introduced model ambiguity into the acceptability pricing framework and we derived the dual problem formulations in the extended setting. Moreover, we used the nested distance for stochastic processes to define a confidence set for the underlying price model. In this way, we link acceptability prices of a claim to the quality of observed data. In particular, the size of the confidence region decreases with the sample size, i.e., the number of observed independent paths of the stochastic process of the underlying. For a given sample of observations, the ambiguity radius indicates how much the baseline ask/bid price should be corrected to safeguard the seller/buyer of a claim against the inherent statistical model risk, as Section 5 illustrates.

Appendix

Distances for random variables and stochastic processes. Recall the definition of the Kantorovich-Wasserstein distance $\mathsf{d}(P,\tilde{P})$ for two (Borel) random distributions $P$ and $\tilde{P}$ on $\mathbb{R}^{m}$ :

[TABLE]

Here, $\pi$ runs over all Borel measures on $\mathbb{R}^{m}\times\mathbb{R}^{m}$ with given marginals $P$ resp. $\tilde{P}$ . These measures are called transportation plans. If $\xi$ and $\tilde{\xi}$ are $\mathbb{R}^{m}$ -valued random variables, then their distance is defined as the distance of the corresponding image measures $P^{\xi}$ resp. $P^{\tilde{\xi}}$ .
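On the real line ( $m=1$ ) and for two empirical distributions with the same number of equally weighted atoms, an optimal transportation plan is known to be the monotone one, which matches the sorted samples. The following Python sketch computes the distance for this special case only; the function name is illustrative, and the general case requires solving the transportation problem.

```python
def wasserstein_1d(xs, ys):
    """Kantorovich-Wasserstein distance of two equally weighted samples on R."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # The monotone coupling pairs the k-th smallest atoms of both samples.
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


print(wasserstein_1d([0.0, 1.0], [1.0, 2.0]))
print(wasserstein_1d([3.0, 1.0], [1.0, 3.0]))
```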

Pflug and Pichler [29, 30] introduced the notion of the nested distance as a generalization of the Kantorovich-Wasserstein distance for $\mathbb{R}^{m}$ -valued stochastic processes $\xi=(\xi_{1},\dots,\xi_{T})$ and their image measures $\mathbb{P}$ on $\mathbb{R}^{mT}$ . Let $\mathcal{F}=(\mathcal{F}_{1},\dots,\mathcal{F}_{T})$ be the filtration composed of the sigma-algebras $\mathcal{F}_{t}$ generated by the component projections $(\xi_{1},\dots,\xi_{T})\mapsto(\xi_{1},\dots,\xi_{t})\,$ . Moreover, for $\xi=(\xi_{1},\dots,\xi_{T})$ and $\tilde{\xi}=(\tilde{\xi}_{1},\dots,\tilde{\xi}_{T})$ in $\mathbb{R}^{mT}$ , let the distance be defined as $\|\xi-\tilde{\xi}\|:=\sum_{t=1}^{T}\|\xi_{t}-\tilde{\xi}_{t}\|$ .

Definition 3.

The nested distance $\operatorname{\mathsf{d\kern-0.6458ptI}}$ for distributions $\mathbb{P}$ and $\tilde{\mathbb{P}}$ is defined as

[TABLE]

To interpret this definition, the nested distance between two multistage probability distributions is obtained by minimizing over all transportation plans $\pi$ , which are compatible with the filtration structures. For a single period (i.e., $T=1$ ), the nested distance coincides with the Kantorovich-Wasserstein distance. The following basic theorem for stability of multistage stochastic optimization problems was proved by Pflug and Pichler [30, Th. 6.1].

Theorem 5.

Let $\mathbb{P}$ and $\tilde{\mathbb{P}}$ be nested distributions with filtrations $\mathcal{F}$ and $\tilde{\mathcal{F}}$ , respectively. Consider the multistage stochastic optimization problem

[TABLE]

where $Q$ is convex in the decisions $x=(x_{1},\dots,x_{T})$ for any $\xi$ fixed, and Lipschitz with constant $L$ in the scenario process $\xi=(\xi_{1},\dots,\xi_{T})$ for any $x$ fixed. The set $\mathbb{X}$ is assumed to be convex and the constraint $x\lhd\mathcal{F}$ means that the decisions can be random variables, but must be adapted to the filtration $\mathcal{F}$ , i.e., must be nonanticipative. Then the objective values $v(\mathbb{P})$ and $v(\tilde{\mathbb{P}})$ satisfy

[TABLE]

Finite scenario trees are much easier to work with than general stochastic processes. For finite trees, where every node $m$ has a unique predecessor, we write $m+$ for the set of its immediate successors. Denote by $\mathcal{N}_{t}$ the set of all nodes at stage $t$ of the tree model $\mathbb{P}$ . For a node $i\in m+$ let $\mathbb{P}[i|m]$ be the conditional transition probability from $m$ to $i\,$ .

Definition 4.

The nested distance for scenario trees $\mathbb{P}$ and $\mathbb{\tilde{P}}$ is defined as

[TABLE]

The matrix $\pi$ of transportation plans and the matrix $D$ carrying the pairwise distances of the paths are defined on $\mathcal{N}_{T}\times\tilde{\mathcal{N}}_{T}$ . The conditional joint probabilities $\pi(i,j|k,l)$ in (24) are given by $\pi(i,j|k,l)=\pi_{i,j}\cdot[\sum\limits_{i^{\prime}\in k+}\sum\limits_{j^{\prime}\in l+}\pi_{i^{\prime},j^{\prime}}]^{-1}\,.$
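The normalization that turns the joint plan $\pi$ on the leaves into a conditional subplan can be written out directly. In the following Python sketch the plan is stored as a dictionary keyed by pairs of leaf labels, with missing pairs carrying zero mass; this encoding, the labels and the function name are assumptions made for illustration.

```python
def conditional_plan(pi, succ_k, succ_l):
    """pi(i, j | k, l) = pi[i, j] / sum of pi over successors of k and l."""
    mass = sum(pi.get((i, j), 0.0) for i in succ_k for j in succ_l)
    if mass == 0.0:
        raise ValueError("node pair carries no mass; subplan is undefined")
    return {(i, j): pi.get((i, j), 0.0) / mass for i in succ_k for j in succ_l}


# Leaves 1, 2 succeed the first node and leaf 3 the second node in both trees.
pi = {(1, 1): 0.25, (2, 2): 0.25, (3, 3): 0.5}
print(conditional_plan(pi, [1, 2], [1, 2]))
```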

Approximation of random processes by finite trees. The subsequent result follows from [31, Prop. 4.26].

Theorem 6.

If the stochastic process $\xi=(\xi_{1},\dots,\xi_{T})$ satisfies the Lipschitz condition given in Assumption A4.5 in Section 3.2, then for every $\varepsilon>0$ there is a stochastic process with distribution $\tilde{\mathbb{P}}$ , which is defined on a finite tree and which satisfies

[TABLE]

where $\mathbb{P}$ is the distribution of $\xi$ on the filtered space $(\Omega,\mathcal{F})$ .

Bibliography

[1] Bita Analui and Georg Ch. Pflug. On distributionally robust multiperiod stochastic optimization. Computational Management Science, 11(3):197–220, 2014.
[2] François Bolley, Arnaud Guillin, and Cédric Villani. Quantitative concentration inequalities for empirical measures on non-compact spaces. Probability Theory and Related Fields, 137(3-4):541–593, 2007.
[3] Peter Carr, Hélyette Geman, and Dilip B. Madan. Pricing and hedging in incomplete markets. Journal of Financial Economics, 62(1):131–167, 2001.
[4] Rama Cont and Peter Tankov. Financial Modelling With Jump Processes. Chapman & Hall/CRC, 2004.
[5] Kristina Rognlien Dahl. A convex duality approach for pricing contingent claims under partial information and short selling constraints. Stochastic Analysis and Applications, 35(2):317–333, 2017.
[6] Freddy Delbaen and Walter Schachermayer. A General Version of the Fundamental Theorem of Asset Pricing. Mathematische Annalen, 300(3):463–520, 1994.
[7] Chao Duan, Wanliang Fang, Lin Jiang, Li Yao, and Jun Liu. Distributionally robust chance-constrained approximate AC-OPF with Wasserstein metric. IEEE Transactions on Power Systems, 33(5):4924–4936, 2018.
[8] Hans Föllmer and Peter Leukert. Quantile hedging. Finance and Stochastics, 3(3):251–273, 1999.
