Evaluating Range Value at Risk Forecasts

Tobias Fissler; Johanna F. Ziegel

arXiv:1902.04489·math.ST·June 27, 2022

TL;DR

This paper investigates the statistical validation and backtesting of Range Value at Risk (RVaR) forecasts, proposing a new elicitable triplet model with two VaR components and characterizing its scoring functions.

Contribution

It introduces a triplet of RVaR with two VaR levels as an elicitable model and characterizes its strictly consistent scoring functions, advancing the validation of RVaR forecasts.

Findings

01 RVaR alone is not elicitable, but the triplet with two VaR levels is.

02 The class of strictly consistent scoring functions for the triplet is characterized.

03 Simulation studies illustrate the proposed approach.

Abstract

The debate of what quantitative risk measure to choose in practice has mainly focused on the dichotomy between Value at Risk (VaR) -- a quantile -- and Expected Shortfall (ES) -- a tail expectation. Range Value at Risk (RVaR) is a natural interpolation between these two prominent risk measures, which constitutes a tradeoff between the sensitivity of the latter and the robustness of the former, turning it into a practically relevant risk measure on its own. As such, there is a need to statistically validate RVaR forecasts and to compare and rank the performance of different RVaR models, tasks subsumed under the term 'backtesting' in finance. The predictive performance is best evaluated and compared in terms of strictly consistent loss or scoring functions. That is, functions which are minimised in expectation by the correct RVaR forecast. Much like ES, it has been shown recently that…

Tables

Table 1: Examples of scoring functions. In all cases we choose $g_{1}(x_{1})=x_{1}$ and $g_{2}(x_{2})=x_{2}$ . The parameters $c_{1},c_{2}\in\mathbb{R}$ satisfy $c_{1}<c_{2}$ .

Scoring function	$\phi^{\prime}(x_{3})$
$S_{1}$	$(\beta-\alpha)\tanh((\beta-\alpha)x_{3})$
$S_{2}$	$(\beta-\alpha)(2/\pi)\arctan((\beta-\alpha)x_{3})$
$S_{3}$	$(\beta-\alpha)(2\Phi((\beta-\alpha)x_{3})-1)$
$S_{4}$	$(\beta-\alpha)\big(-\mathds{1}\{x_{3}<c_{1}\}+\mathds{1}\{x_{3}>c_{2}\}+\mathds{1}\{c_{1}\leq x_{3}\leq c_{2}\}\,2(x_{3}-(c_{1}+c_{2})/2)/(c_{2}-c_{1})\big)$

Table 2: Power of Diebold-Mariano tests at significance level $0.05$ for the scoring functions in Table 1 in the case that $\alpha=1-\beta=0.1$ (left panel), and $\alpha=0.01$ , $\beta=0.05$ (right panel). In the first case we chose $-c_{1}=c_{2}=12$ for the scoring function $S_{4}$ , and $c_{1}=-5$ , $c_{2}=1$ in the second case. The null hypothesis $f\preceq g$ means that $\mathbb{E}[S(f_{t},Y_{t})]\leq\mathbb{E}[S(g_{t},Y_{t})]$ for all $t=1,\ldots,N$ for the scoring function specified in the column label. We chose $\sigma^{2}=0.5^{2}$ for the forecaster $g$ .

$H_{0}$	$S_{1}$	$S_{2}$	$S_{3}$	$S_{4}$
$f ⪯ g$	0	0	0	0
$g ⪯ f$	0.304	0.406	0.417	0.624
$f ⪯ h$	0	0	0	0
$h ⪯ f$	1.000	1.000	1.000	1.000
$g ⪯ h$	0	0	0	0
$h ⪯ g$	0.999	0.998	0.992	0.998

Equations

\operatorname{RVaR}_{\alpha,\beta}=\big(\beta\operatorname{ES}_{\beta}-\alpha\operatorname{ES}_{\alpha}\big)/(\beta-\alpha),\qquad 0<\alpha<\beta<1,

\mathcal{F}^{\alpha}

\operatorname{RVaR}_{\alpha,\beta}(F)=\begin{cases}\frac{1}{\beta-\alpha}\int_{\alpha}^{\beta}\operatorname{VaR}_{\gamma}(F)\,\mathrm{d}\gamma,&\text{if }\alpha<\beta,\\ \operatorname{VaR}_{\alpha}(F),&\text{if }\alpha=\beta.\end{cases}

\operatorname{VaR}_{\alpha}(F)\leq\operatorname{RVaR}_{\alpha,\beta}(F)\leq\operatorname{VaR}_{\beta}(F).

\operatorname{RVaR}_{\alpha,\beta}(F)

+\operatorname{VaR}_{\alpha}(F)\big(F(\operatorname{VaR}_{\alpha}(F))-\alpha\big)-\operatorname{VaR}_{\beta}(F)\big(F(\operatorname{VaR}_{\beta}(F))-\beta\big)\Bigg),

W_{\alpha}(F):=(1-2\alpha)\operatorname{RVaR}_{\alpha,1-\alpha}(F)+\alpha\operatorname{VaR}_{\alpha}(F)+\alpha\operatorname{VaR}_{1-\alpha}(F),\qquad\alpha\in(0,1/2).

V(x_{1},x_{2},x_{3},y)=\begin{pmatrix}\mathds{1}\{y\leq x_{1}\}-\alpha\\ \mathds{1}\{y\leq x_{2}\}-\beta\\ x_{3}+\frac{1}{\beta-\alpha}\big(S_{\beta}(x_{2},y)-S_{\alpha}(x_{1},y)\big)\end{pmatrix}

\bar{V}_{3}(\operatorname{VaR}_{\alpha}(F),\operatorname{VaR}_{\beta}(F),x_{3},F)=x_{3}-\operatorname{RVaR}_{\alpha,\beta}(F),

S(x_{1},x_{2},x_{3},y)

+\big(\mathds{1}\{y\leq x_{2}\}-\beta\big)g_{2}(x_{2})-\mathds{1}\{y\leq x_{2}\}g_{2}(y)

+\phi^{\prime}(x_{3})\Big(x_{3}+\frac{1}{\beta-\alpha}\big(S_{\beta}(x_{2},y)-S_{\alpha}(x_{1},y)\big)\Big)-\phi(x_{3})+a(y),

G_{1,x_{3}}\colon[c_{\min},c_{\max}]\to\mathbb{R},\quad x_{1}\mapsto g_{1}(x_{1})-x_{1}\phi^{\prime}(x_{3})/(\beta-\alpha),

G_{2,x_{3}}\colon[c_{\min},c_{\max}]\to\mathbb{R},\quad x_{2}\mapsto g_{2}(x_{2})+x_{2}\phi^{\prime}(x_{3})/(\beta-\alpha)

0

=\bar{S}(x_{1},x_{2},x_{3},F)-\bar{S}(t_{1},t_{2},x_{3},F)

\bar{S}(t_{1},t_{2},x_{3},F)-\bar{S}(t_{1},t_{2},t_{3},F)=\phi^{\prime}(x_{3})(x_{3}-t_{3})-\phi(x_{3})+\phi(t_{3})\geq 0,

\operatorname*{arg\,min}_{x\in\mathsf{A}_{0}}\bar{S}(x,F)=q_{\alpha}(F)\times q_{\beta}(F)\times\{\operatorname{RVaR}_{\alpha,\beta}(F)\}.

-\infty<-\frac{g_{2}(x_{2}^{\prime})-g_{2}(x_{2})}{x_{2}^{\prime}-x_{2}}\leq\frac{\phi^{\prime}(x_{3})}{\beta-\alpha}\leq\frac{g_{1}(x_{1}^{\prime})-g_{1}(x_{1})}{x_{1}^{\prime}-x_{1}}<\infty.

\bar{V}_{3}(x,F)=x_{3}+\frac{1}{\beta-\alpha}\Big(x_{2}(F(x_{2})-\beta)-x_{1}(F(x_{1})-\alpha)-\int_{x_{1}}^{x_{2}}yf(y)\,\mathrm{d}y\Big)

\partial_{1}\partial_{2}\bar{S}(x,F)=\partial_{1}h_{22}(x)(F(x_{2})-\beta)=\partial_{2}h_{11}(x)(F(x_{1})-\alpha)=\partial_{2}\partial_{1}\bar{S}(x,F)

h_{11}(x)

h_{22}(x)

0

\leq g_{1}(x_{1}^{\prime})-g_{1}(x_{1})-(x_{1}^{\prime}-x_{1})\phi^{\prime}(x_{3})/(\beta-\alpha)

\operatorname{RVaR}_{\alpha,\beta}(F)=-\frac{1}{\beta-\alpha}\big(\bar{S}_{\beta}(\operatorname{VaR}_{\beta}(F),F)-\bar{S}_{\alpha}(\operatorname{VaR}_{\alpha}(F),F)\big)\,.

\Psi(z,x,x^{\prime},y)

-S(x_{1},x_{2},x_{3},y)+S(x_{1}^{\prime},x_{2}^{\prime},x_{3}^{\prime},y)

0=\frac{\mathrm{d}}{\mathrm{d}x_{3}}\Psi(z,x,x^{\prime},y)=\big(\phi^{\prime\prime}(x_{3}+z)-\phi^{\prime\prime}(x_{3})\big)\Big(x_{3}+\frac{1}{\beta-\alpha}\big(S_{\beta}(x_{2},y)-S_{\alpha}(x_{1},y)\big)\Big)\,.

\Psi(c,x,x^{\prime},y)=S(cx,cy)-S(cx^{\prime},cy)-c^{b}S(x,y)+c^{b}S(x^{\prime},y)

0=\frac{\mathrm{d}}{\mathrm{d}x_{3}}\Psi(c,x,x^{\prime},y)=\big(c^{2}\phi^{\prime\prime}(cx_{3})-c^{b}\phi^{\prime\prime}(x_{3})\big)\Big(x_{3}+\frac{1}{\beta-\alpha}\big(S_{\beta}(x_{2},y)-S_{\alpha}(x_{1},y)\big)\Big)\,.

S(x_{1},x_{2},x_{3},y)=\int L_{v}^{1}(x_{1},y)\,\mathrm{d}H_{1}(v)+\int L_{v}^{2}(x_{2},y)\,\mathrm{d}H_{2}(v)+\int L_{v}^{3}(x_{1},x_{2},x_{3},y)\,\mathrm{d}H_{3}(v),

L_{v}^{1}(x_{1},y)

L_{v}^{2}(x_{2},y)

L_{v}^{3}(x_{1},x_{2},x_{3},y)

Taxonomy

Topics: Advanced Statistical Methods and Models · Risk and Portfolio Optimization · Financial Risk and Volatility Modeling

Full text

Evaluating Range Value at Risk Forecasts

(An earlier version of this paper was circulated under the name Elicitability of Range Value at Risk.)

Tobias Fissler, WU Vienna University of Economics and Business, Department of Finance, Accounting and Statistics, Welthandelsplatz 1, 1020 Vienna, Austria

Johanna F. Ziegel, University of Bern, Department of Mathematics and Statistics, Institute of Mathematical Statistics and Actuarial Science, Alpeneggstrasse 22, 3012 Bern, Switzerland

Abstract

The debate about which quantitative risk measure to choose in practice has mainly focused on the dichotomy between Value at Risk (VaR), a quantile, and Expected Shortfall (ES), a tail expectation. Range Value at Risk (RVaR) is a natural interpolation between these two prominent risk measures, which constitutes a tradeoff between the sensitivity of the latter and the robustness of the former, turning it into a practically relevant risk measure on its own. As such, there is a need to statistically validate RVaR forecasts and to compare and rank the performance of different RVaR models, tasks subsumed under the term ‘backtesting’ in finance. The predictive performance is best evaluated and compared in terms of strictly consistent loss or scoring functions, that is, functions which are minimised in expectation by the correct RVaR forecast. Much like ES, it has been shown recently that RVaR does not admit strictly consistent scoring functions, i.e., it is not elicitable. Mitigating this negative result, this paper shows that a triplet of RVaR with two VaR components at different levels is elicitable. We characterise the class of strictly consistent scoring functions for this triplet. Additional properties of these scoring functions are examined, including the diagnostic tool of Murphy diagrams. The results are illustrated with a simulation study, and we put our approach in perspective with respect to the classical approach of trimmed least squares in robust regression.

Keywords: Backtesting; Consistency; Elicitability; Expected Shortfall; Interquantile expectation; Point forecasts; Robustness; Scoring functions; Trimmed mean; Value at Risk; Winsorized mean

MSC2020 classes: 62C99; 62G35; 62P05; 91G70

1 Introduction

In the field of quantitative risk management, the last one or two decades have seen a lively debate about which monetary risk measure (Artzner et al., 1999) is best in (regulatory) practice. The debate mainly focused on the dichotomy between Value at Risk ( $\operatorname{VaR}_{\beta}$ ) on the one hand and Expected Shortfall ( $\operatorname{ES}_{\beta}$ ) on the other hand, at some probability level $\beta\in(0,1)$ (see Section 2 for definitions). Mirroring the historical joust between median and mean as centrality measures in classical statistics, $\operatorname{VaR}_{\beta}$ , basically a quantile, is esteemed for its robustness, while $\operatorname{ES}_{\beta}$ , a tail expectation, is deemed attractive due to its sensitivity and the fact that it satisfies the axioms of a coherent risk measure (Artzner et al., 1999). We refer the reader to Embrechts et al. (2014) and Emmer et al. (2015) for comprehensive academic discussions, and to Bank for International Settlements (2014) for a regulatory perspective in banking.

Cont et al. (2010) considered the issue of statistical robustness of risk measure estimates in the sense of Hampel (1971). They showed that a risk measure cannot be both robust and coherent. As a compromise, they propose the risk measure ‘Range Value at Risk’, $\operatorname{RVaR}_{\alpha,\beta}$ at probability levels $0<\alpha<\beta<1$ . It is defined as the average of all $\operatorname{VaR}_{\gamma}$ with $\gamma$ between $\alpha$ and $\beta$ (see Section 2 for definitions). As limiting cases, one obtains $\operatorname{RVaR}_{\beta,\beta}=\operatorname{VaR}_{\beta}$ and $\operatorname{RVaR}_{0,\beta}=\operatorname{ES}_{\beta}$ , which presents $\operatorname{RVaR}_{\alpha,\beta}$ as a natural interpolation of $\operatorname{VaR}_{\beta}$ and $\operatorname{ES}_{\beta}$ . Quantifying its robustness in terms of the breakdown point and following the arguments provided in Huber and Ronchetti (2009, p. 59), $\operatorname{RVaR}_{\alpha,\beta}$ has a breakdown point of $\min\{\alpha,1-\beta\}$ , placing it between the very robust $\operatorname{VaR}_{\beta}$ (with a breakdown point of $\min\{\beta,1-\beta\}$ ) and the entirely non-robust $\operatorname{ES}_{\beta}$ (breakdown point 0). This means it is a robust — and hence, not coherent — risk measure, unless it degenerates to $\operatorname{RVaR}_{0,\beta}=\operatorname{ES}_{\beta}$ (or if $0\leq\alpha<\beta=1$ ). Moreover, $\operatorname{RVaR}$ belongs to the wide class of distortion risk measures (Kusuoka, 2001). For further contributions to robustness in the context of risk measures, we refer the reader to Krätschmer et al. (2012, 2014), Kou et al. (2013), Embrechts et al. (2015) and Zähle (2016). Since the influential article Cont et al. (2010), RVaR has gained increasing attention in the risk management literature — see Embrechts et al. (2018a, b) for extensive studies — as well as in econometrics (Barendse, 2020) where RVaR sometimes has the alternative denomination Interquantile Expectation. 
For the symmetric case $\beta=1-\alpha>1/2$ , $\operatorname{RVaR}_{\alpha,1-\alpha}$ is known under the term $\alpha$ -trimmed mean in classical statistics and it constitutes an alternative to and interpolation of the mean and the median as centrality measures; see Lugosi and Mendelson (2019) for a recent study and a multivariate extension of the trimmed mean. It is closely connected to the $\alpha$ -Winsorized mean, see (2.4).

How should one evaluate the predictive performance of point forecasts, $x_{t}$ , for a statistical functional $T$ , such as the mean, median or a risk measure, of the (conditional) distribution of a quantity of interest, $y_{t}$ ? It is commonly measured in terms of the average realised score $\frac{1}{n}\sum_{t=1}^{n}S(x_{t},y_{t})$ for some scoring or loss function $S$ , using the orientation the smaller the better. Consequently, the loss function $S$ should be strictly consistent for $T$ in that $T(F)=\operatorname*{arg\,min}_{x}\int S(x,y)\,\mathrm{d}F(y)$ : Correct predictions are honoured and encouraged in the long run. For example, the squared loss $S(x,y)=(x-y)^{2}$ is consistent for the mean, and the absolute loss $S(x,y)=|x-y|$ is consistent for the median. If a functional admits a strictly consistent score, it is called elicitable (Osband, 1985; Lambert et al., 2008; Gneiting, 2011). By definition, elicitable functionals allow for $M$ -estimation and have natural estimation paradigms in regression frameworks (Dimitriadis et al., 2020, Section 2), such as quantile regression (Koenker and Bassett, 1978; Koenker, 2005) or expectile regression (Newey and Powell, 1987). Elicitability is crucial for meaningful forecast evaluation (Engelberg et al., 2009; Murphy and Daan, 1985; Gneiting, 2011). In the context of probabilistic forecasts with distributional forecasts $F_{t}$ or density forecasts $f_{t}$ , (strictly) consistent scoring functions are often referred to as (strictly) proper scoring rules, such as the log-score $S(f,y)=-\log f(y)$ (Gneiting and Raftery, 2007). In quantitative finance, and particularly in the debate about which risk measure is best in practice, elicitability has gained considerable attention (Emmer et al., 2015; Ziegel, 2016; Davis, 2016). Especially, the role of elicitability for backtesting purposes has been highly debated (Gneiting, 2011; Acerbi and Székely, 2014, 2017). It has been clarified that elicitability is central for comparative backtesting (Fissler et al., 2016; Nolde and Ziegel, 2017).
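As a small illustration of the average realised score and of consistency, the following sketch compares constant forecasts on a simulated sample. The exponential data, the grid of candidate forecasts and all function names are assumptions made for this example only; it is not taken from the paper.

```python
import random
import statistics


def average_score(score, forecast, observations):
    """Average realised score (1/n) * sum_t S(x, y_t) for a constant forecast x."""
    return sum(score(forecast, y) for y in observations) / len(observations)


def squared_loss(x, y):
    return (x - y) ** 2


def absolute_loss(x, y):
    return abs(x - y)


rng = random.Random(0)
# A skewed sample, so that the mean and the median differ noticeably.
observations = [rng.expovariate(1.0) for _ in range(2001)]

sample_mean = statistics.fmean(observations)
sample_median = statistics.median(observations)

# Candidate constant forecasts 0.00, 0.01, ..., 3.00.
grid = [i / 100 for i in range(301)]
best_squared = min(grid, key=lambda x: average_score(squared_loss, x, observations))
best_absolute = min(grid, key=lambda x: average_score(absolute_loss, x, observations))

print("squared loss : best grid forecast", best_squared, "sample mean", sample_mean)
print("absolute loss: best grid forecast", best_absolute, "sample median", sample_median)
```

Up to the grid resolution, the squared loss selects the sample mean and the absolute loss selects the sample median, which is the empirical counterpart of the consistency statements above.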

Not all functionals are elicitable. Osband (1985) showed that an elicitable functional necessarily has convex level sets (CxLS): If $T(F_{0})=T(F_{1})=t$ for two distributions $F_{0},F_{1}$ , then $T(F_{\lambda})=t$ where $F_{\lambda}=(1-\lambda)F_{0}+\lambda F_{1}$ , $\lambda\in(0,1)$ . Variance and ES generally do not have CxLS (Weber, 2006; Gneiting, 2011), therefore failing to be elicitable. The revelation principle (Osband, 1985; Gneiting, 2011) asserts that any bijection of an elicitable functional is elicitable. This implies that the pair (mean, variance) — being a bijection of the first two moments — is elicitable despite the variance failing to be elicitable. Similarly, Fissler and Ziegel (2016) showed that the pair $(\operatorname{VaR}_{\beta},\operatorname{ES}_{\beta})$ is elicitable with the structural difference that the revelation principle is not applicable in this instance. This gave rise to the more general finding that the minimal expected score and its minimiser are always jointly elicitable (Brehmer, 2017; Frongillo and Kash, 2020).
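The failure of convex level sets for the variance can be seen in a two-point example. The sketch below uses exact rational arithmetic; the distributions `f0`, `f1`, `f2` and the helper names are hypothetical choices for illustration.

```python
from fractions import Fraction


def mean(dist):
    """Mean of a discrete distribution given as {value: probability}."""
    return sum(p * v for v, p in dist.items())


def variance(dist):
    m = mean(dist)
    return sum(p * (v - m) ** 2 for v, p in dist.items())


def mixture(f0, f1, lam):
    """Return (1 - lam) * F0 + lam * F1 for discrete distributions."""
    mixed = {}
    for v, p in f0.items():
        mixed[v] = mixed.get(v, 0) + (1 - lam) * p
    for v, p in f1.items():
        mixed[v] = mixed.get(v, 0) + lam * p
    return mixed


half = Fraction(1, 2)
f0 = {-1: half, 1: half}  # mean 0, variance 1
f1 = {1: half, 3: half}   # mean 2, variance 1
f_mix = mixture(f0, f1, half)

# Both components have variance 1, but the mixture does not.
print(variance(f0), variance(f1), variance(f_mix))

# The mean, in contrast, is preserved when mixing distributions with equal means.
f2 = {-2: half, 2: half}  # mean 0
print(mean(mixture(f0, f2, Fraction(1, 3))))
```

Since the level set of the variance at 1 contains `f0` and `f1` but not their mixture, the variance cannot be elicitable on any class containing these three distributions.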

Recently, Wang and Wei (2020, Theorem 5.3) showed that $\operatorname{RVaR}_{\alpha,\beta}$ , $0<\alpha<\beta<1$ , similarly to $\operatorname{ES}_{\alpha}$ , fails to have the CxLS property, which rules out its elicitability. In contrast, they observe that the identity

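$$\operatorname{RVaR}_{\alpha,\beta}=\big(\beta\operatorname{ES}_{\beta}-\alpha\operatorname{ES}_{\alpha}\big)/(\beta-\alpha),\qquad 0<\alpha<\beta<1,\tag{1.1}$$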

and the CxLS property of the pair $(\operatorname{VaR}_{\alpha},\operatorname{ES}_{\alpha})$ imply the CxLS property of the triplet $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ (Wang and Wei, 2020, Example 4.6), leading to the question whether this triplet is elicitable or not. Invoking the elicitability of $(\operatorname{VaR}_{\alpha},\operatorname{ES}_{\alpha})$ , the identity at (1.1) and the revelation principle establishes the elicitability of the quadruples $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{ES}_{\alpha},\operatorname{RVaR}_{\alpha,\beta})$ and $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{ES}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ . This approach has already been used in the context of regression in Barendse (2020).
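The identity relating RVaR to ES at the two levels can be checked numerically for a standard normal distribution. The sketch below assumes the paper's sign convention (ES is a lower-tail expectation), uses the closed form $\operatorname{ES}_{\beta}=-\varphi(\Phi^{-1}(\beta))/\beta$ for the standard normal, and approximates RVaR as the average of $\operatorname{VaR}_{\gamma}$ over $\gamma\in(\alpha,\beta)$ by a midpoint rule. The levels and function names are illustrative choices.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def rvar_numeric(alpha, beta, n=20000):
    """Midpoint-rule approximation of (1/(beta-alpha)) * int_alpha^beta VaR_gamma dgamma."""
    h = (beta - alpha) / n
    total = sum(STD_NORMAL.inv_cdf(alpha + (i + 0.5) * h) for i in range(n))
    return total * h / (beta - alpha)


def es_closed_form(level):
    """Lower-tail ES of N(0, 1): the mean of Y given Y <= VaR_level."""
    q = STD_NORMAL.inv_cdf(level)
    return -STD_NORMAL.pdf(q) / level


alpha, beta = 0.01, 0.05
lhs = rvar_numeric(alpha, beta)
rhs = (beta * es_closed_form(beta) - alpha * es_closed_form(alpha)) / (beta - alpha)

print("RVaR by integrating VaR :", lhs)
print("RVaR from the ES identity:", rhs)
```

Both numbers agree up to quadrature error, and the value lies between $\operatorname{VaR}_{\alpha}$ and $\operatorname{VaR}_{\beta}$, as expected for an average of quantiles.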

A fortiori, we show that the triplet $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ is elicitable (Theorem 3.2) under weak regularity conditions. Practically, this opens the way to meaningful forecast performance comparison, and in particular comparative backtests, of this triplet, as well as to a regression framework. Theoretically, this shows that the elicitation complexity (Lambert et al., 2008; Frongillo and Kash, 2020) or elicitation order (Fissler and Ziegel, 2016) of $\operatorname{RVaR}_{\alpha,\beta}$ is at most 3. Moreover, requiring only VaR forecasts besides the RVaR forecast is advantageous compared with additionally requiring an ES forecast, since the triplet $(\operatorname{VaR}_{\alpha}(F),\operatorname{VaR}_{\beta}(F),\operatorname{RVaR}_{\alpha,\beta}(F))$ , $0<\alpha<\beta<1$ , exists and is finite for any distribution $F$ , whereas $\operatorname{ES}_{\alpha}(F)$ and $\operatorname{ES}_{\beta}(F)$ only exist if the (left) tail of the distribution $F$ is integrable. As $\operatorname{RVaR}_{\alpha,\beta}$ is often used for robustness purposes, safeguarding against outliers and heavy-tailedness, this advantage is important.

We would like to point out the structural difference between the elicitability result of $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ provided in this paper and the one concerning $(\operatorname{VaR}_{\alpha},\operatorname{ES}_{\alpha})$ in Fissler and Ziegel (2016) as well as the more general results of Frongillo and Kash (2020) and Brehmer (2017). While $\operatorname{ES}_{\alpha}$ corresponds to the negative of a minimum of an expected score which is strictly consistent for $\operatorname{VaR}_{\alpha}$ , it turns out that $\operatorname{RVaR}_{\alpha,\beta}$ can be represented as the difference of minima of expected strictly consistent scoring functions for $\operatorname{VaR}_{\alpha}$ and $\operatorname{VaR}_{\beta}$ (Proposition 3.1). As a consequence, the class of strictly consistent scoring functions for the triplet $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ turns out to be less flexible than the one for $(\operatorname{VaR}_{\alpha},\operatorname{ES}_{\alpha})$ ; see Remark 3.7 for details. In particular, there is essentially no translation invariant or positively homogeneous scoring function which is strictly consistent for $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ ; see Section 4.

The paper is organised as follows. In Section 2, we introduce the relevant notation and definitions concerning RVaR, scoring functions and elicitability. The main results establishing the elicitability of the triplet $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ (Theorems 3.2 and 3.5) and related findings are presented in Section 3. Section 4 shows that there are basically no strictly consistent scoring functions for $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ which are positively homogeneous or translation invariant. In Section 5, we establish a mixture representation of the strictly consistent scoring functions in the spirit of Ehm et al. (2016). This result allows one to compare forecasts simultaneously with respect to all consistent scoring functions in terms of Murphy diagrams. We demonstrate the applicability of our results and compare the discrimination ability of different scoring functions in a simulation study presented in Section 6. The paper finishes in Section 7 with a discussion of our results in the context of $M$ -estimation and compares them to other suggestions in the statistical literature, namely variants of a trimmed least squares procedure (Koenker and Bassett, 1978; Ruppert and Carroll, 1980; Rousseeuw, 1984).

2 Notation and Definitions

2.1 Definition of Range Value at Risk

There are different sign conventions in the literature on risk measures. In this paper we use the following convention: If a random variable $Y$ models the losses and gains, then positive values of $Y$ represent gains and negative values of $Y$ losses. Moreover, if $\rho$ is a risk measure, we assume that $\rho(Y)\in\mathbb{R}$ corresponds to the maximal amount of money one can withdraw such that the position $Y-\rho(Y)$ is still acceptable. Hence, negative values of $\rho$ correspond to risky positions. In the sequel, let $\mathcal{F}_{0}$ be the class of probability distribution functions on $\mathbb{R}$ . Recall that the $\alpha$ -quantile, $\alpha\in[0,1]$ of $F\in\mathcal{F}_{0}$ is defined as the set $q_{\alpha}(F)=\{x\in\mathbb{R}\,|\,F(x-)\leq\alpha\leq F(x)\}$ , where $F(x-):=\lim_{t\uparrow x}F(t)$ .

Definition 2.1.

Value at Risk of $F\in\mathcal{F}_{0}$ at level $\alpha\in[0,1]$ is defined as $\operatorname{VaR}_{\alpha}(F)=\inf q_{\alpha}(F)$ .

For any $\alpha\in[0,1]$ we introduce the following subclasses of $\mathcal{F}_{0}$ :

[TABLE]

Definition 2.2.

Range Value at Risk of $F\in\mathcal{F}_{0}$ at levels $0\leq\alpha\leq\beta\leq 1$ is defined as

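$$\operatorname{RVaR}_{\alpha,\beta}(F)=\begin{cases}\frac{1}{\beta-\alpha}\int_{\alpha}^{\beta}\operatorname{VaR}_{\gamma}(F)\,\mathrm{d}\gamma,&\text{if }\alpha<\beta,\\ \operatorname{VaR}_{\alpha}(F),&\text{if }\alpha=\beta.\end{cases}\tag{2.1}$$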

The definition of RVaR implies that

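$$\operatorname{VaR}_{\alpha}(F)\leq\operatorname{RVaR}_{\alpha,\beta}(F)\leq\operatorname{VaR}_{\beta}(F).\tag{2.2}$$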

For $0<\alpha\leq\beta<1$ and $F\in\mathcal{F}_{0}$ one obtains that (i) $\operatorname{RVaR}_{\alpha,\beta}(F)\in\mathbb{R}$ ; (ii) $\operatorname{RVaR}_{0,\beta}(F)\in\mathbb{R}\cup\{-\infty\}$ and it is finite if and only if $\int_{-\infty}^{0}|y|\,\mathrm{d}F(y)<\infty$ ; and (iii) $\operatorname{RVaR}_{\alpha,1}(F)\in\mathbb{R}\cup\{\infty\}$ and it is finite if and only if $\int_{0}^{\infty}|y|\,\mathrm{d}F(y)<\infty$ . $\operatorname{RVaR}_{0,1}(F)$ exists only if $\int_{-\infty}^{0}|y|\,\mathrm{d}F(y)<\infty$ or $\int_{0}^{\infty}|y|\,\mathrm{d}F(y)<\infty$ . If $F$ has a finite first moment, then $\operatorname{RVaR}_{0,1}(F)=\int y\,\mathrm{d}F(y)$ coincides with the first moment of $F$ . Provided that $\operatorname{RVaR}_{\alpha,\beta}(F)$ exists it holds that

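$$\begin{aligned}\operatorname{RVaR}_{\alpha,\beta}(F)=\frac{1}{\beta-\alpha}\Bigg(&\int_{(\operatorname{VaR}_{\alpha}(F),\operatorname{VaR}_{\beta}(F)]}y\,\mathrm{d}F(y)\\ &+\operatorname{VaR}_{\alpha}(F)\big(F(\operatorname{VaR}_{\alpha}(F))-\alpha\big)-\operatorname{VaR}_{\beta}(F)\big(F(\operatorname{VaR}_{\beta}(F))-\beta\big)\Bigg),\end{aligned}\tag{2.3}$$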

using the usual conventions $F(-\infty)=0$ , $F(\infty)=1$ and $0\cdot\infty=0\cdot(-\infty)=0$ . If $F\in\mathcal{F}^{(\alpha)}\cap\mathcal{F}^{(\beta)}$ then the correction terms in the second line of (2.3) vanish, yielding $\operatorname{RVaR}_{\alpha,\beta}(F)=\mathbb{E}_{F}[Y\,\mathds{1}\{\operatorname{VaR}_{\alpha}(F)<Y\leq\operatorname{VaR}_{\beta}(F)\}]/(\beta-\alpha)$ , which justifies an alternative name for RVaR, namely Interquantile Expectation.
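For an empirical distribution, which has atoms at every observation, the correction terms do matter. The sketch below computes RVaR in two ways for a small sample with ties: by exact integration of the empirical lower quantile over $(\alpha,\beta)$, and by a truncated sum plus the two correction terms. The second form assumes that the leading term of the representation is the expectation of $Y$ restricted to $(\operatorname{VaR}_{\alpha}(F),\operatorname{VaR}_{\beta}(F)]$, consistent with the interquantile expectation formula stated above; the sample, the levels and the function names are hypothetical.

```python
from fractions import Fraction
from math import ceil


def empirical_var(sorted_sample, level):
    """Lower quantile of the empirical distribution: y_(ceil(n * level))."""
    n = len(sorted_sample)
    k = max(1, ceil(n * level))
    return sorted_sample[k - 1]


def rvar_by_integration(sample, alpha, beta):
    """Average of VaR_gamma over (alpha, beta), computed exactly.

    For the empirical distribution, VaR_gamma equals y_(i) on ((i-1)/n, i/n].
    """
    ys = sorted(sample)
    n = len(ys)
    total = 0
    for i, y in enumerate(ys, start=1):
        lower = max(alpha, Fraction(i - 1, n))
        upper = min(beta, Fraction(i, n))
        if upper > lower:
            total += y * (upper - lower)
    return total / (beta - alpha)


def rvar_by_truncated_sum(sample, alpha, beta):
    """Truncated expectation plus correction terms for atoms at the two quantiles."""
    ys = sorted(sample)
    n = len(ys)
    var_a = empirical_var(ys, alpha)
    var_b = empirical_var(ys, beta)

    def cdf(x):
        return Fraction(sum(1 for y in ys if y <= x), n)

    inner = sum(Fraction(y) / n for y in ys if var_a < y <= var_b)
    corrected = inner + var_a * (cdf(var_a) - alpha) - var_b * (cdf(var_b) - beta)
    return corrected / (beta - alpha)


sample = [-5, -2, -2, -2, 0, 1, 1, 3, 4, 7]
alpha, beta = Fraction(3, 20), Fraction(17, 20)

print(rvar_by_integration(sample, alpha, beta))
print(rvar_by_truncated_sum(sample, alpha, beta))
```

Exact rational levels are used so that the order statistic selected by `ceil` is not affected by floating-point rounding; both functions return the same value for this sample.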

Definition 2.3.

Expected Shortfall of $F\in\mathcal{F}_{0}$ at level $\alpha\in(0,1)$ is defined as $\operatorname{ES}_{\alpha}(F)=\operatorname{RVaR}_{0,\alpha}(F)\in\mathbb{R}\cup\{-\infty\}.$

Hence, provided that $\operatorname{ES}_{\alpha}(F),\operatorname{ES}_{\beta}(F)$ are finite, one obtains the identity (1.1). If $F$ has a finite left tail ( $\int_{-\infty}^{0}|y|\,\mathrm{d}F(y)<\infty$ ) then one could use the right hand side of (1.1) as a definition of $\operatorname{RVaR}_{\alpha,\beta}(F)$ . However, in line with our discussion in the introduction, $\operatorname{RVaR}_{\alpha,\beta}(F)$ always exists and is finite for $0<\alpha<\beta<1$ even if the right hand side of (1.1) is not defined.

Interestingly, Embrechts et al. (2018b, Theorem 2) establish that $\operatorname{RVaR}$ can be written as an inf-convolution of $\operatorname{VaR}$ and $\operatorname{ES}$ at appropriate levels. This result amounts to a sup-convolution in our sign convention. Also note that our parametrisation of $\operatorname{RVaR}_{\alpha,\beta}$ differs from theirs.

For $\alpha\in(0,1/2)$ , $\operatorname{RVaR}_{\alpha,1-\alpha}$ corresponds to the $\alpha$ -trimmed mean and has a close connection to the $\alpha$ -Winsorized mean $W_{\alpha}$ (Huber and Ronchetti, 2009, pp. 57–59) via

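$$W_{\alpha}(F):=(1-2\alpha)\operatorname{RVaR}_{\alpha,1-\alpha}(F)+\alpha\operatorname{VaR}_{\alpha}(F)+\alpha\operatorname{VaR}_{1-\alpha}(F),\qquad\alpha\in(0,1/2).\tag{2.4}$$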

2.2 Elicitability and scoring functions

Using the decision-theoretic framework of Fissler and Ziegel (2016) and Gneiting (2011), we introduce the following notation. Let $\mathcal{F}\subseteq\mathcal{F}_{0}$ be some generic subclass, and $\mathsf{A}\subseteq\mathbb{R}^{k}$ be an action domain. Whenever we consider a functional $T\colon\mathcal{F}\to\mathsf{A}$ , we tacitly assume that $T(F)$ is well-defined for all $F\in\mathcal{F}$ and is an element of $\mathsf{A}$ . $T(\mathcal{F})$ corresponds to the image $\{T(F)\in\mathsf{A}\,|\,F\in\mathcal{F}\}$ . For any subset $M\subseteq\mathbb{R}^{k}$ we denote with $\operatorname{int}(M)$ the largest open subset of $M$ . Moreover, $\operatorname{conv}(M)$ denotes the convex hull of the set $M$ .

We say that a function $a\colon\mathbb{R}\to\mathbb{R}$ is $\mathcal{F}$ -integrable if it is measurable and $\int|a(y)|\,\mathrm{d}F(y)<\infty$ for all $F\in\mathcal{F}$ . Similarly, a function $g\colon\mathsf{A}\times\mathbb{R}\to\mathbb{R}$ is called $\mathcal{F}$ -integrable if $g(x,\cdot)\colon\mathbb{R}\to\mathbb{R}$ is $\mathcal{F}$ -integrable for all $x\in\mathsf{A}$ . If $g$ is $\mathcal{F}$ -integrable, we define the map $\bar{g}\colon\mathsf{A}\times\mathcal{F}\to\mathbb{R}$ , $\bar{g}(x,F):=\int g(x,y)\,\mathrm{d}F(y)$ . If $g\colon\mathsf{A}\times\mathbb{R}\to\mathbb{R}$ is sufficiently smooth in its first argument, we denote the $m$ th partial derivative of $g(\cdot,y)$ with $\partial_{m}g(\cdot,y)$ .

Definition 2.4.

A map $S\colon\mathsf{A}\times\mathbb{R}\to\mathbb{R}$ is an $\mathcal{F}$ -consistent scoring function for $T\colon\mathcal{F}\to\mathsf{A}$ if it is $\mathcal{F}$ -integrable and if $\bar{S}(T(F),F)\leq\bar{S}(x,F)$ for all $x\in\mathsf{A}$ , $F\in\mathcal{F}$ . It is strictly $\mathcal{F}$ -consistent for $T$ if it is consistent and if $\bar{S}(T(F),F)=\bar{S}(x,F)$ implies that $x=T(F)$ for all $x\in\mathsf{A}$ and for all $F\in\mathcal{F}$ . A functional $T\colon\mathcal{F}\to\mathsf{A}$ is elicitable on $\mathcal{F}$ if it possesses a strictly $\mathcal{F}$ -consistent scoring function.

Definition 2.5.

Two scoring functions $S,\widetilde{S}\colon\mathsf{A}\times\mathbb{R}\to\mathbb{R}$ are equivalent if there is some $a\colon\mathbb{R}\to\mathbb{R}$ and some $\lambda>0$ such that $\widetilde{S}(x,y)=\lambda S(x,y)+a(y)$ for all $(x,y)\in\mathsf{A}\times\mathbb{R}$ . They are strongly equivalent if additionally $a\equiv 0$ .

This equivalence relation preserves (strict) consistency: If $S$ is (strictly) $\mathcal{F}$ -consistent for $T$ and if $a$ is $\mathcal{F}$ -integrable, then $\widetilde{S}$ is also (strictly) $\mathcal{F}$ -consistent for $T$ . Closely related to the concept of elicitability is the notion of identifiability.

Definition 2.6.

A map $V\colon\mathsf{A}\times\mathbb{R}\to\mathbb{R}^{k}$ is an $\mathcal{F}$ -identification function for $T\colon\mathcal{F}\to\mathsf{A}$ if it is $\mathcal{F}$ -integrable and if $\bar{V}(T(F),F)=0$ for all $F\in\mathcal{F}$ . It is a strict $\mathcal{F}$ -identification function for $T$ if additionally $\bar{V}(x,F)=0$ implies that $x=T(F)$ for all $x\in\mathsf{A}$ and for all $F\in\mathcal{F}$ . A functional $T\colon\mathcal{F}\to\mathsf{A}$ is identifiable on $\mathcal{F}$ if it possesses a strict $\mathcal{F}$ -identification function.

In contrast to Gneiting (2011) we consider point-valued functionals only. For a recent comprehensive study on elicitability of set-valued functionals we refer to Fissler et al. (2020). For the sake of completeness, the Appendix lists some assumptions used in Section 3, which were originally introduced in Fissler and Ziegel (2016).

3 Elicitability and identifiability results

Wang and Wei (2020, Theorem 5.3) show that for $0<\alpha<\beta<1$ , $\operatorname{RVaR}_{\alpha,\beta}$ (and also the pairs $(\operatorname{VaR}_{\alpha},\operatorname{RVaR}_{\alpha,\beta})$ and $(\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ ) do not have CxLS on $\mathcal{F}_{\text{dis}}$ , the class of distributions with bounded and discrete support. Hence, invoking that CxLS are necessary for elicitability and identifiability, $\operatorname{RVaR}_{\alpha,\beta}$ and the pairs $(\operatorname{VaR}_{\alpha},\operatorname{RVaR}_{\alpha,\beta})$ and $(\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ fail to be elicitable and identifiable on $\mathcal{F}_{\text{dis}}$ . Our novel contribution is that the triplet $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ , however, is elicitable and identifiable, subject to mild conditions. We use the notation $S_{\alpha}(x,y)=(\mathds{1}\{y\leq x\}-\alpha)x-\mathds{1}\{y\leq x\}y$ , and recall that $S_{\alpha}$ is $\mathcal{F}$ -consistent for $\operatorname{VaR}_{\alpha}$ if $\int_{-\infty}^{0}|y|\,\mathrm{d}F(y)<\infty$ for all $F\in\mathcal{F}$ , and strictly $\mathcal{F}$ -consistent if furthermore $\mathcal{F}\subseteq\mathcal{F}^{\alpha}$ (Gneiting, 2011).

Proposition 3.1.

For $0<\alpha<\beta<1$ the map $V\colon\mathbb{R}^{3}\times\mathbb{R}\to\mathbb{R}^{3}$

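$$V(x_{1},x_{2},x_{3},y)=\begin{pmatrix}\mathds{1}\{y\leq x_{1}\}-\alpha\\ \mathds{1}\{y\leq x_{2}\}-\beta\\ x_{3}+\frac{1}{\beta-\alpha}\big(S_{\beta}(x_{2},y)-S_{\alpha}(x_{1},y)\big)\end{pmatrix}\tag{3.1}$$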

is an $\mathcal{F}^{(\alpha)}\cap\mathcal{F}^{(\beta)}$ -identification function for $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ , which is strict on $\mathcal{F}^{\alpha}\cap\mathcal{F}^{(\alpha)}\cap\mathcal{F}^{\beta}\cap\mathcal{F}^{(\beta)}$ .

Proof.

The proof is standard, observing that

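$$\operatorname{RVaR}_{\alpha,\beta}(F)=-\frac{1}{\beta-\alpha}\Big(\bar{S}_{\beta}(\operatorname{VaR}_{\beta}(F),F)-\bar{S}_{\alpha}(\operatorname{VaR}_{\alpha}(F),F)\Big),$$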

which follows from the representation (2.3). ∎
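The identification property can be checked numerically. The sketch below assumes that the third component of $V$ is $x_{3}+\frac{1}{\beta-\alpha}\big(S_{\beta}(x_{2},y)-S_{\alpha}(x_{1},y)\big)$, which is the form consistent with the partial derivatives computed in the proof of Theorem 3.5. The levels, the midpoint quantile grid used to approximate expectations, and the helper names are illustrative assumptions.

```python
from statistics import NormalDist

def S_q(x, y, level):
    """Quantile score S_level(x, y) = (1{y <= x} - level) * x - 1{y <= x} * y."""
    ind = 1.0 if y <= x else 0.0
    return (ind - level) * x - ind * y

def V(x1, x2, x3, y, alpha, beta):
    """Assumed identification function for (VaR_alpha, VaR_beta, RVaR)."""
    v1 = (1.0 if y <= x1 else 0.0) - alpha
    v2 = (1.0 if y <= x2 else 0.0) - beta
    v3 = x3 + (S_q(x2, y, beta) - S_q(x1, y, alpha)) / (beta - alpha)
    return v1, v2, v3

def normal_triplet(alpha, beta, mu=0.0, sigma=1.0):
    """Closed-form (VaR_alpha, VaR_beta, RVaR_{alpha,beta}) of N(mu, sigma^2)."""
    std = NormalDist()
    qa, qb = std.inv_cdf(alpha), std.inv_cdf(beta)
    rvar = mu + sigma * (std.pdf(qa) - std.pdf(qb)) / (beta - alpha)
    return mu + sigma * qa, mu + sigma * qb, rvar

def expected_V(x, alpha, beta, n=20000):
    """Approximate E[V(x, Y)], Y ~ N(0, 1), on a midpoint grid of quantiles."""
    std = NormalDist()
    totals = [0.0, 0.0, 0.0]
    for i in range(n):
        y = std.inv_cdf((i + 0.5) / n)
        for k, v in enumerate(V(*x, y, alpha, beta)):
            totals[k] += v / n
    return totals

alpha, beta = 0.1, 0.6                       # illustrative levels
t = normal_triplet(alpha, beta)
at_truth = expected_V(t, alpha, beta)        # all components close to zero
shifted = expected_V((t[0], t[1], t[2] + 0.5), alpha, beta)
```

At the true triplet all three components of the approximate expectation are close to zero, whereas shifting the RVaR forecast by $0.5$ moves the third component by the same amount.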

The following theorem establishes a rich class of (strictly) consistent scoring functions $S\colon\mathbb{R}^{3}\times\mathbb{R}\to\mathbb{R}$ for $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ . By a priori assuming forecasts to be bounded with values in some cube $[c_{\min},c_{\max}]^{3}$ , $-\infty\leq c_{\min}<c_{\max}\leq\infty$ (with the tacit convention that $[c_{\min},c_{\max}]:=[c_{\min},c_{\max}]\cap\mathbb{R}$ if $c_{\min}=-\infty$ or $c_{\max}=\infty$ ), the class gets even broader.

Theorem 3.2.

For $0<\alpha<\beta<1$ , the map $S\colon[c_{\min},c_{\max}]^{3}\times\mathbb{R}\to\mathbb{R}$

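$$\begin{aligned}S(x_{1},x_{2},x_{3},y)={}&\big(\mathds{1}\{y\leq x_{1}\}-\alpha\big)g_{1}(x_{1})-\mathds{1}\{y\leq x_{1}\}g_{1}(y)\\&+\big(\mathds{1}\{y\leq x_{2}\}-\beta\big)g_{2}(x_{2})-\mathds{1}\{y\leq x_{2}\}g_{2}(y)\\&+\phi^{\prime}(x_{3})\Big(x_{3}+\frac{1}{\beta-\alpha}\big(S_{\beta}(x_{2},y)-S_{\alpha}(x_{1},y)\big)\Big)-\phi(x_{3})+a(y)\end{aligned}\tag{3.3}$$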

is an $\mathcal{F}$ -consistent scoring function for $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ if

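(i)

$\phi\colon[c_{\min},c_{\max}]\to\mathbb{R}$ is convex with subgradient $\phi^{\prime}$,

(ii)

for all $x_{3}\in[c_{\min},c_{\max}]$ the functions

$$G_{1,x_{3}}\colon[c_{\min},c_{\max}]\to\mathbb{R},\qquad x_{1}\mapsto g_{1}(x_{1})-x_{1}\phi^{\prime}(x_{3})/(\beta-\alpha),\tag{3.4}$$

$$G_{2,x_{3}}\colon[c_{\min},c_{\max}]\to\mathbb{R},\qquad x_{2}\mapsto g_{2}(x_{2})+x_{2}\phi^{\prime}(x_{3})/(\beta-\alpha)\tag{3.5}$$

are increasing, and

(iii)

$y\mapsto a(y)-\mathds{1}\{y\leq x_{1}\}g_{1}(y)-\mathds{1}\{y\leq x_{2}\}g_{2}(y)$ is $\mathcal{F}$-integrable for all $x_{1},x_{2}\in[c_{\min},c_{\max}]$.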

If moreover $\phi$ is strictly convex, and the functions at (3.4) and (3.5) are strictly increasing, then $S$ is strictly $\mathcal{F}^{\alpha}\cap\mathcal{F}^{\beta}$ -consistent for $T$ .

Proof.

Let $(x_{1},x_{2},x_{3})\in\mathsf{A}$ , $F\in\mathcal{F}$ and $(t_{1},t_{2},t_{3}):=T(F)$ . Then, since $G_{1,x_{3}}$ is increasing, $[c_{\min},c_{\max}]\times\mathbb{R}\ni(x_{1}^{\prime},y)\mapsto S(x_{1}^{\prime},x_{2},x_{3},y)$ is $\mathcal{F}$ -consistent for $\operatorname{VaR}_{\alpha}$ and it is strictly $\mathcal{F}^{\alpha}$ -consistent if $G_{1,x_{3}}$ is strictly increasing. Similar comments apply to the map $[c_{\min},c_{\max}]\times\mathbb{R}\ni(x_{2}^{\prime},y)\mapsto S(t_{1},x_{2}^{\prime},x_{3},y)$ . Hence,

[TABLE]

with a strict inequality under the conditions for strict consistency and if $(x_{1},x_{2})\neq(t_{1},t_{2})$ . Finally,

[TABLE]

since $\phi$ is convex. If $\phi$ is strictly convex and if $x_{3}\neq t_{3}$ , the inequality in (3.6) is strict. ∎
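As a numerical illustration of Theorem 3.2, the following Python sketch evaluates a score of the form $(\mathds{1}\{y\leq x_{1}\}-\alpha)g_{1}(x_{1})-\mathds{1}\{y\leq x_{1}\}g_{1}(y)+(\mathds{1}\{y\leq x_{2}\}-\beta)g_{2}(x_{2})-\mathds{1}\{y\leq x_{2}\}g_{2}(y)+\phi^{\prime}(x_{3})\big(x_{3}+(S_{\beta}(x_{2},y)-S_{\alpha}(x_{1},y))/(\beta-\alpha)\big)-\phi(x_{3})+a(y)$, which is our reading of (3.3). The choices $g_{1}(x)=g_{2}(x)=x$, $\phi(x)=(\beta-\alpha)\log\cosh(x)$, $a\equiv 0$, the levels, and the quantile grid are illustrative assumptions and need not coincide with the examples in Table 1. For these choices $G_{1,x_{3}}(x_{1})=x_{1}(1-\tanh x_{3})$ and $G_{2,x_{3}}(x_{2})=x_{2}(1+\tanh x_{3})$ are strictly increasing.

```python
import math
from statistics import NormalDist

ALPHA, BETA = 0.1, 0.6   # illustrative levels with 0 < ALPHA < BETA < 1

def S_q(x, y, level):
    """Quantile score S_level(x, y) = (1{y <= x} - level) * x - 1{y <= x} * y."""
    ind = 1.0 if y <= x else 0.0
    return (ind - level) * x - ind * y

def phi(x):
    """Illustrative strictly convex phi with bounded derivative."""
    return (BETA - ALPHA) * math.log(math.cosh(x))

def dphi(x):
    """Derivative of phi; its absolute value stays below BETA - ALPHA."""
    return (BETA - ALPHA) * math.tanh(x)

def score(x1, x2, x3, y):
    """Score of the assumed form (3.3) with g1(x) = g2(x) = x and a = 0."""
    i1 = 1.0 if y <= x1 else 0.0
    i2 = 1.0 if y <= x2 else 0.0
    quantile_part = (i1 - ALPHA) * x1 - i1 * y + (i2 - BETA) * x2 - i2 * y
    v3 = x3 + (S_q(x2, y, BETA) - S_q(x1, y, ALPHA)) / (BETA - ALPHA)
    return quantile_part + dphi(x3) * v3 - phi(x3)

STD = NormalDist()
GRID = [STD.inv_cdf((i + 0.5) / 20000) for i in range(20000)]

def expected_score(x):
    """Approximate expected score under N(0, 1) on a midpoint quantile grid."""
    return sum(score(*x, y) for y in GRID) / len(GRID)

qa, qb = STD.inv_cdf(ALPHA), STD.inv_cdf(BETA)
truth = (qa, qb, (STD.pdf(qa) - STD.pdf(qb)) / (BETA - ALPHA))
base = expected_score(truth)

# Perturb one component at a time by +-0.3 and record the expected score.
perturbed = []
for k in range(3):
    for delta in (-0.3, 0.3):
        x = list(truth)
        x[k] += delta
        perturbed.append(expected_score(x))
```

Each perturbed forecast attains a larger approximate expected score than the true triplet, in line with strict consistency.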

Remark 3.3.

Provided that condition (iii) in Theorem 3.2 holds, that $\phi$ is strictly convex, and that $G_{1,x_{3}}$ and $G_{2,x_{3}}$ are strictly increasing, the score $S$ given in (3.3) is still strictly $\mathcal{F}$-consistent in the $\operatorname{RVaR}$-component for general $\mathcal{F}\subseteq\mathcal{F}_{0}$. That is, for $F\in\mathcal{F}$

[TABLE]

Making use of (2.4) and the revelation principle (Osband, 1985; Gneiting, 2011; Fissler, 2017), Theorem 3.2 also provides a rich class of strictly consistent scoring functions for $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{1-\alpha},W_{\alpha})$, where $W_{\alpha}$ is the $\alpha$-Winsorized mean. The following proposition is useful to construct examples; see Section 6.

Proposition 3.4.

Let $S$ be of the form (3.3) with a (strictly) convex and non-constant function $\phi$ , and functions $g_{1}$ , $g_{2}$ such that the functions at (3.4) and (3.5) are (strictly) increasing and condition (iii) of Theorem 3.2 is satisfied. Then the following holds:

(i)

The subgradient $\phi^{\prime}$ of $\phi$ is necessarily bounded and the one-sided derivatives of $g_{1}$ and $g_{2}$ are necessarily bounded from below.

(ii)

$S$ is strongly equivalent to a scoring function $\tilde{S}$ of the form (3.3) with a (strictly) convex function $\tilde{\phi}$ such that $\tilde{\phi}^{\prime}$ is bounded with $\beta-\alpha=-\inf_{x\in[c_{\min},c_{\max}]}\tilde{\phi}^{\prime}(x)=\sup_{x\in[c_{\min},c_{\max}]}\tilde{\phi}^{\prime}(x)$, and strictly increasing functions $\tilde{g}_{1}$, $\tilde{g}_{2}$ such that their one-sided derivatives are bounded from below by one and such that the functions at (3.4) and (3.5) are (strictly) increasing and condition (iii) of Theorem 3.2 is satisfied.

Proof.

(i)

The proof is similar to the one of Corollary 5.5 in Fissler and Ziegel (2016): Condition (ii) implies that for any $x_{1},x_{1}^{\prime},x_{2},x_{2}^{\prime},x_{3}\in[c_{\min},c_{\max}]$ with $x_{1}<x_{1}^{\prime}$ and $x_{2}<x_{2}^{\prime}$ it holds that

[TABLE]

Therefore, $\phi^{\prime}$ is bounded, and the one-sided derivative of $g_{1}$ is bounded from below by $\sup_{x_{3}}\phi^{\prime}(x_{3})/(\beta-\alpha)$ while the one-sided derivative of $g_{2}$ is bounded from below by $-\inf_{x_{3}}\phi^{\prime}(x_{3})/(\beta-\alpha)$.

(ii)

For any $c\in\mathbb{R}$, if we replace $\phi$ with $\widehat{\phi}:x\mapsto\phi(x)+cx$, $g_{1}$ with $\widehat{g}_{1}:x\mapsto g_{1}(x)+cx/(\beta-\alpha)$, and $g_{2}$ with $\widehat{g}_{2}:x\mapsto g_{2}(x)-cx/(\beta-\alpha)$ in the formula (3.3) for $S$, then $S$ does not change. Also $\widehat{\phi}$ is (strictly) convex if and only if $\phi$ is (strictly) convex. Furthermore, conditions (ii) and (iii) of Theorem 3.2 hold for $\phi$, $g_{1}$, $g_{2}$ if and only if they hold for $\widehat{\phi}$, $\widehat{g}_{1}$ and $\widehat{g}_{2}$. By part (i) of the proposition $\phi^{\prime}$ is bounded. Therefore, we can assume without loss of generality that $-\inf_{x\in[c_{\min},c_{\max}]}\phi^{\prime}(x)=\sup_{x\in[c_{\min},c_{\max}]}\phi^{\prime}(x)=\lambda$, where $\lambda>0$ since $\phi$ is non-constant. Then the claim follows by setting $\tilde{S}=\frac{\beta-\alpha}{\lambda}S$, which rescales $\phi^{\prime}$ and the one-sided derivatives of $g_{1}$ and $g_{2}$ by the factor $(\beta-\alpha)/\lambda$.

∎
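The invariance used in part (ii) of the proof can be verified numerically. The sketch below assumes the score has the form $(\mathds{1}\{y\leq x_{1}\}-\alpha)g_{1}(x_{1})-\mathds{1}\{y\leq x_{1}\}g_{1}(y)+(\mathds{1}\{y\leq x_{2}\}-\beta)g_{2}(x_{2})-\mathds{1}\{y\leq x_{2}\}g_{2}(y)+\phi^{\prime}(x_{3})\big(x_{3}+(S_{\beta}(x_{2},y)-S_{\alpha}(x_{1},y))/(\beta-\alpha)\big)-\phi(x_{3})$ with $a\equiv 0$. The concrete functions, the constant $c$, and the helper name `make_score` are hypothetical choices made for illustration.

```python
import math
import random

ALPHA, BETA = 0.05, 0.4   # illustrative levels

def make_score(phi, dphi, g1, g2):
    """Build a score of the assumed form (3.3) with a = 0."""
    def S_q(x, y, level):
        ind = 1.0 if y <= x else 0.0
        return (ind - level) * x - ind * y

    def score(x1, x2, x3, y):
        i1 = 1.0 if y <= x1 else 0.0
        i2 = 1.0 if y <= x2 else 0.0
        part1 = (i1 - ALPHA) * g1(x1) - i1 * g1(y)
        part2 = (i2 - BETA) * g2(x2) - i2 * g2(y)
        v3 = x3 + (S_q(x2, y, BETA) - S_q(x1, y, ALPHA)) / (BETA - ALPHA)
        return part1 + part2 + dphi(x3) * v3 - phi(x3)

    return score

c = 0.7                      # arbitrary shift of the subgradient
k = c / (BETA - ALPHA)

original = make_score(
    phi=lambda x: math.log(math.cosh(x)), dphi=math.tanh,
    g1=lambda x: 3.0 * x, g2=lambda x: 3.0 * x,
)
shifted = make_score(
    phi=lambda x: math.log(math.cosh(x)) + c * x,
    dphi=lambda x: math.tanh(x) + c,
    g1=lambda x: 3.0 * x + k * x,
    g2=lambda x: 3.0 * x - k * x,
)

rng = random.Random(0)
points = [tuple(rng.uniform(-3.0, 3.0) for _ in range(4)) for _ in range(1000)]
max_gap = max(abs(original(*p) - shifted(*p)) for p in points)
```

Up to floating-point rounding, `max_gap` is zero, so both parametrisations define the same score.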

Invoking the inequality (2.2) the triplet $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ can only attain values in the domain $\mathsf{A}_{0}:=\{(x_{1},x_{2},x_{3})\in\mathbb{R}^{3}\,|\,x_{1}\leq x_{3}\leq x_{2}\}$. Therefore, we call $\mathsf{A}_{0}$ the maximal sensible action domain. Issuing forecasts for $T$ outside $\mathsf{A}_{0}$, thus violating (2.2), would be irrational, corresponding to, say, negative variance forecasts. Still, the scoring functions of the form (3.3) allow for the evaluation of forecasts violating (2.2). Striving for a necessary characterisation result of (strictly) consistent scoring functions for $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$, it is immediate to realise that there is flexibility in $[c_{\min},c_{\max}]^{3}\setminus\mathsf{A}_{0}$ since one could possibly set the score to infinity there and would still preserve (strict) consistency. Therefore, it is not astonishing that a necessary characterisation result works only on domains $\mathsf{A}\subseteq\mathsf{A}_{0}$. The key to such a necessary characterisation is Osband’s principle (Fissler and Ziegel, 2016, Theorem 3.2) originating from the seminal dissertation of Osband (1985). Since it exploits a first-order condition of the minimisation of the expected score, the main assumptions of the result consist of smoothness assumptions on the expected score as well as richness assumptions on the underlying class of distributions $\mathcal{F}$; see the Appendix for the detailed technical formulations and Fissler and Ziegel (2016) for a discussion of these conditions.

We introduce the class $\mathcal{F}_{\mathrm{cont}}\subset\mathcal{F}_{0}$ of distributions which are continuously differentiable and with a strictly positive derivative / density. (Clearly $\mathcal{F}_{\mathrm{cont}}\subset\mathcal{F}^{\gamma}\cap\mathcal{F}^{(\gamma)}$ for any $\gamma\in(0,1)$ .) For any $\mathsf{A}\subseteq\mathbb{R}^{3}$ , we denote the projections on the $r$ th component by $\mathsf{A}^{\prime}_{r}:=\{x_{r}\in\mathbb{R}\,|\,\exists(z_{1},z_{2},z_{3})\in\mathsf{A},\ z_{r}=x_{r}\}$ , $r\in\{1,2,3\}$ . For any $x_{3}\in\mathsf{A}^{\prime}_{3}$ and $m\in\{1,2\}$ , let $\mathsf{A}^{\prime}_{m,x_{3}}:=\{x_{m}\in\mathbb{R}\,|\,\exists(z_{1},z_{2},z_{3})\in\mathsf{A},\,z_{m}=x_{m},\,z_{3}=x_{3}\}$ .

Theorem 3.5.

Let $\mathcal{F}\subseteq\mathcal{F}_{\mathrm{cont}}$, $0<\alpha<\beta<1$, $T=(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})\colon\mathcal{F}\to\mathsf{A}\subseteq\mathsf{A}_{0}$, and let $V=(V_{1},V_{2},V_{3})^{\intercal}$ be defined at (3.1). If Assumptions (V1) and (F1) hold and $(V_{1},V_{2})^{\intercal}$ satisfies Assumption (V4), then any strictly $\mathcal{F}$-consistent scoring function $S\colon\mathsf{A}\times\mathbb{R}\to\mathbb{R}$ for $T$ that satisfies assumptions (VS1) and (S2) is necessarily of the form (3.3) almost everywhere, where the functions $G_{r,x_{3}}\colon\mathsf{A}^{\prime}_{r,x_{3}}\to\mathbb{R}$, $r\in\{1,2\}$, $x_{3}\in\mathsf{A}^{\prime}_{3}$, in (3.4) and (3.5) are strictly increasing and $\phi\colon\mathsf{A}^{\prime}_{3}\to\mathbb{R}$ is strictly convex.

Proof.

First note that $V$ satisfies assumption (V3) on $\mathcal{F}\subseteq\mathcal{F}_{\mathrm{cont}}$ . Let $F\in\mathcal{F}$ with derivative $f$ and let $x\in\operatorname{int}(\mathsf{A})$ . Then one obtains

[TABLE]

The partial derivatives of $\bar{V}$ are given by $\partial_{1}\bar{V}_{1}(x,F)=f(x_{1})$, $\partial_{2}\bar{V}_{2}(x,F)=f(x_{2})$, $\partial_{1}\bar{V}_{3}(x,F)=-(F(x_{1})-\alpha)/(\beta-\alpha)$, $\partial_{2}\bar{V}_{3}(x,F)=(F(x_{2})-\beta)/(\beta-\alpha)$, $\partial_{3}\bar{V}_{3}(x,F)=1$, and $\partial_{r}\bar{V}_{1}(x,F)$ and $\partial_{m}\bar{V}_{2}(x,F)$ vanish for $r\in\{2,3\}$ and $m\in\{1,3\}$. Applying Fissler and Ziegel (2016, Theorem 3.2) yields the existence of continuously differentiable functions $h_{lm}\colon\operatorname{int}(\mathsf{A})\to\mathbb{R}$, $l,m\in\{1,2,3\}$, such that $\partial_{m}\bar{S}(x,F)=\sum_{i=1}^{3}h_{mi}(x)\bar{V}_{i}(x,F)$ for $m\in\{1,2,3\}$. Since we assume that $\bar{S}(\cdot,F)$ is twice continuously differentiable for any $F\in\mathcal{F}$, the second-order partial derivatives need to commute. Let $t=T(F)$. Then $\partial_{1}\partial_{2}\bar{S}(t,F)=\partial_{2}\partial_{1}\bar{S}(t,F)$ is equivalent to $h_{21}(t)f(t_{1})=h_{12}(t)f(t_{2})$. This needs to hold for all $F\in\mathcal{F}$. The variation in the densities implied by Assumption (V4) in combination with the surjectivity of $T$ yields that $h_{12}\equiv h_{21}\equiv 0$ on $\operatorname{int}(\mathsf{A})$. Similarly, evaluating $\partial_{1}\partial_{3}\bar{S}(x,F)=\partial_{3}\partial_{1}\bar{S}(x,F)$ and $\partial_{2}\partial_{3}\bar{S}(x,F)=\partial_{3}\partial_{2}\bar{S}(x,F)$ at $x=t=T(F)$ yields $h_{13}(t)=h_{31}(t)f(t_{1})$ and $h_{23}(t)=h_{32}(t)f(t_{2})$. Using again Assumption (V4) as well as the surjectivity of $T$, this implies that $h_{13}\equiv h_{31}\equiv h_{23}\equiv h_{32}\equiv 0$. So we are left with characterising $h_{mm}$ for $m\in\{1,2,3\}$.
Note that Assumption (V1) implies that for any $x=(x_{1},x_{2},x_{3})\in\operatorname{int}(\mathsf{A})$ there are two distributions $F_{1},F_{2}\in\mathcal{F}$ such that $(F_{1}(x_{1})-\alpha,F_{1}(x_{2})-\beta)^{\intercal}$ and $(F_{2}(x_{1})-\alpha,F_{2}(x_{2})-\beta)^{\intercal}$ are linearly independent. Then, the requirement that

[TABLE]

for all $x\in\operatorname{int}(\mathsf{A})$ and for all $F\in\mathcal{F}$ implies that $\partial_{1}h_{22}\equiv\partial_{2}h_{11}\equiv 0$. Similarly, the identity $\partial_{1}\partial_{3}\bar{S}(x,F)=\partial_{3}\partial_{1}\bar{S}(x,F)$ implies that $\partial_{1}h_{33}(x)\bar{V}_{3}(x,F)=\big(\partial_{3}h_{11}(x)+h_{33}(x)/(\beta-\alpha)\big)\bar{V}_{1}(x,F)$. Again, Assumption (V1) implies that there are $F_{1},F_{2}\in\mathcal{F}$ such that $\big(\bar{V}_{1}(x,F_{1}),\bar{V}_{3}(x,F_{1})\big)^{\intercal}$ and $\big(\bar{V}_{1}(x,F_{2}),\bar{V}_{3}(x,F_{2})\big)^{\intercal}$ are linearly independent. Hence, we obtain that $\partial_{1}h_{33}\equiv 0$ and $\partial_{3}h_{11}\equiv-h_{33}/(\beta-\alpha)$. By the same argument, starting from $\partial_{2}\partial_{3}\bar{S}(x,F)=\partial_{3}\partial_{2}\bar{S}(x,F)$, one can show that $\partial_{2}h_{33}\equiv 0$ and $\partial_{3}h_{22}\equiv h_{33}/(\beta-\alpha)$. This means there exist functions $c_{1}\colon\{(x_{1},x_{3})\in\mathbb{R}^{2}\,|\,\exists(z_{1},z_{2},z_{3})\in\operatorname{int}(\mathsf{A}),\ x_{1}=z_{1},x_{3}=z_{3}\}\to\mathbb{R}$, $c_{2}\colon\{(x_{2},x_{3})\in\mathbb{R}^{2}\,|\,\exists(z_{1},z_{2},z_{3})\in\operatorname{int}(\mathsf{A}),\ x_{2}=z_{2},x_{3}=z_{3}\}\to\mathbb{R}$, $c_{3}\colon\operatorname{int}(\mathsf{A})^{\prime}_{3}\to\mathbb{R}$, and some $z\in\operatorname{int}(\mathsf{A})^{\prime}_{3}$ such that for any $x=(x_{1},x_{2},x_{3})\in\operatorname{int}(\mathsf{A})$ it holds that $h_{33}(x)=c_{3}(x_{3})$,

[TABLE]

where $b_{r}\colon\operatorname{int}(\mathsf{A})^{\prime}_{r}\to\mathbb{R}$, $r\in\{1,2\}$. Any component of $T$ is mixture-continuous. (For convex $\mathcal{F}$, a functional $T\colon\mathcal{F}\to\mathbb{R}^{k}$ is called mixture-continuous if for any $F,G\in\mathcal{F}$ the map $[0,1]\ni\lambda\mapsto T((1-\lambda)F+\lambda G)$ is continuous.) Due to this fact, and since $\mathcal{F}$ is convex and $T$ surjective, the projection $\operatorname{int}(\mathsf{A})^{\prime}_{3}$ is an open interval. Hence, $[\min(z,x_{3}),\max(z,x_{3})]\subset\operatorname{int}(\mathsf{A})^{\prime}_{3}$. Due to Assumptions (V3) and (S2), Fissler and Ziegel (2016, Theorem 3.2) implies that $c_{1},c_{2},c_{3}$ are locally Lipschitz continuous.

The above calculations imply that the Hessian of the expected score, $\nabla^{2}\bar{S}(x,F)$, at its minimiser $x=t=T(F)$, is a diagonal matrix with entries $c_{1}(t_{1},t_{3})f(t_{1})$, $c_{2}(t_{2},t_{3})f(t_{2})$, and $c_{3}(t_{3})$. As a second-order condition, $\nabla^{2}\bar{S}(t,F)$ must be positive semi-definite. Invoking the surjectivity of $T$ once again, this shows that $c_{1},c_{2},c_{3}\geq 0$. More to the point, invoking the continuous differentiability of the expected score and the fact that $S$ is strictly $\mathcal{F}$-consistent for $T$ one obtains that for any $F\in\mathcal{F}$ with $t=T(F)$ and for any $v\in\mathbb{R}^{3}$, $v\neq 0$, there exists an $\varepsilon>0$ such that $\frac{\mathrm{d}}{\mathrm{d}s}\bar{S}(t+sv,F)$ is negative for all $s\in(-\varepsilon,0)$, zero for $s=0$ and positive for all $s\in(0,\varepsilon)$. For $v=e_{3}=(0,0,1)^{\intercal}$, this means that for any $F\in\mathcal{F}$ with $t=T(F)$ there is an $\varepsilon>0$ such that $\frac{\mathrm{d}}{\mathrm{d}s}\bar{S}(t+se_{3},F)=c_{3}(t_{3}+s)s$ has the same sign as $s$ for all $s\in(-\varepsilon,\varepsilon)$. Therefore, $c_{3}(t_{3}+s)>0$ for all $s\in(-\varepsilon,\varepsilon)\setminus\{0\}$. Using the surjectivity of $T$ and invoking a compactness argument, $c_{3}$ vanishes at only finitely many points of any compact interval. Recall that $\operatorname{int}(\mathsf{A})^{\prime}_{3}$ is an open interval. Hence, it can be approximated by an increasing sequence of compact intervals. Therefore, $c_{3}^{-1}(\{0\})$ is at most countable and therefore a Lebesgue null set.
With similar arguments one can show that for any $x_{3}\in\operatorname{int}(\mathsf{A})^{\prime}_{3}$ , the sets $\{x_{1}\in\mathbb{R}\,|\,\exists(z_{1},z_{2},z_{3})\in\operatorname{int}(\mathsf{A}),\ x_{1}=z_{1},\ x_{3}=z_{3},\ c_{1}(x_{1},x_{3})=0\}$ and $\{x_{2}\in[x_{3},\infty)\,|\,\exists(z_{1},z_{2},z_{3})\in\operatorname{int}(\mathsf{A}),\ x_{2}=z_{2},\ x_{3}=z_{3},\ c_{2}(x_{2},x_{3})=0\}$ are at most countable and therefore also Lebesgue null sets.

Finally, using Proposition 1 in Fissler and Ziegel (2020) (recognising that $V$ is locally bounded) one obtains that $S$ is almost everywhere of the form (3.3). Moreover, it holds almost everywhere that $\phi^{\prime\prime}=c_{3}$ and $g_{m}^{\prime}=b_{m}$ for $m\in\{1,2\}$ . Hence, $\phi$ is strictly convex and the functions at (3.4) and (3.5) are strictly increasing. ∎
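For readers who wish to verify the partial derivatives of $\bar{V}_{3}$ used in the proof above, the following short computation may help. It assumes that $V_{3}(x,y)=x_{3}+\frac{1}{\beta-\alpha}\big(S_{\beta}(x_{2},y)-S_{\alpha}(x_{1},y)\big)$, the form consistent with the derivatives listed in the proof; the generic level $\gamma\in(0,1)$ is introduced here for illustration only.

```latex
\begin{align*}
\bar{S}_{\gamma}(x,F) &= \big(F(x)-\gamma\big)\,x-\int_{-\infty}^{x} y\,f(y)\,\mathrm{d}y,\\
\frac{\mathrm{d}}{\mathrm{d}x}\bar{S}_{\gamma}(x,F) &= f(x)\,x+F(x)-\gamma-x\,f(x)=F(x)-\gamma,\\
\partial_{1}\bar{V}_{3}(x,F) &= -\frac{F(x_{1})-\alpha}{\beta-\alpha},\qquad
\partial_{2}\bar{V}_{3}(x,F) = \frac{F(x_{2})-\beta}{\beta-\alpha},\qquad
\partial_{3}\bar{V}_{3}(x,F) = 1.
\end{align*}
```

In particular, $\partial_{1}\bar{V}_{3}$ and $\partial_{2}\bar{V}_{3}$ vanish at $x=t=T(F)$ because $F(t_{1})=\alpha$ and $F(t_{2})=\beta$, which is why the Hessian of the expected score at $t$ is diagonal.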

Combining Theorems 3.2 and 3.5, one can show that the scoring functions given at (3.3) are essentially the only strictly consistent scoring functions for the triplet $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ on the action domain $\mathsf{A}=\{(x_{1},x_{2},x_{3})\in\mathbb{R}^{3}\,|\,c_{\min}\leq x_{1}\leq x_{3}\leq x_{2}\leq c_{\max}\}$ .

Corollary 3.6.

Let $\mathsf{A}=\{(x_{1},x_{2},x_{3})\in\mathbb{R}^{3}\,|\,c_{\min}\leq x_{1}\leq x_{3}\leq x_{2}\leq c_{\max}\}$ for some $-\infty\leq c_{\min}<c_{\max}\leq\infty$ . Under the conditions of Theorem 3.5, a scoring function $S\colon\mathsf{A}\times\mathbb{R}\to\mathbb{R}$ is strictly $\mathcal{F}$ -consistent for $T=(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ , $0<\alpha<\beta<1$ , if and only if it is of the form (3.3) almost everywhere satisfying conditions (i), (ii), (iii). Moreover, the function $\phi^{\prime}\colon[c_{\min},c_{\max}]\to\mathbb{R}$ is necessarily bounded.

Proof.

For the proof it suffices to show that for $r\in\{1,2\}$ , $G_{r,x_{3}}$ defined in (3.4), (3.5) is not only increasing on $\mathsf{A}_{r,x_{3}}^{\prime}$ for any $x_{3}\in\mathsf{A}_{3}^{\prime}$ but on $\mathsf{A}_{r}^{\prime}=[c_{\min},c_{\max}]$ . For $x_{3}\in[c_{\min},c_{\max}]=\mathsf{A}_{3}^{\prime}$ , we have $\mathsf{A}_{1,x_{3}}^{\prime}=[c_{\min},x_{3}]$ and $\mathsf{A}_{2,x_{3}}^{\prime}=[x_{3},c_{\max}]$ . Let $x_{3}\in\mathsf{A}^{\prime}_{3}$ and $x_{1},x_{1}^{\prime}\in\mathsf{A}^{\prime}_{1}$ with $x_{1}<x_{1}^{\prime}$ . If $x_{1},x_{1}^{\prime}\in\mathsf{A}^{\prime}_{1,x_{3}}$ there is nothing to show. If however $x_{3}<x_{1}^{\prime}$ , then $x_{1},x^{\prime}_{1}\in\mathsf{A}^{\prime}_{1,x^{\prime}_{1}}$ . This means that

[TABLE]

where the second inequality stems from the fact that $\phi^{\prime}$ is increasing. If the function $G_{1,x^{\prime}_{1}}$ is strictly increasing, then the first inequality is strict. The argument for $G_{2,x_{3}}$ works analogously. ∎

Remark 3.7.

Note the structural difference of Theorems 3.2 and 3.5 to Frongillo and Kash (2020, Theorem 1), Brehmer (2017, Proposition 4.14) and in particular Fissler and Ziegel (2016, Theorem 5.2 and Corollary 5.5). Our functional of interest, $\operatorname{RVaR}_{\alpha,\beta}$ with $0<\alpha<\beta<1$ , is not a minimum of an expected scoring function — or Bayes risk —, but a difference of minima of two scoring functions. Indeed, while $\operatorname{ES}_{\beta}(F)=-\frac{1}{\beta}\bar{S}_{\beta}(\operatorname{VaR}_{\beta}(F),F)$ , we have that

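$$\operatorname{RVaR}_{\alpha,\beta}(F)=-\frac{1}{\beta-\alpha}\Big(\bar{S}_{\beta}(\operatorname{VaR}_{\beta}(F),F)-\bar{S}_{\alpha}(\operatorname{VaR}_{\alpha}(F),F)\Big).$$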

This structural difference is reflected in the minus sign appearing at (3.4). In particular, it means that the functions $g_{1}$ and $g_{2}$ cannot identically vanish if we want to ensure strict consistency of $S$, whereas the corresponding functions in Theorem 5.2 in Fissler and Ziegel (2016) may well be set to zero. Frongillo and Kash (2020, Theorem 2) generalises our results and presents an elicitability result for any linear combination of Bayes risks.

Concrete examples for choices of the functions $g_{1}$ , $g_{2}$ , and $\phi$ for the scoring function $S$ at (3.3) are given and discussed in Section 6.

4 Translation invariance and homogeneity

There are many choices for the functions $g_{1}$ , $g_{2}$ , and $\phi$ appearing in the formula for the scoring function $S$ at (3.3). Often, these choices can be limited by imposing secondary desirable criteria on $S$ . In this section we show that, unfortunately, standard criteria (Patton (2011); Nolde and Ziegel (2017); Fissler and Ziegel (2019)) such as translation invariance and positive homogeneity are not fruitful for RVaR.

If one is interested in scoring functions with an action domain of the form $\mathsf{A}=\{x\in\mathbb{R}^{3}\,|\,c_{\min}\leq x_{1}\leq x_{3}\leq x_{2}\leq c_{\max}\}$ possessing the additional property of translation invariant score differences, the only sensible choice is $c_{\min}=-\infty$ , $c_{\max}=\infty$ , amounting to the maximal action domain $\mathsf{A}_{0}$ . Similarly, for scoring functions with positively homogeneous score differences, the most interesting choices for action domains are $\mathsf{A}=\mathsf{A}_{0}$ , $\mathsf{A}=\mathsf{A}_{0}^{+}=\{(x_{1},x_{2},x_{3})\in\mathbb{R}^{3}\,|\,0\leq x_{1}\leq x_{3}\leq x_{2}\}$ or $\mathsf{A}=\mathsf{A}_{0}^{-}=\{(x_{1},x_{2},x_{3})\in\mathbb{R}^{3}\,|\,x_{1}\leq x_{3}\leq x_{2}\leq 0\}$ .

Proposition 4.1 (Translation invariance).

Under the conditions of Theorem 3.5 there are no strictly $\mathcal{F}$ -consistent scoring functions for $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ on $\mathsf{A}_{0}$ with translation invariant score differences.

Proof.

Using Theorem 3.5 any strictly $\mathcal{F}$ -consistent scoring function for $T$ must be of the form (3.3) where in particular $\phi$ is strictly convex, twice differentiable, and $\phi^{\prime}$ is bounded. Assume that $S$ has translation invariant score differences. That means that the function $\Psi\colon\mathbb{R}\times\mathsf{A}_{0}\times\mathsf{A}_{0}\times\mathbb{R}\to\mathbb{R}$ ,

[TABLE]

vanishes. Then, for all $x\in\mathsf{A}_{0}$ and for all $z,y\in\mathbb{R}$

[TABLE]

Therefore, $\phi^{\prime\prime}$ needs to be constant. Since $\phi$ is strictly convex, this means that $\phi^{\prime}(x_{3})=dx_{3}+d^{\prime}$ with $d>0$. But since $\mathsf{A}^{\prime}_{3}=\mathbb{R}$, $\phi^{\prime}$ is unbounded, which is a contradiction. ∎

The proof of Proposition 4.1 closely follows the one of Proposition 4.10 in Fissler and Ziegel (2019). The fact that the latter assertion entails a positive result has the following background: The strictly consistent scoring function for $(\operatorname{VaR}_{\alpha},\operatorname{ES}_{\alpha})$ given in Fissler and Ziegel (2019, Proposition 4.10) works only on a very restricted action domain. To guarantee strict consistency on such an action domain, one would need a refinement of Theorem 3.2 in the spirit of Fissler and Ziegel (2020, Proposition 2). However, since such a positive result on a quite restricted action domain is practically irrelevant, we dispense with such a refinement and only state the relevant negative result here.

Proposition 4.2 (Homogeneity).

Under the conditions of Theorem 3.5 there are no strictly $\mathcal{F}$ -consistent scoring functions for $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ on $\mathsf{A}\in\{\mathsf{A}_{0},\mathsf{A}_{0}^{+},\mathsf{A}_{0}^{-}\}$ with positively homogeneous score differences.

Proof.

Using Theorem 3.5 any strictly $\mathcal{F}$ -consistent scoring function for $T$ must be of the form (3.3) where in particular $\phi$ is strictly convex, twice differentiable, and $\phi^{\prime}$ is bounded. Assume that $S$ has positively homogeneous score differences of some degree $b\in\mathbb{R}$ . That means that the function $\Psi\colon(0,\infty)\times\mathsf{A}\times\mathsf{A}\times\mathbb{R}\to\mathbb{R}$ ,

[TABLE]

vanishes. Therefore, for all $x\in\mathsf{A}$ , for all $y\in\mathbb{R}$ and all $c>0$

[TABLE]

For the sake of brevity, we only consider the case $\mathsf{A}=\mathsf{A}_{0}^{-}$, the other cases being similar. Equation (4.1) implies that $\phi^{\prime\prime}(-x_{3})=\phi^{\prime\prime}(-1)x_{3}^{b-2}$ for any $x_{3}>0$. Due to the strict convexity of $\phi$, we need that $\phi^{\prime\prime}(-1)>0$. However, for $b\geq 1$, $\inf_{x_{3}>0}\phi^{\prime}(-x_{3})=-\infty$ and for $b\leq 1$, $\sup_{x_{3}>0}\phi^{\prime}(-x_{3})=\infty$. Hence, $\phi^{\prime}$ cannot be bounded. ∎
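The unboundedness of $\phi^{\prime}$ claimed in the proof can be made explicit by integrating the relation for $\phi^{\prime\prime}$. The following computation is a sketch for the case $\mathsf{A}=\mathsf{A}_{0}^{-}$ treated above, with the integration starting at the arbitrarily chosen reference point $x_{3}=1$.

```latex
\begin{align*}
\frac{\mathrm{d}}{\mathrm{d}x_{3}}\,\phi^{\prime}(-x_{3})
  &= -\phi^{\prime\prime}(-x_{3}) = -\phi^{\prime\prime}(-1)\,x_{3}^{\,b-2},\\
\phi^{\prime}(-x_{3})
  &= \phi^{\prime}(-1)-\phi^{\prime\prime}(-1)\int_{1}^{x_{3}}s^{\,b-2}\,\mathrm{d}s
   = \begin{cases}
      \phi^{\prime}(-1)-\phi^{\prime\prime}(-1)\,\dfrac{x_{3}^{\,b-1}-1}{b-1}, & b\neq 1,\\[1.5ex]
      \phi^{\prime}(-1)-\phi^{\prime\prime}(-1)\,\log x_{3}, & b=1.
     \end{cases}
\end{align*}
```

Since $\phi^{\prime\prime}(-1)>0$, the right-hand side tends to $-\infty$ as $x_{3}\to\infty$ whenever $b\geq 1$, and to $+\infty$ as $x_{3}\downarrow 0$ whenever $b\leq 1$.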

Remark 4.3.

The negative result of Proposition 4.2 should be compared with the results of Theorem C.3 in Nolde and Ziegel (2017) characterising homogeneous strictly consistent scoring functions for the pair $(\operatorname{VaR}_{\beta},\operatorname{ES}_{\beta})$ . Since they use a different sign convention for $\operatorname{VaR}$ and $\operatorname{ES}$ than we do in this paper, their choice of the action domain $\mathbb{R}\times(0,\infty)$ corresponds to our choice $\mathsf{A}_{0}^{-}$ . When interpreting $\operatorname{RVaR}_{\alpha,\beta}$ as a risk measure, negative values of $\operatorname{RVaR}$ are the more interesting and relevant ones, using our sign convention. Inspecting the proof of Proposition 4.2 and of Proposition 3.4(i) one makes the following observation: For $b\geq 1$ , Nolde and Ziegel (2017) state an impossibility result for their choice of action domain. In fact, the problem occurring in our context is that $\phi^{\prime}$ is not bounded from below. In Proposition 3.4 this property is implied by the fact that the function $G_{2,x_{3}}$ at (3.5) is increasing. And it is exactly such a condition that is also present for strictly consistent scoring functions for the pair $(\operatorname{VaR}_{\beta},\operatorname{ES}_{\beta})$ ; see Theorem 5.2 in Fissler and Ziegel (2016). On the other hand, the complication for $b<1$ stems from the fact that $\phi^{\prime}$ is not bounded from above. This condition is related to the monotonicity of $G_{1,x_{3}}$ at (3.4). Such a condition is not present for strictly consistent scoring functions for the pair $(\operatorname{VaR}_{\beta},\operatorname{ES}_{\beta})$ . Correspondingly, there can be homogeneous and strictly consistent scoring functions for $b<1$ for this pair (Nolde and Ziegel, 2017) while this is not possible for the triplet $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ .

5 Mixture representation of scoring functions

When forecasts are compared and ranked with respect to consistent scoring functions, one has to be aware that in the presence of non-nested information sets, model mis-specification and/or finite samples, the ranking may depend on the chosen consistent scoring function (Patton, 2020). In the specific case of $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ , the forecast ranking may depend on the specific choice for the functions $g_{1}$ , $g_{2}$ , and $\phi$ appearing in Theorem 3.2. A possible remedy to this problem is to compare forecasts simultaneously with respect to all consistent scoring functions in terms of Murphy diagrams as introduced by Ehm et al. (2016). Murphy diagrams are based on the fact that the class of all consistent scoring functions can be characterised as a class of mixtures of elementary scoring functions that depend on a low-dimensional parameter. The following theorem provides such a mixture representation for the scoring functions at (3.3). The applicability is illustrated in Section 6. Recall that $S_{\alpha}(x,y)=(\mathds{1}\{y\leq x\}-\alpha)x-\mathds{1}\{y\leq x\}y$ .

Theorem 5.1.

Let $0<\alpha<\beta<1$ . Any scoring function $S:[c_{\min},c_{\max}]^{3}\times\mathbb{R}\to\mathbb{R}$ of the form (3.3) with $a\colon\mathbb{R}\to\mathbb{R}$ chosen such that $S(y,y,y,y)=0$ can be written as

[TABLE]

where

[TABLE]

and $H_{1}$ , $H_{2}$ are locally finite measures on $[c_{\min},c_{\max}]$ and $H_{3}$ is a finite measure on $[c_{\min},c_{\max}]$ . If $H_{3}$ puts positive mass on all open intervals, then $S$ is strictly consistent. Conversely, for any choice of measures $H_{1},H_{2},H_{3}$ with the above restrictions, we obtain a scoring function of the form (3.3).

Proof.

An increasing function $h:[c_{\min},c_{\max}]\to\mathbb{R}$ can always be written as

[TABLE]

for some locally finite measure $H$ , and some $z\in[c_{\min},c_{\max}]$ , $C\in\mathbb{R}$ . The function $h$ is strictly increasing if and only if $H$ is strictly positive, i.e., it puts positive mass on all open non-empty intervals. Furthermore, the one-sided derivatives of $h$ are bounded below by $\lambda>0$ if and only if $H(A)\geq\lambda\mathcal{L}(A)$ for all Borel sets $A\subseteq[c_{\min},c_{\max}]$ , where $\mathcal{L}$ is the Lebesgue measure on $\mathbb{R}$ .

Using the arguments from Proposition 3.4, it is no loss of generality to show the assertion for a score $S$ such that $\lambda(\beta-\alpha)=-\inf_{x}\phi^{\prime}(x)=\sup_{x}\phi^{\prime}(x)$ and the one-sided derivatives of $g_{1}$ , $g_{2}$ are bounded from below by $\lambda>0$ .

Then, there is a measure $H_{3}$ on $[c_{\min},c_{\max}]$ such that $H_{3}([c_{\min},c_{\max}])=2\lambda(\beta-\alpha)$, which is strictly positive if and only if $\phi$ is strictly convex, such that for all $x_{3}\in[c_{\min},c_{\max}]$, we have

[TABLE]

Using Fubini’s theorem, we find that

[TABLE]

Using (3.3), (5.2) and Proposition 3.4 it is straightforward to check that a scoring function of the form (3.3) can be written as in (5.1) with $L_{v}^{3}$ replaced by

[TABLE]

and locally finite measures $\tilde{H}_{1}$, $\tilde{H}_{2}$ on $[c_{\min},c_{\max}]$ instead of $H_{1}$, $H_{2}$ such that $\tilde{H}_{i}(A)\geq\lambda\mathcal{L}(A)$ for $i=1,2$, and for all Borel sets $A\subseteq\mathbb{R}$, and the measure $H_{3}$. We can write $\tilde{H}_{i}=H_{i}+\lambda\mathcal{L}$, $i=1,2$, for some locally finite measures $H_{i}$, $i=1,2$. Integrating $v\mapsto L_{v}^{1}$ with respect to $\lambda\mathcal{L}$, we obtain the function $\lambda(S_{\alpha}(x_{1},y)+\alpha y)$, and analogously for $L_{v}^{2}$. Using that $H_{3}([c_{\min},c_{\max}])=2\lambda(\beta-\alpha)$ yields the claim with

[TABLE]

which is equal to the formula given in the statement of the theorem. The scoring functions $L_{v}^{1}$ and $L_{v}^{2}$ are consistent for VaR at level $\alpha$ and $\beta$ , respectively. The scoring function $L_{v}^{3}$ is of the form (3.3) with $g_{1}(x)=g_{2}(x)=x/(2\beta-2\alpha)$ and $\phi(x)=|x-v|/2$ , which renders it a consistent scoring function for $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ . The converse statement follows by direct computations. ∎
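The elementary score $L_{v}^{3}$ can be sketched in Python directly from its description in the proof: a score of the form (3.3) with $g_{1}(x)=g_{2}(x)=x/(2\beta-2\alpha)$, $\phi(x)=|x-v|/2$, and $a$ chosen such that the score vanishes at $(y,y,y,y)$, which under our reading of (3.3) gives $a(y)=\alpha g_{1}(y)+\beta g_{2}(y)+\phi(y)$. The closed-form expression in the theorem may be arranged differently; the levels and the helper names `elementary_score` and `murphy_curve` are hypothetical.

```python
import random

ALPHA, BETA = 0.1, 0.6   # illustrative levels

def S_q(x, y, level):
    """Quantile score S_level(x, y) = (1{y <= x} - level) * x - 1{y <= x} * y."""
    ind = 1.0 if y <= x else 0.0
    return (ind - level) * x - ind * y

def elementary_score(v, x1, x2, x3, y):
    """Assumed elementary score for the RVaR component at threshold v."""
    scale = 1.0 / (2.0 * (BETA - ALPHA))         # g1(x) = g2(x) = scale * x
    dphi = 0.5 * ((x3 > v) - (x3 < v))           # a subgradient of |x - v| / 2
    part1 = scale * S_q(x1, y, ALPHA)            # (1{y<=x1}-alpha) g1(x1) - 1{y<=x1} g1(y)
    part2 = scale * S_q(x2, y, BETA)
    v3 = x3 + (S_q(x2, y, BETA) - S_q(x1, y, ALPHA)) / (BETA - ALPHA)
    a = scale * (ALPHA + BETA) * y + abs(y - v) / 2.0
    return part1 + part2 + dphi * v3 - abs(x3 - v) / 2.0 + a

def murphy_curve(forecasts, observations, thresholds):
    """Mean elementary score per threshold, as plotted in a Murphy diagram."""
    n = len(observations)
    return [
        sum(elementary_score(v, *x, y) for x, y in zip(forecasts, observations)) / n
        for v in thresholds
    ]

rng = random.Random(0)
draws = [tuple(rng.uniform(-3.0, 3.0) for _ in range(5)) for _ in range(2000)]
smallest = min(elementary_score(*d) for d in draws)          # non-negative up to rounding
at_perfect = max(abs(elementary_score(v, y, y, y, y)) for v, y, *_ in draws)

curve = murphy_curve([(-1.0, 1.0, 0.0), (-0.5, 0.8, 0.1)], [0.3, -0.2], [-1.0, 0.0, 1.0])
```

The random check reflects that a consistent score normalised to vanish at $(y,y,y,y)$ is non-negative.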

6 Simulations

This simulation study illustrates the usage of consistent scoring functions for the triplet $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ when comparing the predictive performances of different forecasts for this triplet, e.g., in the context of comparative backtests (Nolde and Ziegel, 2017). Due to the negative results in Section 4 it is challenging to suggest concrete examples for the choices of the functions $\phi$ , $g_{1}$ and $g_{2}$ in (3.3). In Table 1, we give some first suggestions. The scoring function $S_{4}$ is in the spirit of the Huber loss (Huber, 1964, p. 79). It is only strictly consistent on $[c_{1},c_{2}]^{3}$ , but remains consistent for all of $\mathbb{R}^{3}$ . We illustrate the discrimination ability of the suggested scoring functions with a slightly extended version of a simulation example of Gneiting et al. (2007) which has also been considered in Fissler et al. (2016).

We consider a data generating process $(Y_{t})_{t=1,\dots,N}$ given by $Y_{t}=\mu_{t}+u_{t}$ , where $(\mu_{t})_{t=1,\dots,N}$ and $(u_{t})_{t=1,\dots,N}$ are mutually independent sequences of i.i.d. standard normal random variables. Suppose we have three different forecasters who provide point forecasts, aiming at correctly specifying $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ of the (conditional) distribution of $Y_{t}$ . The first forecaster has access to $\mu_{t}$ and uses the correct conditional distribution for prediction, that is, they predict

[TABLE]

for timepoint $t$ , where $\varphi$ and $\Phi$ denote the density and quantile function of the standard normal distribution, respectively. The second forecaster predicts $g_{t}=(g_{1,t},g_{2,t},g_{3,t})$ , where $g_{1,t}=f_{1,t}+\varepsilon_{t}$ , $g_{2,t}=f_{2,t}+\varepsilon_{t}$ and $g_{3,t}=f_{3,t}+\varepsilon_{t}$ and where $(\varepsilon_{t})_{t=1,\dots,N}$ is independent normally distributed noise with mean zero and variance $\sigma^{2}$ . The third forecaster, $h_{t}=(h_{1,t},h_{2,t},h_{3,t})$ , bases their predictions on the unconditional distribution of $Y_{t}$ , that is $\mathcal{N}(0,2)$ . Therefore, the forecasts take the form

[TABLE]
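For normal distributions the three forecasts are available in closed form: the $\alpha$-quantile of $\mathcal{N}(m,s^{2})$ is $m+sz_{\alpha}$ with $z_{\alpha}$ the standard normal $\alpha$-quantile, and averaging the quantiles over $[\alpha,\beta]$ gives $m+s\,(\varphi(z_{\alpha})-\varphi(z_{\beta}))/(\beta-\alpha)$. The following Python sketch illustrates the setup under the assumption that $\operatorname{VaR}_{\alpha}$ is the $\alpha$-quantile and $\operatorname{RVaR}_{\alpha,\beta}$ the average of $\operatorname{VaR}_{\gamma}$ over $\gamma\in[\alpha,\beta]$. The function names and the value $\sigma=0.5$ are illustrative choices, not taken from the paper.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: inv_cdf = quantile function, pdf = density


def normal_triplet(mean, sd, alpha, beta):
    """Return (VaR_alpha, VaR_beta, RVaR_{alpha,beta}) of N(mean, sd**2)."""
    z_a = STD_NORMAL.inv_cdf(alpha)
    z_b = STD_NORMAL.inv_cdf(beta)
    # Average of the standard normal quantiles over [alpha, beta].
    rvar_std = (STD_NORMAL.pdf(z_a) - STD_NORMAL.pdf(z_b)) / (beta - alpha)
    return (mean + sd * z_a, mean + sd * z_b, mean + sd * rvar_std)


def simulate_forecasts(n, alpha, beta, sigma, seed=0):
    """Draw Y_t = mu_t + u_t and the forecasts f_t, g_t, h_t for t = 1, ..., n."""
    rng = random.Random(seed)
    h_const = normal_triplet(0.0, 2.0 ** 0.5, alpha, beta)  # unconditional law N(0, 2)
    y, f, g, h = [], [], [], []
    for _ in range(n):
        mu, u = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        eps = rng.gauss(0.0, sigma)                  # noise of the second forecaster
        f_t = normal_triplet(mu, 1.0, alpha, beta)   # ideal forecast given mu_t
        y.append(mu + u)
        f.append(f_t)
        g.append(tuple(c + eps for c in f_t))        # same eps_t added to each component
        h.append(h_const)
    return y, f, g, h


# sigma = 0.5 is an arbitrary illustrative value.
y, f, g, h = simulate_forecasts(n=250, alpha=0.01, beta=0.05, sigma=0.5)
```

The third forecaster issues the same triplet at every time point, which reflects that their information set is trivial.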

It is clear that the first forecaster dominates the second and the third forecaster, that is, they will be preferred under any consistent scoring function. Indeed, invoking Holzmann and Eulert (2014), in case of the first and the second forecaster, the first one is ideal with respect to the information set $\sigma(\mu_{t},\varepsilon_{t})$ , whereas the second one is based on the same information set but is not ideal. In case of the first and the third forecaster, both forecasters are ideal but the information set of the first forecaster, $\sigma(\mu_{t})$ , is larger than the one of the third forecaster, which is the trivial $\sigma$ -algebra. It will depend on the size of the variance $\sigma^{2}$ whether the second or the third forecaster is preferred. Figures 1 and 2 provide Murphy diagrams of all forecasters computed from a sample of size $N=100{,}000$ , providing a good approximation of the population level. They are in line with our theoretical considerations above concerning the ranking of the three forecasts.

We compare the predictive performances using Diebold-Mariano tests (Diebold and Mariano, 1995) based on the scoring functions in Table 1. We consider samples of size $N=250$ and repeat our experiment 10,000 times. In the left panel of Table 2, we consider the case that $\alpha=1-\beta=0.1$ where $\operatorname{RVaR}_{\alpha,\beta}$ is a trimmed mean. We report the proportion of rejections of the null hypothesis that forecaster $i$ outperforms forecaster $j$ , $i,j\in\{1,2,3\}$ , $i\neq j$ , evaluated in terms of the score $S$ at significance level $0.05$ . E.g., for $i=1,j=2$ , we consider the null hypothesis $\mathbb{E}[S(f_{t},Y_{t})]\leq\mathbb{E}[S(g_{t},Y_{t})]$ for all $t=1,\ldots,N$ , or in short, $f\preceq g$ . Analogously, in the right panel of Table 2, we consider the case that $\alpha,\beta$ are both close to zero, that is, $\alpha=0.01$ and $\beta=0.05$ , which is a setting that is relevant if $\operatorname{RVaR}_{\alpha,\beta}$ is used as a risk measure. For the scoring function $S_{4}$ , we have tried several values of $c_{1}$ and $c_{2}$ and report the results for the choices that worked best in our experiments. A systematic study on how to choose these two parameters goes beyond the scope of the present paper.
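A single replication of such a comparison might look as follows. This is a minimal sketch and not the authors' code. The score uses $g_{1}(x_{1})=x_{1}$ , $g_{2}(x_{2})=x_{2}$ and the illustrative choice $\phi(x_{3})=(\beta-\alpha)\log\cosh(x_{3})$ , which is our assumption and not necessarily one of the scores in Table 1; it is chosen because $|\phi^{\prime}|<\beta-\alpha$ keeps the weights on both quantile scores positive, so that the expected score is minimised at the true triplet. The test statistic uses the plain sample variance of the score differences, which is adequate here only because the differences are serially independent in this simulation. The value $\sigma=0.5$ is arbitrary.

```python
import math
import random
from statistics import NormalDist, mean, stdev

STD = NormalDist()


def s_level(x, y, level):
    """(1{y<=x} - level) * x - 1{y<=x} * y; its mean is minimised by the level-quantile."""
    hit = 1.0 if y <= x else 0.0
    return (hit - level) * x - hit * y


def triplet_score(x, y, alpha, beta):
    """Illustrative score for (VaR_alpha, VaR_beta, RVaR): g1 = g2 = identity and
    phi(z) = (beta - alpha) * log(cosh(z)). The choice of phi is an assumption."""
    x1, x2, x3 = x
    dphi = (beta - alpha) * math.tanh(x3)            # phi'(x3)
    phi = (beta - alpha) * math.log(math.cosh(x3))   # phi(x3)
    s_a, s_b = s_level(x1, y, alpha), s_level(x2, y, beta)
    return s_a + s_b + dphi * (x3 + (s_b - s_a) / (beta - alpha)) - phi


def normal_triplet(m, s, alpha, beta):
    """(VaR_alpha, VaR_beta, RVaR_{alpha,beta}) of N(m, s**2)."""
    z_a, z_b = STD.inv_cdf(alpha), STD.inv_cdf(beta)
    return (m + s * z_a, m + s * z_b,
            m + s * (STD.pdf(z_a) - STD.pdf(z_b)) / (beta - alpha))


def dm_test(scores_i, scores_j):
    """One-sided test of H0: E[S_i] <= E[S_j] (forecaster i is at least as good as j).
    Returns (statistic, p-value); plain sample variance, no HAC correction."""
    d = [a - b for a, b in zip(scores_i, scores_j)]
    stat = math.sqrt(len(d)) * mean(d) / stdev(d)
    return stat, 1.0 - STD.cdf(stat)


def one_replication(n, alpha, beta, sigma, seed):
    """Simulate one sample and test all four nulls involving forecaster f."""
    rng = random.Random(seed)
    h = normal_triplet(0.0, math.sqrt(2.0), alpha, beta)
    s_f, s_g, s_h = [], [], []
    for _ in range(n):
        mu, u, eps = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, sigma)
        y = mu + u
        f = normal_triplet(mu, 1.0, alpha, beta)
        g = tuple(c + eps for c in f)
        s_f.append(triplet_score(f, y, alpha, beta))
        s_g.append(triplet_score(g, y, alpha, beta))
        s_h.append(triplet_score(h, y, alpha, beta))
    return {"f<=g": dm_test(s_f, s_g), "g<=f": dm_test(s_g, s_f),
            "f<=h": dm_test(s_f, s_h), "h<=f": dm_test(s_h, s_f)}


results = one_replication(n=250, alpha=0.1, beta=0.9, sigma=0.5, seed=1)
rejected = {null: p < 0.05 for null, (stat, p) in results.items()}
```

Repeating `one_replication` over many seeds and averaging the entries of `rejected` gives empirical rejection rates of the kind reported in Table 2, although the numbers will differ because the score and $\sigma$ used here are only illustrative.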

For the situation of the left panel of Table 2 concerning $\alpha=1-\beta=0.1$ , we can see that forecaster 1 (2) outperforms forecaster 3 with a power of 1 (almost 1) for all scoring functions used. For a comparison of forecaster 1 and forecaster 2, the situation is more interesting: While forecaster 1 outperforms forecaster 2 with regard to all scoring functions considered, the power of the tests (and the associated discrimination ability of the scoring functions) varies substantially. While $S_{1}$ leads to an empirical power of 0.304 for the null hypothesis $f\preceq g$ , the score $S_{4}$ induces a power of 0.624 for the same null hypothesis. The right panel of Table 2, with the parameter choice $\alpha=0.01$ and $\beta=0.05$ , shows a different picture. The tests employing $S_{1}$ , $S_{2}$ and $S_{3}$ have a similar power. In contrast, $S_{4}$ yields a considerably smaller power (0.393) for the null $h\preceq g$ than the other scores ( $\geq 0.874$ for all cases). A more detailed study and comparison of other scoring functions and other situations is deferred to future work.

7 Implications for regression

After illustrating the usage of consistent scoring functions in forecast comparison and comparative backtesting in Section 6, we would like to outline how one can implement our results about the elicitability of the triplet $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ , $0<\alpha<\beta<1$ in a regression context. Then we would like to contrast our ansatz to other suggestions for regression of the $\alpha$ -trimmed mean (which can be generalised to $\operatorname{RVaR}_{\alpha,\beta}$ ). The most common alternative approaches in the literature on robust statistics are the trimmed least squares approach and a two-step estimation procedure using the Huber skipped mean.

7.1 A joint regression framework for $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$

Let $(X_{t},Y_{t})_{t\in\mathbb{N}}$ be a time series with the usual notation that $Y_{t}$ denotes some real valued response variable and $X_{t}$ is a $d$ -dimensional vector of regressors. Let $\Theta\subseteq\mathbb{R}^{k}$ be some parameter space and $M\colon\mathbb{R}^{d}\times\Theta\to\mathbb{R}^{3}$ a parametric model for $T=(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ , $0<\alpha<\beta<1$ . We assume a correct model specification, that is, we assume that there is a unique $\theta_{0}\in\Theta$ such that

[TABLE]

where $F_{Y_{t}|X_{t}}$ denotes the conditional distribution of $Y_{t}$ given $X_{t}$ . That means, $M(X_{t},\theta_{0})$ models jointly the conditional $\operatorname{VaR}_{\alpha}$ , $\operatorname{VaR}_{\beta}$ and the conditional $\operatorname{RVaR}_{\alpha,\beta}$ . Let $S$ be a strictly consistent scoring function of the form (3.3) and suppose the sequence $(X_{t},Y_{t})_{t\in\mathbb{N}}$ satisfies certain mixing conditions (White, 2001, Corollary 3.48), which hold in particular under independence. Then one obtains under additional moment conditions that, as $n\to\infty$ ,

[TABLE]

It is essentially this Law of Large Numbers result which allows for consistent parameter estimation with the empirical $M$ -estimator $\widehat{\theta}_{n}=\operatorname*{arg\,min}_{\theta\in\Theta}n^{-1}\sum_{t=1}^{n}S(M(X_{t},\theta),Y_{t})$ ; see e.g. van der Vaart (1998), Huber and Ronchetti (2009), Nolde and Ziegel (2017) and Dimitriadis et al. (2020) for details.
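To make the $M$ -estimator concrete, the following Python sketch fits a deliberately simple one-parameter model. Everything specific in it is an assumption made for illustration: the data are generated as $Y_{t}=\theta_{0}X_{t}+u_{t}$ with standard normal $u_{t}$ , the hypothetical model is $M(x,\theta)=\theta x+(q_{\alpha},q_{\beta},r)$ with $(q_{\alpha},q_{\beta},r)$ the known triplet of the standard normal distribution, the score uses $g_{1}$ and $g_{2}$ equal to the identity together with $\phi(x_{3})=(\beta-\alpha)\log\cosh(x_{3})$ , and a crude grid search replaces a proper optimiser.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()


def s_level(x, y, level):
    """(1{y<=x} - level) * x - 1{y<=x} * y; its mean is minimised by the level-quantile."""
    hit = 1.0 if y <= x else 0.0
    return (hit - level) * x - hit * y


def triplet_score(x, y, alpha, beta):
    """Illustrative triplet score with phi(z) = (beta - alpha) * log(cosh(z)) (assumption)."""
    x1, x2, x3 = x
    dphi = (beta - alpha) * math.tanh(x3)
    phi = (beta - alpha) * math.log(math.cosh(x3))
    s_a, s_b = s_level(x1, y, alpha), s_level(x2, y, beta)
    return s_a + s_b + dphi * (x3 + (s_b - s_a) / (beta - alpha)) - phi


def noise_triplet(alpha, beta):
    """(VaR_alpha, VaR_beta, RVaR_{alpha,beta}) of the standard normal error."""
    z_a, z_b = STD.inv_cdf(alpha), STD.inv_cdf(beta)
    return (z_a, z_b, (STD.pdf(z_a) - STD.pdf(z_b)) / (beta - alpha))


def model(x, theta, base):
    """Hypothetical model M(x, theta) = theta * x + base, componentwise."""
    return tuple(theta * x + b for b in base)


def m_estimate(xs, ys, alpha, beta, grid):
    """Minimise the average score over a grid of candidate parameters."""
    base = noise_triplet(alpha, beta)

    def average_score(theta):
        return sum(triplet_score(model(x, theta, base), y, alpha, beta)
                   for x, y in zip(xs, ys)) / len(ys)

    return min(grid, key=average_score)


rng = random.Random(1)
theta0 = 0.7                                   # true parameter of the simulated data
xs = [rng.gauss(0.0, 1.0) for _ in range(500)]
ys = [theta0 * x + rng.gauss(0.0, 1.0) for x in xs]
grid = [i / 50 for i in range(-100, 101)]      # candidates from -2 to 2 in steps of 0.02
theta_hat = m_estimate(xs, ys, alpha=0.1, beta=0.9, grid=grid)
```

A grid search is used because the average score need not be convex in $\theta$ ; for models with several parameters a numerical optimiser with multiple starting values would be the more practical choice.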

In summary, we can see that the complication of this procedure is that one needs to model the components $\operatorname{VaR}_{\alpha}$ , $\operatorname{VaR}_{\beta}$ , even if one is only interested in $\operatorname{RVaR}_{\alpha,\beta}$ . The advantage is that one can substantially deviate from an i.i.d. assumption on the data generating process. One can deal with serially dependent, though mixing, and non-stationary data. One only needs the semiparametric stationarity specified through (7.1).

7.2 Trimmed least squares

Most proposals for $M$ -estimation and regression for $\operatorname{RVaR}_{\alpha,\beta}$ in the field of robust statistics focus on the $\alpha$ -trimmed mean, $\alpha\in(0,1/2)$ , corresponding to $\operatorname{RVaR}_{\alpha,1-\alpha}$ . But they can often be extended to the general case $0<\alpha<\beta<1$ in a straightforward way. When this is the case, we describe the procedure in this more general manner. A majority of the proposals in the literature are commonly referred to as a trimmed least squares (TLS) approach. However, strictly speaking, TLS actually subsumes different, though closely related estimation procedures.

The first one was coined by Koenker and Bassett (1978) (cf. Ruppert and Carroll, 1980) and constitutes a two-step $M$ -estimator: In a first step, the $\alpha$ - and $\beta$ -quantile are determined via usual $M$ -estimation. Then, all values below the former and above the latter are omitted and $\operatorname{RVaR}_{\alpha,\beta}$ is computed with an ordinary least squares approach. One can also express this procedure using order statistics. Using the notation from Subsection 7.1, an $M$ -estimator for $\operatorname{RVaR}_{\alpha,\beta}$ is given by $\operatorname*{arg\,min}_{z\in\mathbb{R}}\frac{1}{n}\sum_{i=[n\alpha]}^{[n\beta]}(z-Y_{(i)})^{2}.$ Here, $Y_{(1)}\leq\cdots\leq Y_{(n)}$ are the order statistics of the sample $Y_{1},\ldots,Y_{n}$ . While this procedure seems to work for a simplistic regression model (ignoring the regressors $X_{t}$ and only modelling the intercept part), it is not clear how to use it in a more interesting regression context, where one is actually interested in the conditional distribution of $Y_{t}$ given $X_{t}$ rather than the unconditional distribution of $Y_{t}$ . Moreover, since this approach uses the order statistics of the entire sample $Y_{1},\ldots,Y_{n}$ to implicitly estimate the $\alpha$ - and $\beta$ -quantile, it requires that these quantiles be constant in time. Hence, heteroscedasticity (in time) can lead to problems, even if $\operatorname{RVaR}_{\alpha,\beta}$ is constant in time.
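Since the objective in the order-statistics formulation is a sum of squares in $z$ , its minimiser is simply the arithmetic mean of the retained order statistics. A short Python sketch of this intercept-only estimator is given below. It assumes that $[\cdot]$ denotes the integer part and clips the lower index at 1 so that it always refers to an existing order statistic; the function name is ours.

```python
import math


def trimmed_ls_estimate(ys, alpha, beta):
    """Minimiser of sum_{i=[n*alpha]}^{[n*beta]} (z - Y_(i))**2 over z.

    Assumptions: [.] is read as the floor function and the lower index is
    clipped at 1 so that it refers to an existing order statistic.
    """
    n = len(ys)
    ordered = sorted(ys)                   # Y_(1) <= ... <= Y_(n)
    lo = max(1, math.floor(n * alpha))
    hi = math.floor(n * beta)
    kept = ordered[lo - 1:hi]              # Y_(lo), ..., Y_(hi) in 1-based indexing
    # A sum of squares in z is minimised by the arithmetic mean of the kept values.
    return sum(kept) / len(kept)


sample = [3, 1, 1000, 7, 5, 9, 2, 8, 4, 6]   # one gross outlier
estimate = trimmed_ls_estimate(sample, alpha=0.2, beta=0.8)
```

The sorting step uses the whole sample at once, which is exactly why the approach implicitly treats the $\alpha$ - and $\beta$ -quantile as constant over time.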

A second approach is described, for example, in Rousseeuw (1984, 1985) and relies on order-statistics of the squared residuals. It only seems to work for the $\alpha$ -trimmed mean. To be more precise, and again using the notation from above, let $m\colon\mathbb{R}^{d}\times\Theta\to\mathbb{R}$ be a one-dimensional parametric model. Again, one assumes that there is a unique correctly specified model parameter $\theta_{0}\in\Theta$ such that

[TABLE]

For each $\theta\in\Theta$ , define the residuals $\varepsilon_{t}(\theta):=Y_{t}-m(X_{t},\theta)$ and the absolute residuals $r_{t}(\theta):=|\varepsilon_{t}(\theta)|$ . Define the order-statistics of the absolute residuals $0\leq r_{(1)}(\theta)\leq\cdots\leq r_{(n)}(\theta)$ for a sample of size $n$ . Then an $M$ -estimator is defined via

[TABLE]
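The dependence of the trimming on $\theta$ can be seen in a small Python sketch. It is an illustration under assumptions of our own: the model is the hypothetical one-parameter line $m(x,\theta)=\theta x$ , the objective is taken to be the sum of the $h$ smallest squared residuals with the number $h$ of retained residuals supplied by the user, and the minimisation is a plain grid search.

```python
import random


def lts_objective(theta, xs, ys, h):
    """Sum of the h smallest squared residuals of the hypothetical model m(x, theta) = theta * x."""
    squared = sorted((y - theta * x) ** 2 for x, y in zip(xs, ys))
    return sum(squared[:h])   # which observations are kept depends on theta


def lts_fit(xs, ys, h, grid):
    """Grid search, because the objective is in general not convex in theta."""
    return min(grid, key=lambda theta: lts_objective(theta, xs, ys, h))


def ols_fit(xs, ys):
    """Ordinary least squares slope through the origin, for comparison."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


rng = random.Random(0)
xs = [-2.0 + 4.0 * i / 199 for i in range(200)]   # equally spaced regressors
# True slope 2; responses at the largest regressor values are replaced by gross outliers.
ys = [-20.0 if x > 1.5 else 2.0 * x + rng.gauss(0.0, 0.5) for x in xs]

grid = [i / 100 for i in range(-500, 501)]        # candidate slopes from -5 to 5
theta_lts = lts_fit(xs, ys, h=160, grid=grid)     # keep 80% of the residuals (illustrative)
theta_ols = ols_fit(xs, ys)
```

In this contaminated sample the trimmed fit stays close to the true slope while the ordinary least squares slope is pulled far away by the outliers. Sorting squared rather than absolute residuals gives the same ordering.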

While this procedure appears to be fairly similar to an ordinary least squares procedure with the respective computational advantages, one should recall that the trimming crucially depends on the choice of the parameter $\theta$ . That means even if the model $m$ is linear in the parameter $\theta$ , one generally obtains a non-convex objective function with several local minima. Interestingly, the trimming takes place only for residuals with large modulus. If the error distribution is symmetric, this procedure yields a consistent estimator for $\theta_{0}$ in an i.i.d. setting. If one wants to relax the assumption on the error distribution and is interested in modelling $\operatorname{RVaR}_{\alpha,\beta}$ for general $0<\alpha<\beta<1$ in (7.2), one could come up with the following ad-hoc procedure: Consider the order statistics of the residuals $\varepsilon_{(1)}(\theta)\leq\cdots\leq\varepsilon_{(n)}(\theta)$ . Then define an $M$ -estimator via

[TABLE]

This procedure takes into account the asymmetric nature of trimming when dealing with $\beta\neq 1-\alpha$ or $\beta=1-\alpha$ and an asymmetric error distribution. However, as outlined above, this procedure can lead to problems in the presence of heteroscedasticity or general non-stationarity of the error distribution, if the conditional $\operatorname{VaR}_{\alpha}$ and $\operatorname{VaR}_{\beta}$ of $Y_{t}$ given $X_{t}$ depend on $X_{t}$ . We would like to point out that, at the cost of additionally modelling the $\alpha$ - and $\beta$ -quantile, the procedure using our strictly consistent scoring functions for the triplet $(\operatorname{VaR}_{\alpha},\operatorname{VaR}_{\beta},\operatorname{RVaR}_{\alpha,\beta})$ described in Subsection 7.1 does not rely on the usage of order statistics and it can in general deal with heteroscedasticity. The only form of ‘stationarity’ required is the one specified through (7.1). In particular, stationarity is often deemed too strong an assumption in the context of financial data; see Davis (2016).

Finally, we would like to remark that there are further procedures belonging to the field of TLS. For instance, Atkinson and Cheng (1999) propose an adaptive procedure where the trimming parameter is data driven; see also Cerioli et al. (2018). However, we see no apparent way to use such procedures if one is interested in predefined trimming parameters $\alpha$ and $\beta$ .

7.3 Connections to Huber loss and Huber skipped mean

In his seminal paper, Huber (1964) introduced the famous Huber loss $S(x,y)=\rho(x-y)$ where $\rho(t)=\frac{1}{2}t^{2}$ for $|t|\leq k$ and $\rho(t)=k|t|-\frac{1}{2}k^{2}$ for $|t|>k$ . Huber argues that “the corresponding [M-]estimator is related to Winsorizing” (Huber, 1964, p. 79). What has received significantly less attention, maybe due to its lack of convexity, is another loss function he considers on the same page of the paper, which is defined as $S(x,y)=\rho(x-y)$ for $\rho(t)=\frac{1}{2}t^{2}$ for $|t|\leq k$ and $\rho(t)=\frac{1}{2}k^{2}$ for $|t|>k$ . He writes about it: “the corresponding [M-]estimator is a trimmed mean” (ibidem).
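The difference between the two losses is easy to see numerically. The Python sketch below implements both $\rho$ functions as defined above and computes the corresponding location $M$ -estimates by grid search on a toy sample; the sample, the threshold $k=1$ and the grid are arbitrary illustrative choices.

```python
def huber_rho(t, k):
    """Huber loss: quadratic for |t| <= k, linear beyond."""
    return 0.5 * t * t if abs(t) <= k else k * abs(t) - 0.5 * k * k


def skipped_rho(t, k):
    """Second loss from the same page: quadratic for |t| <= k, constant beyond."""
    return 0.5 * t * t if abs(t) <= k else 0.5 * k * k


def m_estimate(rho, ys, k, grid):
    """Location M-estimate arg min_x sum_i rho(x - y_i) by grid search."""
    return min(grid, key=lambda x: sum(rho(x - y, k) for y in ys))


ys = [-0.2, -0.1, 0.0, 0.1, 0.2, 100.0]        # five inliers and one outlier
grid = [i / 100 for i in range(-100, 10101)]   # candidates from -1 to 101
x_huber = m_estimate(huber_rho, ys, k=1.0, grid=grid)
x_skipped = m_estimate(skipped_rho, ys, k=1.0, grid=grid)
```

Under the Huber loss the outlier still exerts a bounded pull on the estimate, whereas under the second loss it contributes only a constant and the estimate equals the mean of the inliers. The price is a non-convex objective, which is why a grid search rather than a gradient method is used here.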

One could define an asymmetric version of the latter loss function by using $S_{k_{1},k_{2}}(x,y)=\rho_{k_{1},k_{2}}(x-y)$ with

[TABLE]

Assuming that $F$ is continuous with density $f$ for the sake of the simplicity of the argument, the corresponding first-order condition for a minimum of the expected score $\bar{S}_{k_{1},k_{2}}(x,F)$ is equivalent to

[TABLE]

Now a suggestion similar to Rousseeuw (1984, p. 876) is to consider this loss with $k_{1}=\operatorname{VaR}_{\beta}(F)$ and $k_{2}=\operatorname{VaR}_{\alpha}(F)$ stemming from some pre-estimate. However, one can see that the first-order condition is generally not solved by $\operatorname{RVaR}_{\alpha,\beta}(F)$ . Again, if one is interested in $M$ -estimation for the trimmed mean or, more generally, RVaR, one should use the scoring functions introduced at (3.3).

Acknowledgements

We would like to thank Timo Dimitriadis and Anthony C. Atkinson for insightful discussions about the topic, and Ruodu Wang, Rafael Frongillo, Tilmann Gneiting and Jana Hlavinová for helpful suggestions which improved an earlier version of this paper.

Tobias Fissler is grateful to the Department of Mathematics at Imperial College London, which funded his fellowship during which most of the work on this paper was done. Johanna Ziegel is grateful for financial support from the Swiss National Science Foundation.

Appendix

We present a list of assumptions used in Section 3. For more details about their interpretations and implications, please see Fissler and Ziegel (2016) where they were originally introduced.

Assumption (V1).

$\mathcal{F}$ is convex and for every $x\in\operatorname{int}(\mathsf{A})$ there are $F_{1},\ldots,F_{k+1}\in\mathcal{F}$ such that $0\in\operatorname{int}\left(\operatorname{conv}\left(\left\{\bar{V}(x,F_{1}),\ldots,\bar{V}(x,F_{k+1})\right\}\right)\right)\,.$

Note that if $V\colon\mathsf{A}\times\mathbb{R}\to\mathbb{R}^{k}$ is a strict $\mathcal{F}$ -identification function for $T\colon\mathcal{F}\to\mathsf{A}$ which satisfies Assumption (V1), then for each $x\in\operatorname{int}(\mathsf{A})$ there is an $F\in\mathcal{F}$ such that $T(F)=x$ .

Assumption (V3).

The map $\bar{V}(\cdot,F)$ is continuously differentiable for every $F\in\mathcal{F}$ .

Assumption (V4).

Let assumption (V3) hold. For all $r\in\{1,\ldots,k\}$ and for all $t\in\operatorname{int}(\mathsf{A})\cap T(\mathcal{F})$ there are $F_{1},F_{2}\in T^{-1}(\{t\})$ such that

[TABLE]

Assumption (F1).

For every $y\in\mathbb{R}$ there exists a sequence $(F_{n})_{n\in\mathbb{N}}$ of distributions $F_{n}\in\mathcal{F}$ that converges weakly to the Dirac-measure $\delta_{y}$ such that the support of $F_{n}$ is contained in a compact set $K$ for all $n$ .

Assumption (VS1).

Suppose that the complement of the set

[TABLE]

has $(k+d)$ -dimensional Lebesgue measure zero.

Assumption (S2).

For every $F\in\mathcal{F}$ , the function $\bar{S}(\cdot,F)$ is continuously differentiable and the gradient is locally Lipschitz continuous. Furthermore, $\bar{S}(\cdot,F)$ is twice continuously differentiable at $t=T(F)\in\operatorname{int}(\mathsf{A})$ .

Bibliography


C. Acerbi and B. Székely. Backtesting Expected Shortfall. Risk Magazine, 2014.
C. Acerbi and B. Székely. General properties of backtestable statistics. Preprint, 2017. URL https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2905109.
P. Artzner, F. Delbaen, J.-M. Eber, and D. Heath. Coherent measures of risk. Math. Finance, 9:203–228, 1999.
A. C. Atkinson and T.-C. Cheng. Computing least trimmed squares regression with the forward search. Statist. Comput., 9(4):251–263, 1999.
Bank for International Settlements. Consultative Document: Fundamental review of the trading book: Outstanding issues. 2014.
S. Barendse. Efficiently Weighted Estimation of Tail and Interquartile Expectations. Preprint, 2020. URL https://dx.doi.org/10.2139/ssrn.2937665.
J. R. Brehmer. Elicitability and its application in risk management. Master’s thesis, University of Mannheim, 2017. URL http://arxiv.org/abs/1707.09604.
A. Cerioli, M. Riani, A. C. Atkinson, and A. Corbellini. The power of monitoring: how to make the most of a contaminated multivariate sample. Stat. Methods Appl., 27(4):559–587, 2018.


Footnote to the title: An earlier version of this paper was circulated under the name Elicitability of Range Value at Risk.
