Evaluating Range Value at Risk Forecasts
Tobias Fissler, Johanna F. Ziegel

Note: An earlier version of this paper was circulated under the name "Elicitability of Range Value at Risk".
Tobias Fissler WU Vienna University of Economics and Business, Department of Finance, Accounting and Statistics, Welthandelsplatz 1, 1020 Vienna, Austria, e-mail: [email protected]
Johanna F. Ziegel University of Bern, Department of Mathematics and Statistics, Institute of Mathematical Statistics and Actuarial Science, Alpeneggstrasse 22, 3012 Bern, Switzerland, e-mail: [email protected]
Abstract
The debate of what quantitative risk measure to choose in practice has mainly focused on the dichotomy between Value at Risk (VaR), a quantile, and Expected Shortfall (ES), a tail expectation. Range Value at Risk (RVaR) is a natural interpolation between these two prominent risk measures, which constitutes a tradeoff between the sensitivity of the latter and the robustness of the former, turning it into a practically relevant risk measure on its own. As such, there is a need to statistically validate RVaR forecasts and to compare and rank the performance of different RVaR models, tasks subsumed under the term ‘backtesting’ in finance. The predictive performance is best evaluated and compared in terms of strictly consistent loss or scoring functions, that is, functions which are minimised in expectation by the correct RVaR forecast. It has been shown recently that RVaR, much like ES, does not admit strictly consistent scoring functions, i.e., it is not elicitable. Mitigating this negative result, this paper shows that a triplet of RVaR with two VaR components at different levels is elicitable. We characterise the class of strictly consistent scoring functions for this triplet. Additional properties of these scoring functions are examined, including the diagnostic tool of Murphy diagrams. The results are illustrated with a simulation study, and we put our approach in perspective with respect to the classical approach of trimmed least squares in robust regression.
Keywords: Backtesting; Consistency; Elicitability; Expected Shortfall; Interquantile expectation; Point forecasts; Robustness; Scoring functions; Trimmed mean; Value at Risk; Winsorized mean
MSC2020 classes: 62C99; 62G35; 62P05; 91G70
1 Introduction
In the field of quantitative risk management, the last one or two decades have seen a lively debate about which monetary risk measure (Artzner et al., 1999) is best in (regulatory) practice. The debate mainly focused on the dichotomy between Value at Risk ($\mathrm{VaR}_\beta$) on the one hand and Expected Shortfall ($\mathrm{ES}_\beta$) on the other hand, at some probability level $\beta\in(0,1)$ (see Section 2 for definitions). Mirroring the historical joust between median and mean as centrality measures in classical statistics, $\mathrm{VaR}_\beta$, basically a quantile, is esteemed for its robustness, while $\mathrm{ES}_\beta$, a tail expectation, is deemed attractive due to its sensitivity and the fact that it satisfies the axioms of a coherent risk measure (Artzner et al., 1999). We refer the reader to Embrechts et al. (2014) and Emmer et al. (2015) for comprehensive academic discussions, and to Bank for International Settlements (2014) for a regulatory perspective in banking.
Cont et al. (2010) considered the issue of statistical robustness of risk measure estimates in the sense of Hampel (1971). They showed that a risk measure cannot be both robust and coherent. As a compromise, they propose the risk measure ‘Range Value at Risk’, $\mathrm{RVaR}_{\alpha,\beta}$, at probability levels $0<\alpha<\beta<1$. It is defined as the average of all $\mathrm{VaR}_\gamma$ with $\gamma$ between $\alpha$ and $\beta$ (see Section 2 for definitions). As limiting cases, one obtains $\mathrm{RVaR}_{0,\beta}=\mathrm{ES}_\beta$ and $\lim_{\alpha\uparrow\beta}\mathrm{RVaR}_{\alpha,\beta}=\mathrm{VaR}_\beta$, which presents $\mathrm{RVaR}_{\alpha,\beta}$ as a natural interpolation of $\mathrm{VaR}_\beta$ and $\mathrm{ES}_\beta$. Quantifying its robustness in terms of the breakdown point and following the arguments provided in Huber and Ronchetti (2009, p. 59), $\mathrm{RVaR}_{\alpha,\beta}$ has a breakdown point of $\min\{\alpha,1-\beta\}$, placing it between the very robust $\mathrm{VaR}_\beta$ (with a breakdown point of $\min\{\beta,1-\beta\}$) and the entirely non-robust $\mathrm{ES}_\beta$ (breakdown point 0). This means it is a robust, and hence not coherent, risk measure, unless it degenerates to $\mathrm{RVaR}_{0,\beta}=\mathrm{ES}_\beta$ (or if $0\le\alpha<\beta=1$). Moreover, $\mathrm{RVaR}_{\alpha,\beta}$ belongs to the wide class of distortion risk measures (Kusuoka, 2001). For further contributions to robustness in the context of risk measures, we refer the reader to Krätschmer et al. (2012, 2014), Kou et al. (2013), Embrechts et al. (2015) and Zähle (2016). Since the influential article Cont et al. (2010), RVaR has gained increasing attention in the risk management literature (see Embrechts et al. (2018a, b) for extensive studies) as well as in econometrics (Barendse, 2020), where RVaR sometimes has the alternative denomination Interquantile Expectation. For the symmetric case $\beta=1-\alpha>1/2$, $\mathrm{RVaR}_{\alpha,1-\alpha}$ is known under the term $\alpha$-trimmed mean in classical statistics and it constitutes an alternative to and interpolation of the mean and the median as centrality measures; see Lugosi and Mendelson (2019) for a recent study and a multivariate extension of the trimmed mean. It is closely connected to the $\alpha$-Winsorized mean, see (2.4).
How to evaluate the predictive performance of point forecasts, $x_1,\dots,x_N$, for a statistical functional $T$, such as the mean, median or a risk measure, of the (conditional) distribution of a quantity of interest, $Y_1,\dots,Y_N$? It is commonly measured in terms of the average realised score $\frac{1}{N}\sum_{t=1}^{N}S(x_t,Y_t)$ for some scoring or loss function $S$, using the orientation the smaller the better. Consequently, the loss function $S$ should be strictly consistent for $T$ in that $T(F)=\arg\min_x\int S(x,y)\,\mathrm{d}F(y)$: Correct predictions are honoured and encouraged in the long run. E.g., the squared loss $S(x,y)=(x-y)^2$ is consistent for the mean, and the absolute loss $S(x,y)=|x-y|$ is consistent for the median. If a functional admits a strictly consistent score, it is called elicitable (Osband, 1985; Lambert et al., 2008; Gneiting, 2011). By definition, elicitable functionals allow for $M$-estimation and have natural estimation paradigms in regression frameworks (Dimitriadis et al., 2020, Section 2), such as quantile regression (Koenker and Basset, 1978; Koenker, 2005) or expectile regression (Newey and Powell, 1987). Elicitability is crucial for meaningful forecast evaluation (Engelberg et al., 2009; Murphy and Daan, 1985; Gneiting, 2011). In the context of probabilistic forecasts with distributional forecasts $F_t$ or density forecasts $f_t$, (strictly) consistent scoring functions are often referred to as (strictly) proper scoring rules, such as the log-score (Gneiting and Raftery, 2007). In quantitative finance, and particularly in the debate about which risk measure is best in practice, elicitability has gained considerable attention (Emmer et al., 2015; Ziegel, 2016; Davis, 2016). Especially, the role of elicitability for backtesting purposes has been highly debated (Gneiting, 2011; Acerbi and Székely, 2014, 2017). It has been clarified that elicitability is central for comparative backtesting (Fissler et al., 2016; Nolde and Ziegel, 2017).
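As a minimal illustration of consistency (a sketch that is not part of the original analysis), the following Python snippet minimises the average realised score over a grid of constant forecasts. The simulated exponential sample, the grid, and all helper names are illustrative assumptions.

```python
import random
from statistics import mean, median

def average_score(forecast, outcomes, score):
    """Average realised score of a constant point forecast."""
    return sum(score(forecast, y) for y in outcomes) / len(outcomes)

def squared_loss(x, y):
    return (x - y) ** 2

def absolute_loss(x, y):
    return abs(x - y)

rng = random.Random(0)                       # fixed seed for reproducibility
outcomes = [rng.expovariate(1.0) for _ in range(2001)]
candidates = [i / 100 for i in range(301)]   # grid 0.00, 0.01, ..., 3.00

best_for_squared = min(
    candidates, key=lambda x: average_score(x, outcomes, squared_loss))
best_for_absolute = min(
    candidates, key=lambda x: average_score(x, outcomes, absolute_loss))

print(best_for_squared, mean(outcomes))      # grid minimiser vs. sample mean
print(best_for_absolute, median(outcomes))   # grid minimiser vs. sample median
```

Up to the grid resolution, the squared loss is minimised at the sample mean and the absolute loss at the sample median, which is the empirical counterpart of the consistency statement above.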
Not all functionals are elicitable. Osband (1985) showed that an elicitable functional necessarily has convex level sets (CxLS): If $T(F_0)=T(F_1)=t$ for two distributions $F_0,F_1$, then $T(F_\lambda)=t$ where $F_\lambda=(1-\lambda)F_0+\lambda F_1$, $\lambda\in(0,1)$. Variance and ES generally do not have CxLS (Weber, 2006; Gneiting, 2011), therefore failing to be elicitable. The revelation principle (Osband, 1985; Gneiting, 2011) asserts that any bijection of an elicitable functional is elicitable. This implies that the pair (mean, variance), being a bijection of the first two moments, is elicitable despite the variance failing to be elicitable. Similarly, Fissler and Ziegel (2016) showed that the pair $(\mathrm{VaR}_\beta,\mathrm{ES}_\beta)$ is elicitable, with the structural difference that the revelation principle is not applicable in this instance. This gave rise to the more general finding that the minimal expected score and its minimiser are always jointly elicitable (Brehmer, 2017; Frongillo and Kash, 2020).
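Recently, Wang and Wei (2020, Theorem 5.3) showed that $\mathrm{RVaR}_{\alpha,\beta}$, $0<\alpha<\beta<1$, similarly to $\mathrm{ES}_\alpha$, fails to have the CxLS property, which rules out its elicitability. In contrast, they observe that the identity
$$\mathrm{RVaR}_{\alpha,\beta}=\frac{\beta\,\mathrm{ES}_\beta-\alpha\,\mathrm{ES}_\alpha}{\beta-\alpha},\qquad 0<\alpha<\beta<1,\tag{1.1}$$
and the CxLS property of the pair $(\mathrm{VaR}_\alpha,\mathrm{ES}_\alpha)$ imply the CxLS property of the triplet $(\mathrm{VaR}_\alpha,\mathrm{VaR}_\beta,\mathrm{RVaR}_{\alpha,\beta})$ (Wang and Wei, 2020, Example 4.6), leading to the question whether this triplet is elicitable or not. Invoking the elicitability of $(\mathrm{VaR}_\alpha,\mathrm{ES}_\alpha)$, the identity at (1.1) and the revelation principle establishes the elicitability of the quadruples $(\mathrm{VaR}_\alpha,\mathrm{VaR}_\beta,\mathrm{ES}_\alpha,\mathrm{RVaR}_{\alpha,\beta})$ and $(\mathrm{VaR}_\alpha,\mathrm{VaR}_\beta,\mathrm{ES}_\beta,\mathrm{RVaR}_{\alpha,\beta})$. This approach has already been used in the context of regression in Barendse (2020).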
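The relation between RVaR and ES can be checked numerically. The sketch below assumes the identity in the form $(\beta-\alpha)\,\mathrm{RVaR}_{\alpha,\beta}=\beta\,\mathrm{ES}_\beta-\alpha\,\mathrm{ES}_\alpha$, with ES taken as the lower-tail average of quantiles; the standard normal distribution, the levels, and the function names are illustrative choices, not taken from the paper.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, chosen only for illustration

def rvar_numeric(dist, alpha, beta, n=20_000):
    """Midpoint-rule approximation of the average of VaR_gamma over (alpha, beta)."""
    step = (beta - alpha) / n
    total = sum(dist.inv_cdf(alpha + (i + 0.5) * step) for i in range(n))
    return total * step / (beta - alpha)

def es_standard_normal(level):
    """Closed-form lower-tail expectation of N(0, 1): -pdf(q_level) / level."""
    q = STD_NORMAL.inv_cdf(level)
    return -STD_NORMAL.pdf(q) / level

alpha, beta = 0.1, 0.3
lhs = rvar_numeric(STD_NORMAL, alpha, beta)
rhs = (beta * es_standard_normal(beta)
       - alpha * es_standard_normal(alpha)) / (beta - alpha)
print(lhs, rhs)  # the two values should agree up to integration error
```

The left-hand side only integrates quantiles between the two levels, so it stays finite even for distributions whose tail expectation does not exist; the right-hand side requires both ES values to be finite.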
A fortiori, we show that the triplet $(\mathrm{VaR}_\alpha,\mathrm{VaR}_\beta,\mathrm{RVaR}_{\alpha,\beta})$ is elicitable (Theorem 3.2) under weak regularity conditions. Practically, this opens the way to meaningful forecast performance comparison, and in particular comparative backtests, of this triplet, as well as to a regression framework. Theoretically, this shows that the elicitation complexity (Lambert et al., 2008; Frongillo and Kash, 2020) or elicitation order (Fissler and Ziegel, 2016) of $\mathrm{RVaR}_{\alpha,\beta}$ is at most 3. Moreover, requiring only VaR-forecasts besides the RVaR-forecast is particularly advantageous compared to additionally requiring an ES-forecast, since the triplet $(\mathrm{VaR}_\alpha,\mathrm{VaR}_\beta,\mathrm{RVaR}_{\alpha,\beta})$, $0<\alpha<\beta<1$, exists and is finite for any distribution $F$, whereas $\mathrm{ES}_\alpha$ and $\mathrm{ES}_\beta$ only exist if the (left) tail of the distribution $F$ is integrable. As $\mathrm{RVaR}_{\alpha,\beta}$ is often used for robustness purposes, safeguarding against outliers and heavy-tailedness, this advantage is important.
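We would like to point out the structural difference between the elicitability result of $(\mathrm{VaR}_\alpha,\mathrm{VaR}_\beta,\mathrm{RVaR}_{\alpha,\beta})$ provided in this paper and the one concerning $(\mathrm{VaR}_\alpha,\mathrm{ES}_\alpha)$ in Fissler and Ziegel (2016) as well as the more general results of Frongillo and Kash (2020) and Brehmer (2017). While $\mathrm{ES}_\alpha$ corresponds to the negative of a minimum of an expected score which is strictly consistent for $\mathrm{VaR}_\alpha$, it turns out that $\mathrm{RVaR}_{\alpha,\beta}$ can be represented as the difference of minima of expected strictly consistent scoring functions for $\mathrm{VaR}_\alpha$ and $\mathrm{VaR}_\beta$ (Proposition 3.1). As a consequence, the class of strictly consistent scoring functions for the triplet $(\mathrm{VaR}_\alpha,\mathrm{VaR}_\beta,\mathrm{RVaR}_{\alpha,\beta})$ turns out to be less flexible than the one for $(\mathrm{VaR}_\alpha,\mathrm{ES}_\alpha)$; see Remark 3.7 for details. In particular, there is essentially no translation invariant or positively homogeneous scoring function which is strictly consistent for $(\mathrm{VaR}_\alpha,\mathrm{VaR}_\beta,\mathrm{RVaR}_{\alpha,\beta})$; see Section 4.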
The paper is organised as follows. In Section 2, we introduce the relevant notation and definitions concerning RVaR, scoring functions and elicitability. The main results establishing the elicitability of the triplet $(\mathrm{VaR}_\alpha,\mathrm{VaR}_\beta,\mathrm{RVaR}_{\alpha,\beta})$ (Theorems 3.2 and 3.5) and related findings are presented in Section 3. Section 4 shows that there are basically no strictly consistent scoring functions for $(\mathrm{VaR}_\alpha,\mathrm{VaR}_\beta,\mathrm{RVaR}_{\alpha,\beta})$ which are positively homogeneous or translation invariant. In Section 5, we establish a mixture representation of the strictly consistent scoring functions in the spirit of Ehm et al. (2016). This result allows one to compare forecasts simultaneously with respect to all consistent scoring functions in terms of Murphy diagrams. We demonstrate the applicability of our results and compare the discrimination ability of different scoring functions in a simulation study presented in Section 6. The paper finishes in Section 7 with a discussion of our results in the context of $M$-estimation and compares them to other suggestions in the statistical literature, in variants of a trimmed least squares procedure (Koenker and Basset, 1978; Ruppert and Carroll, 1980; Rousseeuw, 1984).
2 Notation and Definitions
2.1 Definition of Range Value at Risk
There are different sign conventions in the literature on risk measures. In this paper we use the following convention: If a random variable $Y$ models the losses and gains, then positive values of $Y$ represent gains and negative values of $Y$ losses. Moreover, if $\rho$ is a risk measure, we assume that $\rho(Y)\in\mathbb{R}$ corresponds to the maximal amount of money one can withdraw such that the position $Y-\rho(Y)$ is still acceptable. Hence, negative values of $\rho$ correspond to risky positions. In the sequel, let $\mathcal{F}_0$ be the class of probability distribution functions on $\mathbb{R}$. Recall that the $\alpha$-quantile, $\alpha\in[0,1]$, of $F\in\mathcal{F}_0$ is defined as the set $q_\alpha(F)=\{x\in\mathbb{R}\mid F(x-)\le\alpha\le F(x)\}$, where $F(x-):=\lim_{t\uparrow x}F(t)$.
Definition 2.1.
Value at Risk of $F\in\mathcal{F}_0$ at level $\alpha\in[0,1]$ is defined as $\mathrm{VaR}_\alpha(F)=\inf q_\alpha(F)$.
For any we introduce the following subclasses of :
[TABLE]
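Definition 2.2.
Range Value at Risk of $F\in\mathcal{F}_0$ at levels $0\le\alpha<\beta\le1$ is defined as
$$\mathrm{RVaR}_{\alpha,\beta}(F)=\frac{1}{\beta-\alpha}\int_\alpha^\beta\mathrm{VaR}_\gamma(F)\,\mathrm{d}\gamma.\tag{2.1}$$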
The definition of RVaR implies that
$$\mathrm{VaR}_\alpha(F)\le\mathrm{RVaR}_{\alpha,\beta}(F)\le\mathrm{VaR}_\beta(F).\tag{2.2}$$
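For an empirical distribution, the quantile function is a step function, so the average of $\mathrm{VaR}_\gamma$ over $(\alpha,\beta)$ can be computed exactly. The following sketch assumes the lower-quantile convention for VaR; the function names and the toy sample are hypothetical and serve only to illustrate the definition and the ordering $\mathrm{VaR}_\alpha\le\mathrm{RVaR}_{\alpha,\beta}\le\mathrm{VaR}_\beta$.

```python
import math

def empirical_var(sample, level):
    """Lower quantile inf{x : F_n(x) >= level} of the empirical distribution, 0 < level <= 1."""
    ordered = sorted(sample)
    rank = max(math.ceil(level * len(ordered) - 1e-12), 1)  # guard against rounding
    return ordered[rank - 1]

def empirical_rvar(sample, alpha, beta):
    """Integral of the empirical quantile function over (alpha, beta), divided by beta - alpha."""
    ordered = sorted(sample)
    n = len(ordered)
    total = 0.0
    for k, y in enumerate(ordered, start=1):
        # VaR_gamma equals the k-th order statistic for gamma in ((k-1)/n, k/n].
        overlap = min(beta, k / n) - max(alpha, (k - 1) / n)
        if overlap > 0:
            total += y * overlap
    return total / (beta - alpha)

sample = [-3.0, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 4.0, 10.0]
alpha, beta = 0.2, 0.8
var_alpha = empirical_var(sample, alpha)
var_beta = empirical_var(sample, beta)
rvar = empirical_rvar(sample, alpha, beta)
print(var_alpha, rvar, var_beta)
```

In this toy sample the two smallest and the two largest observations do not enter the RVaR value at all, which reflects the robustness property discussed in the introduction.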
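For $F\in\mathcal{F}_0$ and $0<\alpha<\beta<1$ one obtains that (i) $\mathrm{RVaR}_{\alpha,\beta}(F)\in\mathbb{R}$; (ii) $\mathrm{RVaR}_{0,\beta}(F)\in\mathbb{R}\cup\{-\infty\}$ and it is finite if and only if $\int_{-\infty}^{0}|y|\,\mathrm{d}F(y)<\infty$; and (iii) $\mathrm{RVaR}_{\alpha,1}(F)\in\mathbb{R}\cup\{\infty\}$ and it is finite if and only if $\int_{0}^{\infty}|y|\,\mathrm{d}F(y)<\infty$. $\mathrm{RVaR}_{0,1}(F)$ exists only if $\int_{-\infty}^{0}|y|\,\mathrm{d}F(y)<\infty$ or $\int_{0}^{\infty}|y|\,\mathrm{d}F(y)<\infty$. If $F$ has a finite first moment, then $\mathrm{RVaR}_{0,1}(F)$ coincides with the first moment of $F$. Provided that $\mathrm{RVaR}_{\alpha,\beta}(F)$ exists it holds that
$$\begin{aligned}\mathrm{RVaR}_{\alpha,\beta}(F)=\frac{1}{\beta-\alpha}\bigg(&\int_{(\mathrm{VaR}_\alpha(F),\,\mathrm{VaR}_\beta(F)]}y\,\mathrm{d}F(y)\\&+\mathrm{VaR}_\alpha(F)\big(F(\mathrm{VaR}_\alpha(F))-\alpha\big)-\mathrm{VaR}_\beta(F)\big(F(\mathrm{VaR}_\beta(F))-\beta\big)\bigg),\end{aligned}\tag{2.3}$$
using the usual conventions $F(-\infty)=0$, $F(\infty)=1$ and $0\cdot\infty=0$. If $F(\mathrm{VaR}_\alpha(F))=\alpha$ and $F(\mathrm{VaR}_\beta(F))=\beta$, then the correction terms in the second line of (2.3) vanish, yielding $\mathrm{RVaR}_{\alpha,\beta}(F)=\mathbb{E}_F\big[Y\mid\mathrm{VaR}_\alpha(F)<Y\le\mathrm{VaR}_\beta(F)\big]$, which justifies an alternative name for RVaR, namely Interquantile Expectation.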
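Definition 2.3.
Expected Shortfall of $F\in\mathcal{F}_0$ at level $\alpha\in(0,1)$ is defined as $\mathrm{ES}_\alpha(F)=\mathrm{RVaR}_{0,\alpha}(F)=\frac{1}{\alpha}\int_0^\alpha\mathrm{VaR}_\gamma(F)\,\mathrm{d}\gamma$.
Hence, provided that $\mathrm{ES}_\alpha(F)$ and $\mathrm{ES}_\beta(F)$ are finite, one obtains the identity (1.1). If $F$ has a finite left tail ($\int_{-\infty}^{0}|y|\,\mathrm{d}F(y)<\infty$) then one could use the right hand side of (1.1) as a definition of $\mathrm{RVaR}_{\alpha,\beta}(F)$. However, in line with our discussion in the introduction, $\mathrm{RVaR}_{\alpha,\beta}(F)$ always exists and is finite for $0<\alpha<\beta<1$ even if the right hand side of (1.1) is not defined.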
Interestingly, Embrechts et al. (2018b, Theorem 2) establish that RVaR can be written as an inf-convolution of VaR and ES at appropriate levels. This result amounts to a sup-convolution in our sign convention. Also note that our parametrisation of $\mathrm{RVaR}_{\alpha,\beta}$ differs from theirs.
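For $\alpha\in(0,1/2)$, $\mathrm{RVaR}_{\alpha,1-\alpha}$ corresponds to the $\alpha$-trimmed mean and has a close connection to the $\alpha$-Winsorized mean $W_\alpha$ (Huber and Ronchetti, 2009, pp. 57–59) via
$$W_\alpha(F)=(1-2\alpha)\,\mathrm{RVaR}_{\alpha,1-\alpha}(F)+\alpha\,\mathrm{VaR}_\alpha(F)+\alpha\,\mathrm{VaR}_{1-\alpha}(F).\tag{2.4}$$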
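The link between the trimmed and the Winsorized mean can be verified on a skewed example. The sketch below assumes the standard relation $W_\alpha=(1-2\alpha)\,\mathrm{RVaR}_{\alpha,1-\alpha}+\alpha\,\mathrm{VaR}_\alpha+\alpha\,\mathrm{VaR}_{1-\alpha}$ and uses the unit exponential distribution, for which the Winsorized mean has a closed form; distribution, level, and function names are illustrative assumptions.

```python
import math

def exp_quantile(u):
    """Quantile function of the unit exponential distribution (illustrative choice)."""
    return -math.log1p(-u)

def trimmed_mean(alpha, n=20_000):
    """RVaR at levels (alpha, 1 - alpha) via the midpoint rule."""
    width = 1.0 - 2.0 * alpha
    step = width / n
    total = sum(exp_quantile(alpha + (i + 0.5) * step) for i in range(n))
    return total * step / width

def winsorized_mean(alpha):
    """E[min(max(Y, a), b)] for Y ~ Exp(1), computed as a + int_a^b P(Y > t) dt."""
    a, b = exp_quantile(alpha), exp_quantile(1.0 - alpha)
    return a + math.exp(-a) - math.exp(-b)

alpha = 0.1
combination = ((1 - 2 * alpha) * trimmed_mean(alpha)
               + alpha * exp_quantile(alpha)
               + alpha * exp_quantile(1 - alpha))
print(combination, winsorized_mean(alpha))  # should agree up to integration error
```

The trimmed mean discards the mass outside the two quantiles, whereas the Winsorized mean moves that mass onto the quantiles, which explains the two additional VaR terms.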
2.2 Elicitability and scoring functions
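Using the decision-theoretic framework of Fissler and Ziegel (2016) and Gneiting (2011), we introduce the following notation. Let $\mathcal{F}\subseteq\mathcal{F}_0$ be some generic subclass, and $\mathsf{A}\subseteq\mathbb{R}^k$ be an action domain. Whenever we consider a functional $T\colon\mathcal{F}\to\mathsf{A}$, we tacitly assume that $T(F)$ is well-defined for all $F\in\mathcal{F}$ and is an element of $\mathsf{A}$. $T(\mathcal{F})$ corresponds to the image $\{T(F)\in\mathsf{A}\mid F\in\mathcal{F}\}$. For any subset $M\subseteq\mathbb{R}^k$ we denote with $\mathrm{int}(M)$ the largest open subset of $M$. Moreover, $\mathrm{conv}(M)$ denotes the convex hull of the set $M$.
We say that a function $a\colon\mathbb{R}\to\mathbb{R}$ is $\mathcal{F}$-integrable if it is measurable and $\int|a(y)|\,\mathrm{d}F(y)<\infty$ for all $F\in\mathcal{F}$. Similarly, a function $g\colon\mathsf{A}\times\mathbb{R}\to\mathbb{R}$ is called $\mathcal{F}$-integrable if $g(x,\cdot)$ is $\mathcal{F}$-integrable for all $x\in\mathsf{A}$. If $g$ is $\mathcal{F}$-integrable, we define the map $\bar g\colon\mathsf{A}\times\mathcal{F}\to\mathbb{R}$, $\bar g(x,F):=\int g(x,y)\,\mathrm{d}F(y)$. If $g$ is sufficiently smooth in its first argument, we denote the $m$th partial derivative of $g(\cdot,y)$ with $\partial_m g(\cdot,y)$.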
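Definition 2.4.
A map $S\colon\mathsf{A}\times\mathbb{R}\to\mathbb{R}$ is an $\mathcal{F}$-consistent scoring function for $T\colon\mathcal{F}\to\mathsf{A}$ if it is $\mathcal{F}$-integrable and if $\bar S(T(F),F)\le\bar S(x,F)$ for all $x\in\mathsf{A}$, $F\in\mathcal{F}$. It is strictly $\mathcal{F}$-consistent for $T$ if it is consistent and if $\bar S(T(F),F)=\bar S(x,F)$ implies that $x=T(F)$ for all $x\in\mathsf{A}$ and for all $F\in\mathcal{F}$. A functional $T\colon\mathcal{F}\to\mathsf{A}$ is elicitable on $\mathcal{F}$ if it possesses a strictly $\mathcal{F}$-consistent scoring function.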
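Definition 2.5.
Two scoring functions $S,\tilde S\colon\mathsf{A}\times\mathbb{R}\to\mathbb{R}$ are equivalent if there is some $a\colon\mathbb{R}\to\mathbb{R}$ and some $\lambda>0$ such that $\tilde S(x,y)=\lambda S(x,y)+a(y)$ for all $(x,y)\in\mathsf{A}\times\mathbb{R}$. They are strongly equivalent if additionally $a\equiv0$.
This equivalence relation preserves (strict) consistency: If $S$ is (strictly) $\mathcal{F}$-consistent for $T$ and if $a$ is $\mathcal{F}$-integrable, then $\tilde S$ is also (strictly) $\mathcal{F}$-consistent for $T$. Closely related to the concept of elicitability is the notion of identifiability.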
Definition 2.6.
A map $V\colon\mathsf{A}\times\mathbb{R}\to\mathbb{R}^k$ is an $\mathcal{F}$-identification function for $T\colon\mathcal{F}\to\mathsf{A}$ if it is $\mathcal{F}$-integrable and if $\bar V(T(F),F)=0$ for all $F\in\mathcal{F}$. It is a strict $\mathcal{F}$-identification function for $T$ if additionally $\bar V(x,F)=0$ implies that $x=T(F)$ for all $x\in\mathsf{A}$ and for all $F\in\mathcal{F}$. A functional $T\colon\mathcal{F}\to\mathsf{A}$ is identifiable on $\mathcal{F}$ if it possesses a strict $\mathcal{F}$-identification function.
In contrast to Gneiting (2011) we consider point-valued functionals only. For a recent comprehensive study on elicitability of set-valued functionals we refer to Fissler et al. (2020). For the sake of completeness, we list some assumptions used in Section 3 which were originally introduced in Fissler and Ziegel (2016) in the Appendix.
3 Elicitability and identifiability results
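Wang and Wei (2020, Theorem 5.3) show that for $0<\alpha<\beta<1$, $\mathrm{RVaR}_{\alpha,\beta}$ (and also the pairs $(\mathrm{VaR}_\alpha,\mathrm{RVaR}_{\alpha,\beta})$ and $(\mathrm{VaR}_\beta,\mathrm{RVaR}_{\alpha,\beta})$) do not have CxLS on the class of distributions with bounded and discrete support. Hence, invoking that CxLS are necessary for elicitability and identifiability, $\mathrm{RVaR}_{\alpha,\beta}$ and the pairs $(\mathrm{VaR}_\alpha,\mathrm{RVaR}_{\alpha,\beta})$ and $(\mathrm{VaR}_\beta,\mathrm{RVaR}_{\alpha,\beta})$ fail to be elicitable and identifiable on this class. Our novel contribution is that the triplet $(\mathrm{VaR}_\alpha,\mathrm{VaR}_\beta,\mathrm{RVaR}_{\alpha,\beta})$, however, is elicitable and identifiable, subject to mild conditions. We use the notation $S_\alpha(x,y)=(\mathbb{1}\{y\le x\}-\alpha)x-\mathbb{1}\{y\le x\}y$, and recall that $S_\alpha$ is $\mathcal{F}$-consistent for $\mathrm{VaR}_\alpha$ if $\int_{-\infty}^{0}|y|\,\mathrm{d}F(y)<\infty$ for all $F\in\mathcal{F}$, and strictly $\mathcal{F}$-consistent if furthermore the $\alpha$-quantile of each $F\in\mathcal{F}$ is unique (Gneiting, 2011).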
Proposition 3.1.
For the map
[TABLE]
is an -identification function for , which is strict on .
Proof.
The proof is standard, observing that
[TABLE]
which follows from the representation (2.3). ∎
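To make the identification property concrete, the following sketch evaluates the expectation of a candidate identification function at the true triplet of a standard normal distribution. Since the displayed formula is not reproduced in this text, the form used here, $V(x_1,x_2,x_3,y)=\big(\mathbb{1}\{y\le x_1\}-\alpha,\ \mathbb{1}\{y\le x_2\}-\beta,\ x_3+\tfrac{1}{\beta-\alpha}(S_\beta(x_2,y)-S_\alpha(x_1,y))\big)$ with $S_\gamma(x,y)=(\mathbb{1}\{y\le x\}-\gamma)x-\mathbb{1}\{y\le x\}y$, is an assumption, as are the levels, the quantile grid used for averaging, and all names.

```python
from statistics import NormalDist

ALPHA, BETA = 0.1, 0.3
DIST = NormalDist()
# Quantile grid: averaging over it approximates the expectation under DIST.
GRID = [DIST.inv_cdf((i + 0.5) / 20_000) for i in range(20_000)]

def quantile_score(x, y, level):
    """S_level(x, y) = (1{y <= x} - level) * x - 1{y <= x} * y."""
    hit = 1.0 if y <= x else 0.0
    return (hit - level) * x - hit * y

def identification(x1, x2, x3, y):
    """Assumed identification function for (VaR_alpha, VaR_beta, RVaR)."""
    spread = quantile_score(x2, y, BETA) - quantile_score(x1, y, ALPHA)
    return (
        (1.0 if y <= x1 else 0.0) - ALPHA,
        (1.0 if y <= x2 else 0.0) - BETA,
        x3 + spread / (BETA - ALPHA),
    )

def expected_identification(x1, x2, x3):
    sums = [0.0, 0.0, 0.0]
    for y in GRID:
        for j, value in enumerate(identification(x1, x2, x3, y)):
            sums[j] += value
    return [s / len(GRID) for s in sums]

q_a, q_b = DIST.inv_cdf(ALPHA), DIST.inv_cdf(BETA)
rvar = (DIST.pdf(q_a) - DIST.pdf(q_b)) / (BETA - ALPHA)   # closed form for N(0, 1)
at_truth = expected_identification(q_a, q_b, rvar)
off_truth = expected_identification(q_a, q_b, rvar + 0.5)
print(at_truth, off_truth)
```

At the true triplet all three components vanish up to discretisation error, while shifting the RVaR forecast shifts the third component by the same amount.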
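The following theorem establishes a rich class of (strictly) consistent scoring functions for $T=(\mathrm{VaR}_\alpha,\mathrm{VaR}_\beta,\mathrm{RVaR}_{\alpha,\beta})$. By a priori assuming forecasts to be bounded with values in some cube $[c_{\min},c_{\max}]^3$, $-\infty\le c_{\min}<c_{\max}\le\infty$ (with the tacit convention that $[c_{\min},c_{\max}]$ is understood as $[c_{\min},c_{\max}]\cap\mathbb{R}$ if $c_{\min}=-\infty$ or $c_{\max}=\infty$), the class gets even broader.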
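Theorem 3.2.
For $0<\alpha<\beta<1$, the map
$$\begin{aligned}S(x_1,x_2,x_3,y)={}&\big(\mathbb{1}\{y\le x_1\}-\alpha\big)g_1(x_1)-\mathbb{1}\{y\le x_1\}g_1(y)\\&+\big(\mathbb{1}\{y\le x_2\}-\beta\big)g_2(x_2)-\mathbb{1}\{y\le x_2\}g_2(y)\\&+\phi'(x_3)\Big(x_3+\tfrac{1}{\beta-\alpha}\big(S_\beta(x_2,y)-S_\alpha(x_1,y)\big)\Big)-\phi(x_3)+a(y),\end{aligned}\tag{3.3}$$
where $S_\gamma(x,y)=(\mathbb{1}\{y\le x\}-\gamma)x-\mathbb{1}\{y\le x\}y$, is an $\mathcal{F}$-consistent scoring function for $T=(\mathrm{VaR}_\alpha,\mathrm{VaR}_\beta,\mathrm{RVaR}_{\alpha,\beta})$ if
- (i) $\phi$ is convex with subgradient $\phi'$,
- (ii) for all $x_3$ the functions
$$G_{1,x_3}(x_1)=g_1(x_1)-x_1\phi'(x_3)/(\beta-\alpha),\tag{3.4}$$
$$G_{2,x_3}(x_2)=g_2(x_2)+x_2\phi'(x_3)/(\beta-\alpha)\tag{3.5}$$
are increasing, and
- (iii) $S(x,\cdot)$ is $\mathcal{F}$-integrable for all $x$.
If moreover $\phi$ is strictly convex, and the functions at (3.4) and (3.5) are strictly increasing, then $S$ is strictly consistent for $T$ on the subclass of distributions in $\mathcal{F}$ whose $\alpha$- and $\beta$-quantiles are unique.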
Proof.
Let , and . Then, since is increasing, is -consistent for and it is strictly -consistent if is strictly increasing. Similar comments apply to the map . Hence,
[TABLE]
with a strict inequality under the conditions for strict consistency and if . Finally,
[TABLE]
since is convex. If is strictly convex and if , the inequality in (3.6) is strict. ∎
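A numerical sanity check of this consistency statement can be sketched as follows. The code assumes a score of the form $S(x_1,x_2,x_3,y)=(\mathbb{1}\{y\le x_1\}-\alpha)g_1(x_1)-\mathbb{1}\{y\le x_1\}g_1(y)+(\mathbb{1}\{y\le x_2\}-\beta)g_2(x_2)-\mathbb{1}\{y\le x_2\}g_2(y)+\phi'(x_3)\big(x_3+\tfrac{1}{\beta-\alpha}(S_\beta(x_2,y)-S_\alpha(x_1,y))\big)-\phi(x_3)$ with $a\equiv0$. The specific choices $\phi(x)=\log(1+e^x)$ and $g_1(x)=g_2(x)=x/(\beta-\alpha)$ are illustrative assumptions (they make $\phi$ strictly convex with $\phi'\in(0,1)$ and the two monotonicity conditions hold strictly); they are not taken from the paper, and neither are the levels or the standard normal test distribution.

```python
import math
from statistics import NormalDist

ALPHA, BETA = 0.1, 0.3
DIST = NormalDist()
# Quantile grid: averaging over it approximates the expectation under DIST.
GRID = [DIST.inv_cdf((i + 0.5) / 20_000) for i in range(20_000)]

def quantile_score(x, y, level):
    """S_level(x, y) = (1{y <= x} - level) * x - 1{y <= x} * y."""
    hit = 1.0 if y <= x else 0.0
    return (hit - level) * x - hit * y

def phi(x):
    return math.log1p(math.exp(x))        # strictly convex (assumed choice)

def dphi(x):
    return 1.0 / (1.0 + math.exp(-x))     # derivative of phi, values in (0, 1)

def g(x):
    return x / (BETA - ALPHA)             # used for both g_1 and g_2 (assumed choice)

def score(x1, x2, x3, y):
    """Assumed triplet score with a(y) = 0."""
    hit1 = 1.0 if y <= x1 else 0.0
    hit2 = 1.0 if y <= x2 else 0.0
    quantile_part = ((hit1 - ALPHA) * g(x1) - hit1 * g(y)
                     + (hit2 - BETA) * g(x2) - hit2 * g(y))
    spread = (quantile_score(x2, y, BETA)
              - quantile_score(x1, y, ALPHA)) / (BETA - ALPHA)
    return quantile_part + dphi(x3) * (x3 + spread) - phi(x3)

def expected_score(x1, x2, x3):
    return sum(score(x1, x2, x3, y) for y in GRID) / len(GRID)

q_a, q_b = DIST.inv_cdf(ALPHA), DIST.inv_cdf(BETA)
rvar = (DIST.pdf(q_a) - DIST.pdf(q_b)) / (BETA - ALPHA)   # closed form for N(0, 1)

at_truth = expected_score(q_a, q_b, rvar)
perturbed = {
    "VaR_alpha + 0.2": expected_score(q_a + 0.2, q_b, rvar),
    "VaR_beta - 0.2": expected_score(q_a, q_b - 0.2, rvar),
    "RVaR + 0.2": expected_score(q_a, q_b, rvar + 0.2),
}
print(at_truth, perturbed)
```

Each perturbed forecast should receive a larger expected score than the true triplet, in line with strict consistency. Other admissible choices of $\phi$, $g_1$ and $g_2$ can be substituted to see how the size of the score differences changes.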
Remark 3.3.
Provided condition (iii) in Theorem 3.2 holds and if is strictly convex, and and strictly increasing then given in (3.3) is still strictly -consistent in the -component for general . That is, for
[TABLE]
Making use of (2.4) and the revelation principle (Osband, 1985; Gneiting, 2011; Fissler, 2017), Theorem 3.2 also provides a rich class of strictly consistent scoring functions for $(\mathrm{VaR}_\alpha,\mathrm{VaR}_{1-\alpha},W_\alpha)$, where $W_\alpha$ is the $\alpha$-Winsorized mean. The following proposition is useful to construct examples; see Section 6.
Proposition 3.4.
Let $S$ be of the form (3.3) with a (strictly) convex and non-constant function $\phi$, and functions $g_1$, $g_2$ such that the functions at (3.4) and (3.5) are (strictly) increasing and condition (iii) of Theorem 3.2 is satisfied. Then the following holds:
- (i) The subgradient $\phi'$ of $\phi$ is necessarily bounded and the one-sided derivatives of $g_1$ and $g_2$ are necessarily bounded from below.
- (ii) $S$ is strongly equivalent to a scoring function $\tilde S$ of the form (3.3) with a (strictly) convex function $\tilde\phi$ such that $\tilde\phi'$ is bounded with $-\inf\tilde\phi'=\sup\tilde\phi'=\beta-\alpha$, and strictly increasing functions $\tilde g_1$, $\tilde g_2$ such that their one-sided derivatives are bounded from below by one and such that the functions at (3.4) and (3.5) are (strictly) increasing and condition (iii) of Theorem 3.2 is satisfied.
Proof.
- (i)
The proof is similar to the one of Corollary 5.5 in Fissler and Ziegel (2016): Condition (ii) implies that for any with and it holds that
[TABLE]
Therefore, is bounded, and the one-sided derivative of is bounded from below by while the one-sided derivative of is bounded from below by . 2. (ii)
For any , if we replace with , with , and with in the formula (3.3) for , then does not change. Also is (strictly) convex if and only if is (strictly) convex. Furthermore, conditions (ii) and (iii) of Theorem 3.2 hold for , , if and only if they hold for , and . By part (i) of the proposition is bounded. Therefore, we can assume without loss of generality that , since is non-constant. Then the argument follows by setting .
∎
Invoking the inequality (2.2), the triplet $T=(\mathrm{VaR}_\alpha,\mathrm{VaR}_\beta,\mathrm{RVaR}_{\alpha,\beta})$ can only attain values in the domain $\mathsf{A}_0:=\{x\in\mathbb{R}^3\mid x_1\le x_3\le x_2\}$. Therefore, we call $\mathsf{A}_0$ the maximal sensible action domain. Issuing forecasts for $T$ outside $\mathsf{A}_0$, thus violating (2.2), would be irrational, corresponding to, say, negative variance forecasts. Still, the scoring functions of the form (3.3) allow for the evaluation of forecasts violating (2.2). Striving for a necessary characterisation result of (strictly) consistent scoring functions for $T$, it is immediate to realise that there is flexibility outside $\mathsf{A}_0$ since one could possibly set the score to infinity there and would still preserve (strict) consistency. Therefore, it is not astonishing that a necessary characterisation result works only on domains $\mathsf{A}\subseteq\mathsf{A}_0$. The key to such a necessary characterisation is Osband’s principle (Fissler and Ziegel, 2016, Theorem 3.2) originating from the seminal dissertation of Osband (1985). Since it exploits a first-order condition of the minimisation of the expected score, the main assumptions of the result consist of smoothness assumptions on the expected score as well as richness assumptions on the underlying class of distributions $\mathcal{F}$; see the Appendix for the detailed technical formulations and Fissler and Ziegel (2016) for a discussion of these conditions.
We introduce the class of distributions which are continuously differentiable and with a strictly positive derivative / density. (Clearly for any .) For any , we denote the projections on the th component by , . For any and , let .
Theorem 3.5.
Let , , , and let defined at (3.1). If Assumptions (V1), and (F1) hold and satisfies Assumption (V4), then any strictly -consistent scoring function for that satisfies assumptions (VS1) and (S2) is necessarily of the form (3.3) almost everywhere, where the functions , , , in (3.4) and (3.5) are strictly increasing and is strictly convex.
Proof.
First note that satisfies assumption (V3) on . Let with derivative and let . Then one obtains
[TABLE]
The partial derivatives of are given by , , , , , and and vanish for and . Applying Fissler and Ziegel (2016, Theorem 3.2) yields the existence of continuously differentiable functions , , such that for . Since we assume that is twice continuously differentiable for any , the second order partial derivatives need to commute. Let . Then is equivalent to This needs to hold for all . The variation in the densities implied by Assumption (V4) in combination with the surjectivity of yield that on . Similarly, evaluating and at yields Using again Assumption (V4) as well as the surjectivity of , this implies that So we are left with characterising for . Note that Assumption (V1) implies that for any there are two distributions such that and are linearly independent. Then, the requirement that
[TABLE]
for all and for all implies that . Starting with , implies that $\partial_{1}h_{33}\bar{V}_{3}(x,F)=\big(\partial_{3}h_{11}(x)+h_{33}(x)/(\beta-\alpha)\big)\bar{V}_{1}(x,F)$. Again, Assumption (V1) implies that there are $F_1,F_2\in\mathcal{F}$ such that $\big(\bar{V}_{1}(x,F_{1}),\bar{V}_{3}(x,F_{1})\big)^{\intercal}$ and $\big(\bar{V}_{1}(x,F_{2}),\bar{V}_{3}(x,F_{2})\big)^{\intercal}$ are linearly independent. Hence, we obtain that and . With the same argumentation and starting from one can show that and . This means there exist functions , , , and some such that for any it holds that ,
[TABLE]
where , . Due to the fact that any component of is mixture-continuous (for a convex class $\mathcal{F}$, a functional $T\colon\mathcal{F}\to\mathbb{R}$ is called mixture-continuous if for any $F,G\in\mathcal{F}$ the map $[0,1]\ni\lambda\mapsto T((1-\lambda)F+\lambda G)$ is continuous) and since is convex and surjective, the projection is an open interval. Hence, . Due to Assumptions (V3) and (S2), Fissler and Ziegel (2016, Theorem 3.2) implies that are locally Lipschitz continuous.
The above calculations imply that the Hessian of the expected score, , at its minimiser , is a diagonal matrix with entries , , and . As a second order condition must be positive semi-definite. Invoking the surjectivity of once again, this shows that . More to the point, invoking the continuous differentiability of the expected score and the fact that is strictly -consistent for one obtains that for any with and for any , , there exists an such that is negative for all , zero for and positive for all For , this means that for any with there is an such that has the same sign as for all . Therefore, for all . Using the surjectivity of and invoking a compactness argument, attains a 0 only finitely many times on any compact interval. Recall that is an open interval. Hence, it can be approximated by an increasing sequence of compact intervals. Therefore, is at most countable and therefore a Lebesgue null set. With similar arguments one can show that for any , the sets and are at most countable and therefore also Lebesgue null sets.
Finally, using Proposition 1 in Fissler and Ziegel (2020) (recognising that is locally bounded) one obtains that is almost everywhere of the form (3.3). Moreover, it holds almost everywhere that and for . Hence, is strictly convex and the functions at (3.4) and (3.5) are strictly increasing. ∎
Combining Theorems 3.2 and 3.5, one can show that the scoring functions given at (3.3) are essentially the only strictly consistent scoring functions for the triplet on the action domain .
Corollary 3.6.
Let for some . Under the conditions of Theorem 3.5, a scoring function is strictly -consistent for , , if and only if it is of the form (3.3) almost everywhere satisfying conditions (i), (ii), (iii). Moreover, the function is necessarily bounded.
Proof.
For the proof it suffices to show that for , defined in (3.4), (3.5) is not only increasing on for any but on . For , we have and . Let and with . If there is nothing to show. If however , then . This means that
[TABLE]
where the second inequality stems from the fact that is increasing. If the function is strictly increasing, then the first inequality is strict. The argument for works analogously. ∎
Remark 3.7.
Note the structural difference of Theorems 3.2 and 3.5 to Frongillo and Kash (2020, Theorem 1), Brehmer (2017, Proposition 4.14) and in particular Fissler and Ziegel (2016, Theorem 5.2 and Corollary 5.5). Our functional of interest, $\mathrm{RVaR}_{\alpha,\beta}$ with $0<\alpha<\beta<1$, is not a minimum of an expected scoring function (or Bayes risk), but a difference of minima of two scoring functions. Indeed, while $\mathrm{ES}_\alpha(F)=-\frac{1}{\alpha}\bar S_\alpha(\mathrm{VaR}_\alpha(F),F)$, we have that
$$\mathrm{RVaR}_{\alpha,\beta}(F)=-\frac{1}{\beta-\alpha}\Big(\bar S_\beta(\mathrm{VaR}_\beta(F),F)-\bar S_\alpha(\mathrm{VaR}_\alpha(F),F)\Big),$$
where $\bar S_\gamma(x,F)=\int S_\gamma(x,y)\,\mathrm{d}F(y)$ and $S_\gamma(x,y)=(\mathbb{1}\{y\le x\}-\gamma)x-\mathbb{1}\{y\le x\}y$.
This structural difference is reflected in the minus sign appearing at (3.4). In particular, it means that the functions $g_1$ and $g_2$ cannot identically vanish if we want to ensure strict consistency of $S$, whereas the corresponding functions in Theorem 5.2 in Fissler and Ziegel (2016) may well be set to zero. Frongillo and Kash (2020, Theorem 2) generalises our results and presents an elicitability result of any linear combination of Bayes risks.
Concrete examples for choices of the functions $g_1$, $g_2$, and $\phi$ for the scoring function at (3.3) are given and discussed in Section 6.
4 Translation invariance and homogeneity
There are many choices for the functions $g_1$, $g_2$, and $\phi$ appearing in the formula for the scoring function $S$ at (3.3). Often, these choices can be limited by imposing secondary desirable criteria on $S$. In this section we show that, unfortunately, standard criteria (Patton (2011); Nolde and Ziegel (2017); Fissler and Ziegel (2019)) such as translation invariance and positive homogeneity are not fruitful for RVaR.
If one is interested in scoring functions with an action domain of the form possessing the additional property of translation invariant score differences, the only sensible choice is , , amounting to the maximal action domain . Similarly, for scoring functions with positively homogeneous score differences, the most interesting choices for action domains are , or .
Proposition 4.1 (Translation invariance).
Under the conditions of Theorem 3.5 there are no strictly -consistent scoring functions for on with translation invariant score differences.
Proof.
Using Theorem 3.5 any strictly -consistent scoring function for must be of the form (3.3) where in particular is strictly convex, twice differentiable, and is bounded. Assume that has translation invariant score differences. That means that the function ,
[TABLE]
vanishes. Then, for all and for all
[TABLE]
Therefore, needs to be constant. Since is convex and that means that with . But since , is unbounded, which is a contradiction. ∎
The proof of Proposition 4.1 closely follows the one of Proposition 4.10 in Fissler and Ziegel (2019). The fact that the latter assertion entails a positive result has the following background: The strictly consistent scoring function for given in Fissler and Ziegel (2019, Proposition 4.10) works only on a very restricted action domain. To guarantee strict consistency on such an action domain, one would need a refinement of Theorem 3.2 in the spirit of Fissler and Ziegel (2020, Proposition 2). However, since such a positive result on a quite restricted action domain is practically irrelevant, we dispense with such a refinement and only state the relevant negative result here.
Proposition 4.2 (Homogeneity).
Under the conditions of Theorem 3.5 there are no strictly -consistent scoring functions for on with positively homogeneous score differences.
Proof.
Using Theorem 3.5 any strictly -consistent scoring function for must be of the form (3.3) where in particular is strictly convex, twice differentiable, and is bounded. Assume that has positively homogeneous score differences of some degree . That means that the function ,
[TABLE]
vanishes. Therefore, for all , for all and all
[TABLE]
For the sake of brevity, we only consider the case , the other cases being similar. Equation (4.1) implies that that for any . Due to the strict convexity of , we need that . However, for , and for , . Hence, cannot be bounded. ∎
Remark 4.3.
The negative result of Proposition 4.2 should be compared with the results of Theorem C.3 in Nolde and Ziegel (2017) characterising homogeneous strictly consistent scoring functions for the pair . Since they use a different sign convention for and than we do in this paper, their choice of the action domain corresponds to our choice . When interpreting as a risk measure, negative values of are the more interesting and relevant ones, using our sign convention. Inspecting the proof of Proposition 4.2 and of Proposition 3.4(i) one makes the following observation: For , Nolde and Ziegel (2017) state an impossibility result for their choice of action domain. In fact, the problem occurring in our context is that is not bounded from below. In Proposition 3.4 this property is implied by the fact that the function at (3.5) is increasing. And it is exactly such a condition that is also present for strictly consistent scoring functions for the pair ; see Theorem 5.2 in Fissler and Ziegel (2016). On the other hand, the complication for stems from the fact that is not bounded from above. This condition is related to the monotonicity of at (3.4). Such a condition is not present for strictly consistent scoring functions for the pair . Correspondingly, there can be homogeneous and strictly consistent scoring functions for for this pair (Nolde and Ziegel, 2017) while this is not possible for the triplet .
5 Mixture representation of scoring functions
When forecasts are compared and ranked with respect to consistent scoring functions, one has to be aware that in the presence of non-nested information sets, model mis-specification and/or finite samples, the ranking may depend on the chosen consistent scoring function (Patton, 2020). In the specific case of , the forecast ranking may depend on the specific choice for the functions , , and appearing in Theorem 3.2. A possible remedy to this problem is to compare forecasts simultaneously with respect to all consistent scoring functions in terms of Murphy diagrams as introduced by Ehm et al. (2016). Murphy diagrams are based on the fact that the class of all consistent scoring functions can be characterised as a class of mixtures of elementary scoring functions that depend on a low-dimensional parameter. The following theorem provides such a mixture representation for the scoring functions at (3.3). The applicability is illustrated in Section 6. Recall that .
Theorem 5.1.
Let . Any scoring function of the form (3.3) with chosen such that can be written as
[TABLE]
where
[TABLE]
and , are locally finite measures on and is a finite measure on . If puts positive mass on all open intervals, then is strictly consistent. Conversely, for any choice of measures with the above restrictions, we obtain a scoring function of the form (3.3).
Proof.
An increasing function can always be written as
[TABLE]
for some locally finite measure , and some , . The function is strictly increasing if and only if is strictly positive, i.e., it puts positive mass on all open non-empty intervals. Furthermore, the one-sided derivatives of are bounded below by if and only if for all Borel sets , where is the Lebesgue measure on .
Using the arguments from Proposition 3.4, it is no loss of generality to show the assertion for a score such that and the one-sided derivatives of , are bounded from below by .
Then, there is a measure on such that , which is strictly positive if and only if is strictly convex, such that for all for all , we have
[TABLE]
Using Fubini’s theorem, we find that
[TABLE]
Using (3.3), (5.2) and Proposition 3.4 it is straightforward to check that a scoring function of the form (3.3) can be written as in (5.1) with replaced by
[TABLE]
and locally finite measures , on instead of , such that for , and for all Borel sets , and the measure . We can write , , for some locally finite measures , . Integrating with respect to , we obtain the function , and analogously for . Using that yields the claim with
[TABLE]
which is equal to the formula given in the statement of the theorem. The scoring functions and are consistent for VaR at level and , respectively. The scoring function is of the form (3.3) with and , which renders it a consistent scoring function for . The converse statement follows by direct computations. ∎
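The mixture idea can be illustrated on the VaR components alone. The following minimal sketch does not implement the elementary score for RVaR from Theorem 5.1. It uses the elementary quantile score of Ehm et al. (2016) and checks numerically that integrating it over the threshold recovers the standard piecewise linear (pinball) quantile score. All function names are hypothetical, and the integration range and grid size are illustrative assumptions.

```python
def elementary_quantile_score(x, y, theta, alpha):
    """Elementary score for the alpha-quantile (Ehm et al., 2016):
    (1{y < x} - alpha) * (1{theta < x} - 1{theta < y})."""
    return ((y < x) - alpha) * ((theta < x) - (theta < y))


def pinball_score(x, y, alpha):
    """Piecewise linear quantile score (1{y < x} - alpha) * (x - y)."""
    return ((y < x) - alpha) * (x - y)


def integrate_elementary(x, y, alpha, lo=-10.0, hi=10.0, steps=20000):
    """Midpoint Riemann sum of the elementary score over theta in [lo, hi].

    Assumes that x and y both lie inside [lo, hi].
    """
    h = (hi - lo) / steps
    return h * sum(
        elementary_quantile_score(x, y, lo + (i + 0.5) * h, alpha)
        for i in range(steps)
    )


def murphy_curve(forecasts, outcomes, thetas, alpha):
    """Average elementary score at each threshold theta.

    Plotting the returned values against thetas gives one curve of a
    Murphy diagram for a sequence of quantile forecasts.
    """
    n = len(outcomes)
    return [
        sum(elementary_quantile_score(x, y, th, alpha)
            for x, y in zip(forecasts, outcomes)) / n
        for th in thetas
    ]


if __name__ == "__main__":
    # The integral over theta should be close to the pinball score.
    print(integrate_elementary(1.0, -0.5, 0.1), pinball_score(1.0, -0.5, 0.1))
```

A forecast that attains a lower curve at every threshold is preferred under every mixture of these elementary scores, which is the reasoning behind comparing forecasts through Murphy diagrams.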
6 Simulations
This simulation study illustrates the usage of consistent scoring functions for the triplet when comparing the predictive performances of different forecasts for this triplet, e.g., in the context of comparative backtests (Nolde and Ziegel, 2017). Due to the negative results in Section 4 it is challenging to suggest concrete examples for the choices of the functions , and in (3.3). In Table 1, we give some first suggestions. The scoring function is in the spirit of the Huber loss (Huber, 1964, p. 79). It is only strictly consistent on , but remains consistent for all of . We illustrate the discrimination ability of the suggested scoring functions with a slightly extended version of a simulation example of Gneiting et al. (2007) which has also been considered in Fissler et al. (2016).
We consider a data generating process given by , where and are mutually independent sequences of i.i.d. standard normal random variables. Suppose we have three different forecasters who provide point forecasts, aiming at correctly specifying of the (conditional) distribution of . The first forecaster has access to and uses the correct conditional distribution for prediction, that is, they predict
[TABLE]
for timepoint , where and denote the density and quantile function of the standard normal distribution, respectively. The second forecaster predicts , where , and and where is independent normally distributed noise with mean zero and variance . The third forecaster, , bases their predictions on the unconditional distribution of , that is . Therefore, the forecasts take the form
[TABLE]
It is clear that the first forecaster dominates the second and the third forecaster, that is, they will be preferred under any consistent scoring function. Indeed, invoking Holzmann and Eulert (2014), in case of the first and the second forecaster, the first one is ideal with respect to the information set , whereas the second one is based on the same information set but is not ideal. In case of the first and the third forecaster, both forecasters are ideal but the information set of the first forecaster, , is larger than the one of the third forecaster, which is the trivial -algebra. It will depend on the size of the variance whether the second or the third forecaster is preferred. Figures 1 and 2 provide Murphy diagrams of all forecasters computed from a sample of size , providing a good approximation of the population level. They are in line with our theoretical considerations above concerning the ranking of the three forecasts.
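The setup for the first and third forecaster can be sketched as follows. Several details are assumptions rather than statements from the text: the observation is taken to be the sum of the two standard normal sequences (so the conditional law is normal with unit variance and the unconditional law is normal with variance 2), VaR is taken to be the lower quantile, and RVaR is taken to be the average of the quantiles between the two levels. The second forecaster is omitted because its specification is not recoverable here. The function names and the levels in the demo are hypothetical.

```python
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal: pdf, cdf and quantile function


def normal_triplet(mean, sd, alpha, beta):
    """(VaR_alpha, VaR_beta, RVaR_{alpha,beta}) of N(mean, sd^2).

    Uses the closed form: the average of the standard normal quantile
    function over [alpha, beta] equals
    (pdf(z_alpha) - pdf(z_beta)) / (beta - alpha).
    """
    z_a, z_b = STD.inv_cdf(alpha), STD.inv_cdf(beta)
    var_a = mean + sd * z_a
    var_b = mean + sd * z_b
    rvar = mean + sd * (STD.pdf(z_a) - STD.pdf(z_b)) / (beta - alpha)
    return var_a, var_b, rvar


def simulate(n, alpha, beta, seed=0):
    """Return a list of (outcome, forecast_1, forecast_3) tuples."""
    rng = random.Random(seed)
    # The third forecaster issues the same unconditional forecast each time.
    f3 = normal_triplet(0.0, 2.0 ** 0.5, alpha, beta)
    rows = []
    for _ in range(n):
        mu = rng.gauss(0.0, 1.0)       # information available to forecaster 1
        y = mu + rng.gauss(0.0, 1.0)   # assumed form of the observation
        f1 = normal_triplet(mu, 1.0, alpha, beta)
        rows.append((y, f1, f3))
    return rows


if __name__ == "__main__":
    for y, f1, f3 in simulate(3, alpha=0.01, beta=0.05):
        print(round(y, 3), [round(v, 3) for v in f1], [round(v, 3) for v in f3])
```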
We compare the predictive performances using Diebold-Mariano tests (Diebold and Mariano, 1995) based on the scoring functions in Table 1. We consider samples of size and repeat our experiment 10’000 times. In the left panel of Table 2, we consider the case that where is a trimmed mean. We report the ratio of rejections of the null hypothesis that forecaster outperforms forecaster , , , evaluated in terms of the score at significance level . E.g., for , we consider the null hypothesis for all , or in short, . Analogously, in the right panel of Table 2, we consider the case that are both close to zero, that is, and , which is a setting that is relevant if is used as a risk measure. For the scoring function , we have experimented a bit with the values and and report the results for the choices that worked best in our experiments. A systematic study on how to choose these two parameters goes beyond the scope of the present paper.
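A simplified version of such a test can be sketched as follows. This is not the exact procedure used for Table 2: it assumes serially uncorrelated score differences (so no autocorrelation-robust variance estimate is used), a normal approximation for the test statistic, and the convention that lower scores are better. The function name `dm_test` is hypothetical, and the realised scores can come from any consistent scoring function.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def dm_test(scores_a, scores_b):
    """One-sided Diebold-Mariano-type test.

    H0: forecaster a is at least as good as forecaster b, that is,
    the expected score of a is less than or equal to that of b.
    Large positive values of the statistic speak against H0.

    Returns the test statistic and the asymptotic p-value.
    """
    d = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(d)
    t_stat = mean(d) / (stdev(d) / sqrt(n))
    p_value = 1.0 - NormalDist().cdf(t_stat)
    return t_stat, p_value


if __name__ == "__main__":
    # Toy realised scores: forecaster a is clearly worse than forecaster b.
    scores_a = [1.0 + 0.1 * (i % 3) for i in range(90)]
    scores_b = [0.5 + 0.1 * (i % 2) for i in range(90)]
    print(dm_test(scores_a, scores_b))
```

Repeating such a test over many simulated samples and recording the share of rejections at a fixed significance level gives an empirical rejection rate of the kind reported in Table 2.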
For the situation of the left panel of Table 2 concerning , we can see that forecaster 1 (2) outperforms forecaster 3 with a power of 1 (almost 1) for all scoring functions used. For a comparison of forecaster 1 and forecaster 2, the situation is more interesting: While forecaster 1 outperforms forecaster 2 with regard to all scoring functions considered, the power of the tests (and the associated discrimination ability of the scoring functions) varies substantially. While leads to an empirical power of 0.304 for the null hypothesis , the score induces a power of 0.624 for the same null hypothesis. The right panel of Table 2, which considers the parameter choice and , shows a different picture. The tests employing , and have similar power. In contrast, yields a considerably smaller power (0.393) for the null than the other scores ( for all cases). A more detailed study and comparison of other scoring functions and other situations is deferred to future work.
7 Implications for regression
After illustrating the usage of consistent scoring functions in forecast comparison and comparative backtesting in Section 6, we would like to outline how one can implement our results about the elicitability of the triplet , in a regression context. Then we would like to contrast our ansatz to other suggestions for regression of the -trimmed mean (which can be generalised to ). The most common alternative approaches in the literature on robust statistics are the trimmed least squares approach and a two-step estimation procedure using the Huber skipped mean.
7.1 A joint regression framework for
Let be a time series with the usual notation that denotes some real valued response variable and is a -dimensional vector of regressors. Let be some parameter space and a parametric model for , . We assume a correct model specification, that is, we assume that there is a unique such that
[TABLE]
where denotes the conditional distribution of given . That means, models jointly the conditional , and the conditional . Let be a strictly consistent scoring function of the form (3.3) and suppose the sequence satisfies certain mixing conditions (White, 2001, Corollary 3.48) (in particular under independence). Then one obtains under additional moment conditions that, as ,
[TABLE]
It is essentially this Law of Large Numbers result which allows for consistent parameter estimation with the empirical -estimator ; see e.g. van der Vaart (1998), Huber and Ronchetti (2009), Nolde and Ziegel (2017) and Dimitriadis et al. (2020) for details.
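For orientation, an empirical M-estimator of this kind typically takes the following form. The notation is an assumption made for this sketch and may differ from the paper's: $m(X_t, \theta)$ stands for the parametric model of the triplet, $\Theta$ for the parameter space, $S$ for a strictly consistent scoring function of the form (3.3), and $n$ for the sample size.

```latex
\hat{\theta}_n \in \operatorname*{arg\,min}_{\theta \in \Theta}
  \frac{1}{n} \sum_{t=1}^{n} S\bigl(m(X_t, \theta), Y_t\bigr)
```

Under a correct specification, strict consistency of $S$ means that the population counterpart of this objective is minimised at the true parameter, which is why the Law of Large Numbers is the key ingredient for consistency of the estimator.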
In summary, we can see that the complication of this procedure is that one needs to model the components , , even if one is only interested in . The advantage is that one can substantially deviate from an i.i.d. assumption on the data generating process. One can deal with serially dependent, though mixing, and non-stationary data. One only needs the semiparametric stationarity specified through (7.1).
7.2 Trimmed least squares
Most proposals for -estimation and regression for in the field of robust statistics focus on the -trimmed mean, , corresponding to . But they can often be extended to the general case in a straightforward way. When this is the case, we describe the procedure in this more general manner. A majority of the proposals in the literature are commonly referred to as a trimmed least squares (TLS) approach. However, strictly speaking, TLS actually subsumes different, though closely related estimation procedures.
The first one was coined by Koenker and Bassett (1978), cf. Ruppert and Carroll (1980), and constitutes a two-step -estimator: In a first step, the - and -quantile are determined via usual -estimation. Then, all values below the former and above the latter are omitted and is computed with an ordinary least squares approach. One can also express this procedure using order-statistics. Using the notation from Subsection 7.1, an -estimator for is given by Here, is the order-statistics of the sample . While this procedure seems to work for a simplistic regression model (ignoring the regressors and only modelling the intercept part), it is not clear how to use it in a more interesting regression context, where one is actually interested in the conditional distribution of given rather than the unconditional distribution of . Moreover, since this approach uses the order statistics of the entire sample to implicitly estimate the - and -quantile, it requires that these quantiles be constant in time. Hence, heteroscedasticity (in time) can lead to problems, even if is constant in time.
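For the intercept-only case, the order-statistics version of this idea can be sketched as below. The rounding convention for the cut-off indices (floor of the sample size times the level) is an assumption of this sketch and need not match the estimator in the text. The function name is hypothetical.

```python
from math import floor


def trimmed_mean(sample, alpha, beta):
    """Average of the order statistics with ranks floor(n*alpha)+1, ..., floor(n*beta).

    This is an empirical analogue of averaging the quantiles between
    the levels alpha and beta for an i.i.d. sample.
    """
    ys = sorted(sample)
    n = len(ys)
    lo, hi = floor(n * alpha), floor(n * beta)
    if hi <= lo:
        raise ValueError("sample too small for the requested levels")
    kept = ys[lo:hi]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    data = list(range(1, 100)) + [10 ** 6]  # one gross outlier
    print(trimmed_mean(data, 0.25, 0.75))   # unaffected by the outlier
```

Because the cut-offs come from the order statistics of the whole sample, the sketch also makes the limitation visible: it has no way of letting the two quantiles change over time.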
A second approach is described, for example, in Rousseeuw (1984, 1985) and relies on order-statistics of the squared residuals. It only seems to work for the -trimmed mean. To be more precise, and again using the notation from above, let be a one-dimensional parametric model. Again, one assumes that there is a unique correctly specified model parameter such that
[TABLE]
For each , define the residuals and the absolute residuals . Define the order-statistics of the absolute residuals for a sample of size . Then an -estimator is defined via
[TABLE]
While this procedure appears to be fairly similar to an ordinary least squares procedure with the respective computational advantages, one should recall that the trimming crucially depends on the choice of the parameter . That means even if the model is linear in the parameter , one generally obtains a non-convex objective function with several local minima. Interestingly, the trimming takes place only for residuals with large modulus. If the error distribution is symmetric, this procedure yields a consistent estimator for in an i.i.d. setting. If one wants to relax the assumption on the error distribution and is interested in modelling for general in (7.2), one could come up with the following ad-hoc procedure: Consider the order-statistics of the residuals . Then define an -estimator via
[TABLE]
This procedure takes into account the asymmetric nature of trimming when dealing with or and an asymmetric error distribution. However, as outlined above, this procedure can lead to problems in the presence of heteroscedasticity or general non-stationarity of the error distribution, if the conditional and of given depends on . We would like to point out that, at the cost of additionally modelling the - and -quantile, the procedure using our strictly consistent scoring functions for the triplet described in Subsection 7.1 does not rely on the usage of order-statistics and it can in general deal with heteroscedasticity. The only degree of ‘stationarity’ is required through (7.1). Stationarity in particular is deemed too strong an assumption in the context of financial data; see Davis (2016).
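The symmetric least trimmed squares idea can be made concrete for a pure location model. The sketch below is an illustration under assumptions, not the estimator of the text: the number of retained residuals `h` is a hypothetical parameter standing in for the trimming fraction, and the search relies on the known fact that, for location, the minimiser is the mean of some block of `h` consecutive order statistics.

```python
def lts_objective(sample, theta, h):
    """Sum of the h smallest squared residuals around theta."""
    squared = sorted((y - theta) ** 2 for y in sample)
    return sum(squared[:h])


def lts_location(sample, h):
    """Least trimmed squares estimate of location.

    Scans all blocks of h consecutive order statistics and returns the
    mean of the block with the smallest sum of squared deviations.
    """
    ys = sorted(sample)
    n = len(ys)
    if not 1 <= h <= n:
        raise ValueError("h must lie between 1 and the sample size")
    best_mean, best_ss = None, float("inf")
    for start in range(n - h + 1):
        window = ys[start:start + h]
        m = sum(window) / h
        ss = sum((y - m) ** 2 for y in window)
        if ss < best_ss:
            best_mean, best_ss = m, ss
    return best_mean


if __name__ == "__main__":
    data = [0.0, 0.1, -0.1, 0.2, -0.2, 50.0, 60.0]
    print(lts_location(data, h=5))
```

Evaluating `lts_objective` on a grid of candidate values for such data shows several local minima, which is the non-convexity mentioned above. The trimming here removes the residuals with the largest modulus on either side, so it does not by itself target asymmetric levels.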
Finally, we would like to remark that there are further procedures belonging to the field of TLS. For instance, Atkinson and Cheng (1999) propose an adaptive procedure where the trimming parameter is data driven; see also Cerioli et al. (2018). However, we see no apparent way how to use such procedures if one is interested in predefined trimming parameters and .
7.3 Connections to Huber loss and Huber skipped mean
In his seminal paper, Huber (1964) introduced the famous Huber loss where for and for . Huber argues that “the corresponding [M-]estimator is related to Winsorizing” (Huber, 1964, p. 79). What obtained significantly less attention (maybe due to its lack of convexity) is another loss function he considers on the same page of the paper which is defined as for for and for . He writes about it: “the corresponding [M-]estimator is a trimmed mean” (ibidem).
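For reference, the two loss functions are commonly written as follows. The symbols $\rho_k$, $\tilde{\rho}_k$, $t$ and the threshold $k > 0$ are notation assumed for this sketch, and the treatment of the boundary $|t| = k$ may differ from the original.

```latex
\rho_k(t) =
\begin{cases}
  \tfrac{1}{2} t^2, & |t| \le k, \\
  k |t| - \tfrac{1}{2} k^2, & |t| > k,
\end{cases}
\qquad
\tilde{\rho}_k(t) =
\begin{cases}
  \tfrac{1}{2} t^2, & |t| \le k, \\
  \tfrac{1}{2} k^2, & |t| > k.
\end{cases}
```

The first function is convex and grows linearly in the tails. The second is constant beyond the threshold, so observations with large residuals stop influencing the estimate, at the price of a non-convex objective.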
One could define an asymmetric version of the latter loss function by using with
[TABLE]
Assuming that is continuous with density for the sake of the simplicity of the argument, the corresponding first-order condition for a minimum of the expected score is equivalent with
[TABLE]
Now a suggestion similar to Rousseeuw (1984, p. 876) is to consider this loss with and stemming from some pre-estimate. However, one can see that the first-order condition is generally not solved by . Again, if one is interested in -estimation for the trimmed mean or, more generally, RVaR, one should use the scoring functions introduced at (3.3).
Acknowledgements
We would like to thank Timo Dimitriadis and Anthony C. Atkinson for insightful discussions about the topic, and Ruodu Wang, Rafael Frongillo, Tilmann Gneiting and Jana Hlavinová for helpful suggestions which improved an earlier version of this paper.
Tobias Fissler is grateful to the Department of Mathematics at Imperial College London, which funded his fellowship during which most of the work on this paper was done. Johanna Ziegel is grateful for financial support from the Swiss National Science Foundation.
Appendix
We present a list of assumptions used in Section 3. For more details about their interpretations and implications, please see Fissler and Ziegel (2016) where they were originally introduced.
Assumption (V1).
is convex and for every there are such that
Note that if is a strict -identification function for which satisfies Assumption (V1), then for each there is an such that .
Assumption (V3).
The map is continuously differentiable for every .
Assumption (V4).
Let assumption (V3) hold. For all and for all there are such that
[TABLE]
Assumption (F1).
For every there exists a sequence of distributions that converges weakly to the Dirac-measure such that the support of is contained in a compact set for all .
Assumption (VS1).
Suppose that the complement of the set
[TABLE]
has -dimensional Lebesgue measure zero.
Assumption (S2).
For every , the function is continuously differentiable and the gradient is locally Lipschitz continuous. Furthermore, is twice continuously differentiable at .
References
- C. Acerbi and B. Székely. Backtesting Expected Shortfall. Risk Magazine, 2014.
- C. Acerbi and B. Székely. General properties of backtestable statistics. Preprint, 2017. URL https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2905109.
- P. Artzner, F. Delbaen, J.-M. Eber, and D. Heath. Coherent measures of risk. Math. Finance, 9:203–228, 1999.
- A. C. Atkinson and T.-C. Cheng. Computing least trimmed squares regression with the forward search. Statist. Comput., 9(4):251–263, 1999.
- Bank for International Settlements. Consultative Document: Fundamental review of the trading book: Outstanding issues. 2014.
- S. Barendse. Efficiently Weighted Estimation of Tail and Interquartile Expectations. Preprint, 2020. URL https://dx.doi.org/10.2139/ssrn.2937665.
- J. R. Brehmer. Elicitability and its application in risk management. Master’s thesis, University of Mannheim, 2017. URL http://arxiv.org/abs/1707.09604.
- A. Cerioli, M. Riani, A. C. Atkinson, and A. Corbellini. The power of monitoring: how to make the most of a contaminated multivariate sample. Stat. Methods Appl., 27(4):559–587, 2018.
