On shrinkage estimation for balanced loss functions
Éric Marchand, William E. Strawderman

TL;DR
This paper develops shrinkage estimators for multivariate means under modified balanced loss functions, demonstrating dominance over standard estimators for certain distributions and loss functions, with implications for robustness.
Contribution
It introduces Baranchik-type estimators that dominate benchmarks under new balanced loss functions with concave, completely monotone $\rho$ and $\rho$-like functions.
Findings
Proposed estimators dominate the benchmark in normal and scale mixture models.
Results extend dominance to a class of concave, completely monotone loss functions.
Implications for robustness and simultaneous dominance in multivariate estimation.
Éric Marchand (a) & William E. Strawderman (b)
(a) Université de Sherbrooke, Département de mathématiques, Sherbrooke Qc, CANADA, J1K 2R1
(b) Rutgers University, Department of Statistics, 501 Hill Center, Busch Campus, Piscataway, N.J., USA, 08855
Summary
The estimation of a multivariate mean $\theta$ is considered under natural modifications of the balanced loss function of the form: (i) $\omega \rho(\|\delta-\delta_0\|^2) + (1-\omega) \rho(\|\delta-\theta\|^2)$, and (ii) $\ell\left(\omega \|\delta-\delta_0\|^2 + (1-\omega) \|\delta-\theta\|^2\right)$, where $\delta_0$ is a target estimator of $\gamma(\theta)$. After briefly reviewing known results for original balanced loss with identity $\rho$ or $\ell$, we provide, for increasing and concave $\rho$ and $\ell$ which also satisfy a completely monotone property, Baranchik-type estimators of $\theta$ which dominate the benchmark $\delta_0(X)=X$ for $X$ either distributed as multivariate normal or as a scale mixture of normals. Implications are given with respect to model robustness and simultaneous dominance with respect to either $\rho$ or $\ell$.
AMS 2010 subject classifications: 62F10, 62J07 (primary); 62C15, 62C20 (secondary).
Keywords and phrases: Balanced loss; Concave loss; Dominance; Multivariate normal; Scale mixture of normals; Shrinkage estimation.
1 Introduction
Balanced loss functions and their role in estimation have captured the interest of many researchers since Arnold Zellner (Zellner, 1994) proposed their use in a regression framework. Balanced loss functions are appealing as they combine proximity of a given estimator to both a target estimator and the unknown parameter being estimated. They relate conceptually to methods for combining estimators (e.g., Judge & Mittelhammer, 2004), as well as to penalized least-squares estimation. The study of balanced loss functions has frequently been cast in a regression framework (e.g., Hu & Peng, 2011, and the references therein), but it has also arisen in, or been related to, credibility theory, finance, sequential estimation, etc. (Baran & Stepień-Baran, 2013; Zhang & Chen, 2018). In Zellner’s framework, the target estimator was least-squares, but such a target can be viewed more broadly (e.g., Jafari Jozani et al., 2006, 2014).
To a large extent, findings in the literature relate to balanced squared error loss
[TABLE]
where for an observable , , is a target estimator of , is the weight given to the proximity of to , and is a given estimator of . In such cases, as presented by Jafari Jozani et al. (2006), as well as Dey et al. (1999), Bayesian estimation as well as the frequentist risk performance under balanced loss with relate precisely to corresponding features under unbalanced loss (i.e., squared error loss) (see Theorem 2.2). For instance, given a prior and a corresponding Bayes estimator under loss , the corresponding Bayes estimator under balanced loss is simply given by . Such relationships are reviewed and briefly illustrated in Section 2.
In contrast, much less is known for the following two natural alternatives or modifications to loss (1.1):
[TABLE]
and
[TABLE]
with , , and . Balanced loss functions of the type (1.2) were considered by Jafari Jozani et al. (2012). They provided Bayesian estimators as well as other types of posterior risk analysis. However, for both losses (1.2) and (1.3), there seems to be no significant known finding for frequentist risk analysis comparable to the earlier results for balanced squared error loss.
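As a concrete reading of the three loss families just discussed, here is a minimal Python sketch. It assumes the forms given in the Summary: balanced squared error loss $\omega\|\delta-\delta_0\|^2+(1-\omega)\|\delta-\theta\|^2$ for (1.1), the $\rho$-version applied term by term for (1.2), and the $\ell$-version applied to the whole balanced sum for (1.3). The function names and the reflected-normal example `reflected_normal` are illustrative choices, not notation from the paper.

```python
import numpy as np


def sq_dist(u, v):
    """Squared Euclidean distance ||u - v||^2."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.sum((u - v) ** 2))


def balanced_sq_loss(delta, delta0, theta, omega):
    """Assumed form of (1.1): omega*||d - d0||^2 + (1 - omega)*||d - theta||^2."""
    return omega * sq_dist(delta, delta0) + (1 - omega) * sq_dist(delta, theta)


def rho_balanced_loss(delta, delta0, theta, omega, rho):
    """Assumed form of (1.2): rho is applied to each squared distance."""
    return (omega * rho(sq_dist(delta, delta0))
            + (1 - omega) * rho(sq_dist(delta, theta)))


def ell_balanced_loss(delta, delta0, theta, omega, ell):
    """Assumed form of (1.3): ell is applied to the balanced squared loss."""
    return ell(balanced_sq_loss(delta, delta0, theta, omega))


def reflected_normal(t, gamma=1.0):
    """Illustrative bounded, increasing, concave choice: 1 - exp(-t / (2*gamma))."""
    return 1.0 - np.exp(-t / (2.0 * gamma))
```

With the identity function for `rho` or `ell`, both modified losses reduce to `balanced_sq_loss`. For a concave choice such as `reflected_normal`, Jensen's inequality gives `ell_balanced_loss >= rho_balanced_loss` at any fixed arguments.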
The objective of this paper is to try to fill such gaps. To achieve this, we focus on the multivariate normal case , as well as scale mixture of normals as defined in (2.4), the target estimator ; and the objective of improving on . The latter is the maximum likelihood estimator and also is minimax for losses (1.2) as elaborated upon at the outset of Section 3C. We obtain various sufficient conditions for dominance for both losses (1.2) and (1.3). These apply for interesting subclasses of concave ’s and ’s respectively, which are also completely monotone. Shrinkage estimation for multivariate normal models, and more generally spherically symmetric and elliptically symmetric models, has had a long, rich and influential history (e.g., Fourdrinier et al., 2018). The use of a concave loss as well as concave versions of (1.2) and (1.3), is quite appealing, and has motivated previous shrinkage estimation work such as Brandwein & Strawderman (1980, 1991), Brandwein et al. (1993), and Kubokawa et al. (2015), among others.
The paper is organized as follows. We collect some preliminary definitions and results in Section 2.1, before reviewing and illustrating frequentist risk and Bayesian analysis results in Section 2.2 applicable to balanced squared-error loss . In Sections 3 and 4, we provide conditions for a Baranchik-type estimator to dominate under loss functions (1.2) and (1.3) respectively (i.e., Theorems 3.3 and 4.4). In both cases, the proofs are unified with respect to choice of model and loss, the former with respect to the underlying normal mixture and the latter with respect to the choice of or for the balanced loss. Implications are given in terms of robustness and simultaneous dominance (i.e., Corollary 4.3). Finally, we make use of various techniques and properties relative to concave functions, completely monotone functions, superharmonic functions, and spherically symmetric distributions.
2 Preliminary results and the balanced squared-error loss case
2.1 Preliminary definitions and properties
We assemble here some definitions and properties useful throughout the manuscript. The estimators studied below are based on spherically symmetric distributions , which are scale mixtures of normals. Such distributions admit the representation
[TABLE]
and include many familiar examples such as Normal, Student, Logistic, Laplace, Exponential power (with ), among others. Other than moment finiteness conditions and the restriction to or dimensions, the applicability of our dominance findings will not require any further specific assumptions on .
A key characterization and property, which brings into play completely monotone functions, is given by the following result (see, e.g., Feller, 1966; Berger, 1975; etc.).
Lemma 2.1.
(a) A density of the form is a scale mixture of normals if and only if is completely monotone, i.e., , for , and .
(b) The product of two completely monotone functions is completely monotone.
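Complete monotonicity can be probed numerically. The sketch below relies on a classical fact: a completely monotone function $f$ on $(0,\infty)$ has alternating finite differences, i.e. $\sum_{j=0}^{k} (-1)^j \binom{k}{j} f(t+jh) \ge 0$ for all $k \ge 0$, $t>0$ and $h>0$. The check covers only finitely many orders and points, so it can refute complete monotonicity but never prove it. All names are illustrative.

```python
from math import comb, exp


def alternating_differences_ok(f, max_order=6, points=(0.1, 0.5, 1.0, 2.0),
                               step=0.5, tol=1e-12):
    """Return False as soon as one signed finite difference of f is negative."""
    for k in range(max_order + 1):
        for t in points:
            signed = sum((-1) ** j * comb(k, j) * f(t + j * step)
                         for j in range(k + 1))
            if signed < -tol:
                return False
    return True


def exp_decay(t):
    """exp(-t), completely monotone."""
    return exp(-t)


def reciprocal(t):
    """1 / (1 + t), completely monotone."""
    return 1.0 / (1.0 + t)


def product(t):
    """Product of the two functions above, in the spirit of Lemma 2.1(b)."""
    return exp_decay(t) * reciprocal(t)


def square(t):
    """t^2 is increasing, hence not completely monotone."""
    return t * t
```

The first three functions pass the check, while `square` fails already at order one.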
The dominance findings of Sections 3 and 4 relate to Baranchik-type estimators of defined and denoted throughout as:
[TABLE]
with , and the conditions
[TABLE]
These include James-Stein estimators with constant , and in the original case.
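For readers who want to experiment, the following sketch implements a Baranchik-type estimator in its standard form $\delta_{a,r}(X) = \left(1 - a\, r(\|X\|^2)/\|X\|^2\right) X$, which is assumed here to match (2.5). The default `r` equal to 1 gives a James-Stein estimator. The example `r(t) = t/(1+t)` is a hypothetical choice that is non-decreasing and bounded by 1; whether it meets the exact conditions (2.6) should be checked against the paper.

```python
import numpy as np


def baranchik(x, a, r=lambda t: 1.0):
    """Baranchik-type shrinkage (1 - a*r(||x||^2)/||x||^2) * x (assumed form)."""
    x = np.asarray(x, dtype=float)
    t = float(x @ x)  # ||x||^2, assumed non-zero
    return (1.0 - a * r(t) / t) * x


def smooth_r(t):
    """Illustrative non-decreasing r with values in [0, 1)."""
    return t / (1.0 + t)
```

For instance, with `x = [3, 4, 0, 0]` (dimension 4) and `a = 2`, the James-Stein version multiplies `x` by `1 - 2/25`.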
2.2 Balanced squared-error loss
We review here, for Bayesian inference and frequentist risk analysis, relationships between balanced loss and its unbalanced counterpart . Such results appear in Dey et al. (1999), as well as in Jafari Jozani et al. (2006). For the former, the findings apply to a multivariate normal model and , while the latter work relates to a more general model and target estimator . Some of the results will serve in later sections, but they are also presented here to illustrate the ease with which Bayesian analysis and frequentist risk evaluations for follow from corresponding results for squared-error loss .
The following Lemma 2.2 will be used in Section 4 for the analysis of losses in (1.3), but is presented here as it serves to link the frequentist risk under loss to the risk under squared error loss , as presented in Corollary 2.1. To facilitate the presentation that follows, we denote the difference in losses between estimates and as
[TABLE]
Lemma 2.2.
Let . For the problem of estimating under balanced loss (as in (1.1)), we have .
Proof. A decomposition of (2.7) yields
[TABLE]
In terms of the frequentist risk associated with loss , given by for an estimator of , the following general result follows from Lemma 2.2.
Corollary 2.1.
Let and consider the problem of estimating . The estimator dominates under loss if and only if dominates under squared error loss .
Proof. We have
[TABLE]
which establishes the result. ∎
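The algebra behind this equivalence can be verified directly. The display below is a sketch in assumed notation: $\gamma$ stands for the quantity being estimated, $\delta_0$ for the target, $\omega \in [0,1)$ for the weight, and $L_\omega$ for balanced squared error loss. Completing the square shows that the balanced loss equals a squared distance to the point $\omega\delta_0+(1-\omega)\gamma$ plus a term free of $\delta$, so a loss difference rescales to a difference of squared-error losses.

```latex
\begin{align*}
\omega\|\delta-\delta_0\|^2 + (1-\omega)\|\delta-\gamma\|^2
  &= \bigl\|\delta - \{\omega\delta_0 + (1-\omega)\gamma\}\bigr\|^2
     + \omega(1-\omega)\|\delta_0-\gamma\|^2 , \\
L_\omega(\gamma,\delta_1) - L_\omega(\gamma,\delta_2)
  &= (1-\omega)^2 \left\{
     \Bigl\|\tfrac{\delta_1-\omega\delta_0}{1-\omega}-\gamma\Bigr\|^2
     - \Bigl\|\tfrac{\delta_2-\omega\delta_0}{1-\omega}-\gamma\Bigr\|^2
     \right\}.
\end{align*}
```

Taking expectations on both sides of the second line gives the risk comparison: the sign of the balanced-loss risk difference is the sign of the squared-error risk difference between the two transformed estimators.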
Now, turning to Bayesian inference, we have an equally simple relationship between balanced loss and its unbalanced counterpart . More precisely, the following well-known result conveniently expresses the Bayes estimator under for in terms of the Bayes estimator under , given of course by .
Theorem 2.1.
For and a prior for which exists for all , the Bayes estimator of under loss is given by
Proof. Write for . By definition of the Bayes estimate, we thus have
[TABLE]
From this, the result follows as ∎
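The form of the Bayes estimator can be checked numerically. The sketch below assumes the well-known expression $\omega\,\delta_0(x) + (1-\omega)\,E[\gamma(\theta)\mid x]$ and approximates the posterior by a set of draws; the function names and the use of posterior draws are illustrative assumptions.

```python
import numpy as np


def balanced_bayes(delta0, posterior_mean, omega):
    """Convex combination of the target and the posterior mean (assumed form)."""
    delta0 = np.asarray(delta0, dtype=float)
    posterior_mean = np.asarray(posterior_mean, dtype=float)
    return omega * delta0 + (1.0 - omega) * posterior_mean


def posterior_expected_balanced_loss(delta, delta0, theta_draws, omega):
    """Posterior expectation of balanced squared error loss over the draws."""
    delta = np.asarray(delta, dtype=float)
    delta0 = np.asarray(delta0, dtype=float)
    theta_draws = np.asarray(theta_draws, dtype=float)
    fit = np.sum((delta - delta0) ** 2)
    est = np.mean(np.sum((theta_draws - delta) ** 2, axis=1))
    return float(omega * fit + (1.0 - omega) * est)
```

Because the posterior expected loss is a strictly convex quadratic in `delta`, any perturbation of `balanced_bayes(...)` increases it when the posterior mean is taken as the average of the same draws.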
Now, combining the last two results leads to the following Bayes dominance result.
Corollary 2.2.
For , a prior , and the problem of estimating , the Bayes estimator dominates an estimator under if and only if the Bayes estimator dominates under squared-error loss .
Several examples can be found in the literature, notably in the references mentioned above. We provide Example 2.1 at the end of this subsection as an illustration. Before doing so, we briefly address the issue of minimaxity, where relationships between balanced and unbalanced losses are not as immediate (e.g., Jafari Jozani et al., 2006). One situation though does simplify, namely the case where the target estimator is itself minimax under the unbalanced loss. Moreover, the following result (Jafari Jozani et al., 2012, Theorem 4) holds in general for losses (1.2). As discussed at the outset of Section 3C, this will serve to guarantee that the dominating estimators of Theorem 3.3 are themselves minimax.
Theorem 2.2.
Let and consider the problem of estimating under loss (1.2). Suppose that the estimator is minimax under unbalanced loss . Then, is also minimax under loss (1.2) for all .
Proof. Let denote the frequentist risk under loss (1.2). Since , the result is immediate. ∎
Example 2.1.
We consider the classical problem of estimating a multivariate normal mean and illustrate how known Stein estimation results applicable to squared-error loss (e.g., Stein, 1981; Strawderman, 2003) translate to balanced loss . Let and be independently distributed with . Set and consider estimating under balanced loss with target estimator , which is minimax under , and thus minimax under loss by virtue of Theorem 2.2.
(A) For known , any estimator of the form such that is weakly differentiable, , and a.e., dominates under loss . It thus follows from Corollary 2.1 that dominates under loss for such ’s. Such dominating estimators include the James-Stein estimator with , as well as Baranchik-type estimators in (2.5), with and conditions (2.6) on .
(B) For unknown , with the same conditions on , estimators of the form dominate under loss . Again, it follows immediately from Corollary 2.1 that estimators dominate under balanced loss .
(C) For known , and for Bayes estimators under balanced loss associated with prior density , Theorem 2.1 along with a well-known representation for tells us that
[TABLE]
where is the marginal distribution of . By virtue of Corollary 2.2, the estimator dominates under loss if and only if dominates under loss . Since the superharmonicity of either or is a sufficient condition for to dominate under loss (e.g., Strawderman, 2003), either of these conditions implies that dominates under balanced loss .
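Part (A) can be illustrated by simulation. The sketch below assumes unit variance, target $\delta_0(X)=X$, balanced squared error loss $\omega\|\delta-\delta_0\|^2+(1-\omega)\|\delta-\theta\|^2$, and the estimator $X - (1-\omega)(d-2)X/\|X\|^2$, i.e. the James-Stein correction scaled by $1-\omega$ as suggested by Corollary 2.1. The function names, sample size and seed are arbitrary illustrative choices.

```python
import numpy as np


def balanced_sq_loss(delta, delta0, theta, omega):
    """Row-wise balanced squared error loss for arrays of shape (n, d)."""
    fit = np.sum((delta - delta0) ** 2, axis=1)
    est = np.sum((delta - theta) ** 2, axis=1)
    return omega * fit + (1.0 - omega) * est


def balanced_james_stein(x, omega):
    """X - (1 - omega) * (d - 2) * X / ||X||^2, applied row by row."""
    d = x.shape[1]
    shrink = (d - 2) / np.sum(x ** 2, axis=1, keepdims=True)
    return x - (1.0 - omega) * shrink * x


def mc_risks(theta, omega, n_rep=20000, seed=0):
    """Monte Carlo risks of X and of the shrinkage estimator at a given theta."""
    theta = np.asarray(theta, dtype=float)
    rng = np.random.default_rng(seed)
    x = theta + rng.standard_normal((n_rep, theta.size))
    js = balanced_james_stein(x, omega)
    risk_x = float(balanced_sq_loss(x, x, theta, omega).mean())
    risk_js = float(balanced_sq_loss(js, x, theta, omega).mean())
    return risk_x, risk_js
```

Both risks are computed from the same simulated sample, so the comparison is paired and the estimated risk difference has small Monte Carlo error. At $\theta=0$ in dimension 10, the risk of $X$ is close to $10(1-\omega)$ and the shrinkage estimator's risk is markedly smaller.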
3 Risk analysis for loss
A. The loss function
For a model (2.4), we evaluate the frequentist risk performance of an estimator of under the balanced loss
[TABLE]
which incorporates the target estimator . For the function , we assume the following throughout this section:
Assumption 1.
, and is completely monotone on , i.e., for and for .
Examples of loss functions for which satisfies Assumption 1, other than , include: (i) with , (ii) , (iii) with , and (iv) cases with being completely monotone such as with . Case (i) is known as reflected normal loss, while examples (iv) represent a broader class of bounded losses. losses with , , represent concave choices, but such ’s do not satisfy the finiteness assumption on .
B. Further technical results
We now expand on various technical results which are pivotal to the risk analysis in Subsection 3C.
Lemma 3.3.
Consider , admitting representation (2.4) with mixing variable , and satisfying Assumption 1. Let ; with . Then,
(a) The distribution of admits a scale mixture of normals representation
[TABLE]
(b) Moreover, the distribution of is stochastically smaller than the distribution of .
Proof. First observe that is a density since . Part (a) thus follows from Lemma 2.1. For part (b), given that and are completely monotone, they are representable as Laplace transforms (Lemma 2.1):
[TABLE]
for . From this, we have for
[TABLE]
Interpreting in terms of scale mixture of normals, we have for representation (3.9) with . Finally, from this, we have for all and the result follows. ∎
The two lemmas that follow, which we will require, rely partly on properties of superharmonic functions. We recall that a continuous function is superharmonic if and only if: at all and , the average of over the surface of the sphere, centered at of radius , is less than or equal to . For twice differentiable , the superharmonicity of is equivalent to its Laplacian being less than or equal to $0$, i.e., with .
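The Laplacian criterion is easy to probe numerically. The sketch below uses central finite differences and the family $h(x)=\|x\|^{-b}$ on $\mathbb{R}^d$, for which $\Delta h(x) = b\,(b-d+2)\,\|x\|^{-b-2}$ away from the origin; hence $h$ is superharmonic there for $0 < b \le d-2$ and not for $b > d-2$. The function names and the step size are illustrative assumptions.

```python
import numpy as np


def laplacian_fd(h, x, step=1e-3):
    """Central finite-difference approximation of the Laplacian of h at x."""
    x = np.asarray(x, dtype=float)
    total = 0.0
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = step
        total += (h(x + e) - 2.0 * h(x) + h(x - e)) / step ** 2
    return total


def inverse_power(b):
    """Return the function x -> ||x||^(-b)."""
    def h(x):
        return float(np.sum(np.asarray(x, dtype=float) ** 2)) ** (-b / 2.0)
    return h
```

At `x = (1, 1, 1, 1, 1)` in dimension 5, the exact Laplacian of `inverse_power(2.0)` is `2 * (2 - 3) / 25`, that of `inverse_power(3.0)` is zero (the harmonic boundary case), and that of `inverse_power(4.0)` is positive.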
Lemma 3.4.
Let with and let with and . Then, we have the following:
(a) is decreasing in for ;
(b) is non-increasing in provided that and is superharmonic.
Proof. The proof of part (a) is relegated to an Appendix. For part (b), first denote as a random vector uniformly distributed on the sphere centered at $0$ of radius . It suffices to show that is for all decreasing in . Since independently of , we have . Since, for a superharmonic function, the sphere mean is decreasing in the radius (see, e.g., Fourdrinier et al. 2018, Theorem 7.4), we infer that is decreasing in , which concludes the proof. ∎
Lemma 3.5.
Let , , and satisfy Assumption 1. Consider , as in (2.4) and Lemma 3.3, respectively.
(a) For , we have
[TABLE]
(b) For and a twice-differentiable function that is non-decreasing and concave, we have
[TABLE]
Proof. (a) The first inequality follows from the inequality
[TABLE]
which holds since is concave with . The second inequality follows from Lemma 3.3 and part (a) of Lemma 3.4. Indeed, since , , we have, with the notation of Lemma 3.4, and , and the result follows since is decreasing and is stochastically smaller than .
(b) Defining and denoting , we have
[TABLE]
where (i) the two equalities follow from the scale mixture representations of and ; (ii) the first inequality follows since is non-decreasing and for , (iii) the second inequality follows from (3.10), and (iv) the third inequality follows from Lemma 3.3, part (b) of Lemma 3.4, as in the above proof of part (a), and from the fact that
[TABLE]
provided is non-negative, non-decreasing, and concave. Finally, to justify the above, note that, for twice-differentiable ,
[TABLE]
so that the choice yields with a little bit of computation
[TABLE]
since the properties of imply that and for all . ∎
C. Dominance results
For balanced loss with satisfying Assumption 1, a scale mixture of normals distribution on with , we provide James-Stein and Baranchik-type estimators that dominate . In such cases, it follows that is minimax for the unbalanced case with constant risk (e.g., Kubokawa et al., 2015). By virtue of Theorem 2.2, is also minimax for balanced loss . The following dominance results thus provide dominating estimators which are also minimax under loss .
Theorem 3.3.
Consider ; ; admitting representation (2.4), balanced loss function as in (3.8) with satisfying Assumption 1.
(a) If , dominates provided
[TABLE]
with , and the mixing variance for as defined in Lemma 3.3. An equivalent expression for the above dominance condition is
[TABLE]
(b) If , a Baranchik-type estimator in (2.5) dominates provided (3.12) holds and provided satisfies conditions (2.6).
Proof. (a) First, the stated equivalence between (3.12) and (3.13) holds since, on one hand,
[TABLE]
(as when ) and, on the other hand,
[TABLE]
Second, we have for a difference in risks
[TABLE]
where the inequality follows from part (a) of Lemma 3.5 and the concave function inequality for all . Now, with representation (3.9), by conditioning on , and by Stein’s identity and calculation (with probability ), we obtain
[TABLE]
By noticing that is increasing in , given that and distributions are stochastically increasing in , we infer from (3.14) and the covariance inequality (i.e., for increasing and decreasing) that
[TABLE]
From the above, it follows immediately that (3.12) is a sufficient condition for to be negative for all .
(b) The proof is similar to that of part (a). Using the concave function inequality , Stein’s identity, and part (a) of Lemma 3.5, we obtain for the difference in risk
[TABLE]
Now, it is easy to verify that is decreasing in under the given conditions on . Finally, an application of the covariance inequality leads to an inequality as in (3.15) with replaced by . The result then follows. ∎
Remark 3.1.
From inequality (3.15), it also follows that dominance occurs, in both parts (a) and (b) of Theorem 3.3 for the quantity equal to the upper cut-off point in (3.12) (or (3.13)) unless and is degenerate, i.e., original balanced loss and the multivariate normal case.
The proof of Theorem 3.3 is unified with respect to the choice of , the coefficient in the balanced loss, and the underlying scale mixture of normals distribution. To conclude, we point out that the above results can be seen as extensions of Kubokawa et al. (2015), as well as Strawderman (1974), whose results can be seen as particular cases of in the former case, and in the latter case.
4 Risk analysis for loss
The main dominance finding of this section (Theorem 4.4) relates to a multivariate normal , and more generally to distributed as a scale mixture of normals as in (2.4). We assess the frequentist risk performance of an estimator of under the balanced loss
[TABLE]
More specifically, we consider the target estimator and set , and our objective is to provide, for , estimators of that dominate under balanced loss (4.16) other than Section 2’s results for . For the function , we assume, unless stated otherwise, the following throughout this section:
Assumption 2.
, , and is completely monotone on , i.e., for and for .
Examples of losses with satisfying Assumption 2 include examples (i), (ii), (iii), (iv) given for in part A of Section 3, but the cases are also included here since the assumption is not required.
We proceed with a preparatory lemma which exploits the concavity of , and which relates the difference in losses , between estimates and , to the balanced squared-error loss difference in (2.7). We therefore define
[TABLE]
and we have the following.
Lemma 4.6.
Let . For the problem of estimating under loss (4.16) with twice-differentiable, increasing, and concave , we have
[TABLE]
Proof. The proof uses the fact that , since is concave, with and . This yields :
[TABLE]
which is indeed (4.18), by virtue of Lemma 2.2 and since . ∎
A basic result for estimating a mean vector under quadratic loss, for scale mixtures of normal distributions is the following.
Lemma 4.7.
(Strawderman, 1974) Let have a scale mixture of normals distribution as in (2.4) with , and consider estimating with loss . Consider Baranchik-type estimators as in (2.5) with conditions (2.6). Then, dominates provided
[TABLE]
and provided and are finite.
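To get a feel for such cut-off points, the following sketch assumes that the condition takes the classical form $0 < a \le 2(d-2)/E(V^{-1})$, where $V$ is the mixing variance in the scale mixture representation; this form is an assumption made for illustration and should be checked against the displayed condition. Under it, the cut-off depends on the model only through the inverse moment $E(V^{-1})$. The helper names and the Student-type mixing example $V=\nu/\chi^2_\nu$, for which $E(V^{-1})=1$, are illustrative.

```python
import numpy as np


def js_cutoff(d, inv_v_mean):
    """Assumed upper bound 2*(d - 2) / E[1/V] on the shrinkage constant a."""
    if d < 3:
        raise ValueError("shrinkage requires dimension d >= 3")
    return 2.0 * (d - 2) / inv_v_mean


def inv_v_mean_mc(draw_v, n=200000, seed=0):
    """Monte Carlo estimate of E[1/V] from a sampler draw_v(rng, n)."""
    rng = np.random.default_rng(seed)
    return float(np.mean(1.0 / draw_v(rng, n)))


def student_mixing(nu):
    """Sampler for V = nu / chi2_nu, the mixing variance of a Student-type model."""
    return lambda rng, n: nu / rng.chisquare(nu, n)


def normal_mixing(rng, n):
    """Degenerate mixing V = 1, i.e. the multivariate normal case."""
    return np.ones(n)
```

Since both mixing distributions above have $E(V^{-1})=1$, the assumed cut-off is the same, $2(d-2)$, for the normal model and for this Student-type model, which is one way to read the robustness remark made later in this section.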
The main result of this section can now be presented and established.
Theorem 4.4.
Let have a scale mixture of normals distribution as in (2.4) with , and consider estimating with loss , as in (4.16) with , where satisfies Assumption 2. Consider Baranchik-type estimator as in (2.5) with conditions (2.6) on . Then, assuming that is a density on , dominates provided
[TABLE]
and provided and are finite, where the expectation is taken with respect to .
Proof. With the given notation, observe that with . Therefore, by Lemma 4.6 with and , we have for the difference in losses between and :
[TABLE]
Finally, since both and are completely monotone, so is the density (Lemma 2.1, part a). This implies that is a scale mixture of normals density and the result thus follows immediately from Lemma 4.7. ∎
Remark 4.2.
For the unbalanced case , one recovers Theorem 2.1 of Kubokawa et al. (2015). For the original balanced loss function with , one may recover the result of Theorem 4.4 directly by relying on Lemma 4.7 and Lemma 2.2, as illustrated in Example 2.1 for the multivariate normal case.
Moreover, it is interesting to compare the balanced and unbalanced cut-off points and . For loss with , , we have for all . For choices such that is non-decreasing in , we have a monotone likelihood ratio ordering for densities and , with the former being stochastically larger. This implies the ordering , and therefore . Examples where such a condition holds include: , , , with .
It is also interesting to assess how the upper cut-off point on the multiple for the estimator to dominate varies in terms of the model and the choice of for the loss function. In the former case, one can infer dominance results that are robust, holding for a given but also persisting for a class of departures from . This is quite plausible and simple to visualize as the cut-off point depends on only through the inverse moment . For the latter case, one can infer dominance results that hold simultaneously for a subclass of losses (4.16). Here is such an illustration.
Corollary 4.3.
Consider the context of Theorem 4.4, for a given loss , and a Baranchik-type estimator which satisfies the given requirements for dominance of . Then, the dominance persists for the original balanced loss with and .
Proof. It suffices to show that
[TABLE]
where the expectation is taken with respect to the density given in Theorem 4.4, and where we define the expectation as taken with respect to the density . Now observe that the ratio of these densities, proportional to is decreasing in by assumptions on . We thus have a monotone likelihood ratio in ordering between the densities and inequality (4.19) follows since is decreasing in . ∎
5 Concluding remarks
For a multivariate normal distributed , and more generally for a scale mixture of normals model , we have provided shrinkage estimators of that improve on the benchmark estimator as measured by the frequentist risk associated with balanced loss functions of the types (1.2) and (1.3), and with completely monotone and . Much of the approach is unified with respect to the choices of and either or and the findings represent analytical extensions to the original balanced loss with either identity or , unavailable up to now.
The findings in this paper do not cover cases with unknown scale, such as observations generated from a with unknown , which were addressed by earlier results on the original balanced loss function (e.g., Chung et al., 1999; Zinodiny, 2014), but we expect that the techniques presented here should be useful to derive corresponding results for analogs of loss functions (1.2) and (1.3). Finally, it would be most interesting and welcome to obtain Bayesian estimators that either satisfy our conditions of dominance, or dominate the benchmark under the set-up of Theorems 3.3 and 4.4.
Appendix
Proof of Lemma 3.4, part (a)
With and the Poisson representation of the non-central distribution (i.e., ), we have for and
[TABLE]
with . Since is increasing in for , since is decreasing in , and since the Poisson() distribution has increasing monotone likelihood ratio in with parameter , it follows from the above that is indeed decreasing in for .
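The Poisson representation used in this proof can be checked by simulation. The sketch below uses the standard fact that a non-central $\chi^2_d(\lambda)$ variable is distributed as $\chi^2_{d+2K}$ with $K \sim$ Poisson$(\lambda/2)$, together with $E[1/\chi^2_m] = 1/(m-2)$ for $m>2$, and compares two estimates of the inverse moment $E[1/\chi^2_d(\lambda)]$. This is a related illustration rather than the exact quantity treated in the Appendix; the function names are illustrative.

```python
import numpy as np


def inverse_moment_direct(d, nonc, n, rng):
    """Estimate E[1/W] by sampling W ~ non-central chi-square(d, nonc)."""
    w = rng.noncentral_chisquare(d, nonc, size=n)
    return float(np.mean(1.0 / w))


def inverse_moment_poisson(d, nonc, n, rng):
    """Estimate E[1/W] via K ~ Poisson(nonc/2) and E[1/chi2_m] = 1/(m - 2)."""
    k = rng.poisson(nonc / 2.0, size=n)
    return float(np.mean(1.0 / (d + 2 * k - 2)))
```

The two estimates agree up to Monte Carlo error, and the Poisson form makes the monotonicity transparent: as the non-centrality grows, $K$ becomes stochastically larger and the average of the decreasing function $1/(d+2K-2)$ goes down.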
Acknowledgements
Éric Marchand’s research is supported in part by the Natural Sciences and Engineering Research Council of Canada, and William Strawderman’s research is partially supported by a grant from the Simons Foundation (#418098).
References
- [2] Baran, J. & Stepień-Baran, A. (2013). Sequential estimation of a location parameter and powers of a scale parameter from delayed observations. Statistica Neerlandica, 67, 263–280.
- [4] Berger, J.O. (1975). Minimax estimation of location vectors for a wide class of distributions. Annals of Statistics, 3, 1318–1328.
- [6] Brandwein, A.C., Ralescu, S. & Strawderman, W.E. (1993). Shrinkage estimation of the location parameters for certain spherically symmetric distributions. Annals of the Institute of Statistical Mathematics, 45, 551–565.
- [8] Brandwein, A.C. & Strawderman, W.E. (1991). Generalizations of James-Stein estimators under spherical symmetry. Annals of Statistics, 19, 1639–1650.
