Uniform minimum risk equivariant estimates for moment condition models
Michel Broniatowski, Jana Jurečková, Amor Keziou

TL;DR
This paper studies invariant semiparametric models and shows that minimum empirical divergence estimates, including empirical likelihood, are equivariant, providing a way to identify the minimum risk equivariant estimate using conditional expectations.
Contribution
It introduces a novel approach to identify minimum risk equivariant estimates in moment condition models using empirical divergence methods.
Findings
Minimum empirical divergence estimates are equivariant.
The minimum risk equivariant estimate can be characterized via conditional expectations.
Asymptotic approximation of the conditional expectation is derived.
Abstract
We consider semiparametric moment condition models invariant to transformation groups. The parameter of interest is estimated by the minimum empirical divergence approach introduced by Broniatowski and Keziou (2012). It is shown that the minimum empirical divergence estimates, including the empirical likelihood one, are equivariant. The minimum risk equivariant estimate is then identified as any one of the minimum empirical divergence estimates minus its expectation conditional on a maximal invariant statistic of the considered group of transformations. An asymptotic approximation to the conditional expectation is obtained, using the result of Jurečková and Picek (2009).
Michel Broniatowski (1), Jana Jurečková (2) and Amor Keziou (3)
(1) LPSM, Sorbonne-Université, Paris, France
(2) The Czech Academy of Sciences, Institute of Information Theory and Automation, Prague, Czech Republic
(3) Laboratoire de Mathématiques de Reims, France
(Date: April, 2019)
AMS Subject Classifications: 62F10, 62F30, 62F99
Keywords: Pitman estimator, semiparametric model, equivariant estimator
1. Introduction
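The semiparametric moment condition models are defined through estimating equations
$$\mathbb{E}\left[g(X,\theta_T)\right]=0,$$
where $\mathbb{E}[\cdot]$ denotes the mathematical expectation, $X\in\mathbb{R}^m$ is a random vector, $\theta_T\in\Theta\subset\mathbb{R}^d$ is the unknown true value of the parameter of interest which is assumed to be unique, and $g:=(g_1,\ldots,g_\ell)^\top$ is some specified measurable $\mathbb{R}^\ell$-valued function defined on $\mathbb{R}^m\times\Theta$. Such models are popular in statistics and econometrics, see e.g., Qin and Lawless (1994), Haberman (1984), Sheehy (1987), McCullagh and Nelder (1983), Owen (2001) and the references therein. Denoting by $P_X$ the probability distribution of the random vector $X$, the above estimating equations can be written as
$$\int_{\mathbb{R}^m} g(x,\theta_T)\,dP_X(x)=0.$$
Let $M$ be the collection of all signed finite measures (s.f.m.) $Q$ on the Borel $\sigma$-field $(\mathbb{R}^m,\mathcal{B}(\mathbb{R}^m))$ such that $Q(\mathbb{R}^m)=1$. The submodel $\mathcal{M}_\theta$, associated to a given value $\theta\in\Theta$, consists of all s.f.m.’s $Q\in M$ satisfying $\ell$ linear constraints induced by the vector valued function $g(\cdot,\theta)$, namely,
$$\mathcal{M}_\theta:=\left\{Q\in M \ \text{ such that } \int_{\mathbb{R}^m} g(x,\theta)\,dQ(x)=0\right\},$$
with $\ell\geq d$. The statistical model which we consider can be written as
$$\mathcal{M}:=\bigcup_{\theta\in\Theta}\mathcal{M}_\theta. \tag{1}$$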
Let $X_1,\ldots,X_n$ be an i.i.d. sample of the random vector $X$ with unknown probability distribution $P_X$. The problems of testing the model (1), and of confidence region and point estimation of $\theta_T$, have been widely investigated in the literature. Hansen (1982) considered the generalized method of moments (GMM) in order to estimate $\theta_T$. Hansen et al. (1996) introduced the continuous updating (CU) estimate. Asymptotic confidence regions for the parameter $\theta_T$ have been obtained by Owen (1988) and Owen (1990), introducing the empirical likelihood (EL) approach. It has been used, in the context of model (1), by Qin and Lawless (1994) and Imbens (1997), introducing the EL estimate for the parameter $\theta_T$. The recent literature in econometrics focusses on such models; Smith (1997) and Newey and Smith (2004) provided a class of estimates called generalized empirical likelihood (GEL) estimates, which contains the EL and the CU ones. Among other results pertaining to EL, Newey and Smith (2004) stated that the EL estimate enjoys asymptotic optimality properties in terms of efficiency when bias corrected among all GEL estimates, including the GMM one. Broniatowski and Keziou (2012) proposed a general approach through empirical divergences and duality techniques, which includes the above methods in the general context of signed finite measures under moment condition models (1). This approach allows the asymptotic study of the estimates and associated test statistics both under the model and under misspecification, leading to new results, in particular for the EL approach. Note that all the proposed estimates, including the EL one, are generally biased, and that the problem of their finite-sample efficiency has, to our knowledge, not yet been studied.
The aim of the present paper is to investigate the finite-sample optimality of estimation in the context of the semiparametric model (1). We will discuss the problem of constructing minimum risk equivariant (MRE) estimates for the parameter $\theta_T$, as well as the problem of the numerical calculation of these estimates.
We recall in the following lines, for the above estimation problem, the notions of group transformations on the random vector space, model invariance and the induced group of transformations on the parameter space, loss invariance and equivariance estimation; we refer to the unpublished preprint of Hoff (2012) for an excellent presentation of the above notions, and the book of Lehmann and Casella (1998).
Let be a collection, of one-to-one transformations from the vector space in , which we assume to be a “group”, in the sense that, it should be closed under both composition and inversion, namely,
[TABLE]
The group can be extended to a group of transformations on the sample space, onto , which will be denoted , as follows
[TABLE]
We will consider two kinds of transformation groups:
1. “additive”
[TABLE]
where is some subset of , or
2. “multiplicative”
[TABLE]
where is a diagonal matrix, with entries or with possibly some entries equal to one.
We assume that the model given in (1) is invariant under the considered group of transformations , in the sense that,
[TABLE]
The induced group of transformations on the parameter space, onto , denoted hereafter, will be defined as follows. Let be any transformation belonging to , and consider any random vector such that . Then, by the identifiability assumption, there exists a unique such that . By the invariance assumption (4) of the model under the group , the distribution belongs to . Therefore, there exists a unique (by identifiability) such that . Denote then by the bijection induced by on the parameter space onto , defined by
[TABLE]
The induced group on the parameter space, onto , is then defined to be
[TABLE]
Two points are said to be equivalent iff for some . The orbit , of a point , is defined to be the set of equivalent points:
[TABLE]
We will assume that there is only one orbit of , i.e.,
[TABLE]
which means that the group of transformations is rich enough to map any point in to any other via some transformation . In such a case, the group is said to be “transitive” over .
We give here some examples for illustration. In all the examples below, we can see that the group is transitive over .
Example 1.
Sometimes we have information relating the first and second moments of a random variable (see e.g. Godambe and Thompson (1989) and McCullagh and Nelder (1983)). Let be an i.i.d. sample of a random variable with mean , and assume that , where is a known function. Our aim is to estimate . The information about the distribution of can be expressed in the form of (1) by taking . If we take the parameter space to be , then it is straightforward to see that the model is invariant to the additive group of transformations
[TABLE]
if for some , and invariant to the multiplicative group
[TABLE]
if for some . The induced groups on the parameter space are, respectively,
[TABLE]
and
[TABLE]
Example 2.
Let be an i.i.d. sample of a bivariate random vector with . In this case, we can take . If we consider , then the model is invariant with respect to the groups
[TABLE]
or
[TABLE]
The induced groups on are, respectively,
[TABLE]
and
[TABLE]
A somewhat similar problem is when is known, and is to be estimated, by taking . Such problems are common in survey sampling (see e.g. Kuk and Mak (1989) and Chen and Qin (1993)). Taking , the model is then invariant with respect to the groups
[TABLE]
or
[TABLE]
The induced groups on are, respectively,
[TABLE]
and
[TABLE]
Example 3.
Let be an i.i.d. sample of a random variable with distribution such that , where , . The known intervals may be bounded or unbounded, and are known nonnegative numbers. The information about can be written under the form of model (1) taking and . The model in this case is invariant to the groups
[TABLE]
or
[TABLE]
and the induced groups on the parameter space are, respectively,
[TABLE]
and
[TABLE]
Example 4.
Let be an i.i.d. sample of a random variable with continuous distribution such that , and , where is known and is to be estimated. Note that is the quantile of order of the variable , and that the variance of is assumed to be known and equal to one. This problem can be written under the form of model (1) taking and . The model in this case is invariant with respect to the additive group
[TABLE]
and the induced group on is
[TABLE]
Example 5.
Let be an i.i.d. sample of a random variable with continuous distribution such that and , where is known and is to be estimated. Note that is the quantile of order of the variable . This problem can be written under the form of model (1) taking and . The model in this case is invariant with respect to the multiplicative group
[TABLE]
and the induced group on is
[TABLE]
Example 6.
Let be an i.i.d. sample of a random vector with continuous distribution such that , where is some specified measurable function, and is to be estimated. We can consider also the case where some components of are known and that the other components are to be estimated. It is clear that the corresponding model defined in (1), taking and , is invariant to the additive group
[TABLE]
and the induced group on the parameter is
[TABLE]
Likewise, if the data are such that , where is some specified measurable function, and is to be estimated, then the corresponding model , taking , is invariant to the multiplicative group
[TABLE]
and the induced group on the parameter is
[TABLE]
In all the sequel, without loss of generality, we assume that the model and the group of transformation are such that
[TABLE]
Note that this assumption implies the condition (4) that the model is invariant under .
In all the following, when estimating by an estimate , we consider the quadratic loss function
[TABLE]
if the model is invariant with respect to the additive group, and the loss function is taken to be the relative quadratic loss
[TABLE]
if the model is invariant with respect to the multiplicative group.
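The pairing of each group with its loss can be checked numerically. The sketch below assumes the usual scalar forms of the two losses, namely $(d-\theta)^2$ for the quadratic loss and $((d-\theta)/\theta)^2$ for the relative quadratic loss; the displayed definitions are not recoverable here, so these forms, and the function names, are assumptions made for illustration only.

```python
def quadratic_loss(theta, d):
    """Assumed quadratic loss (d - theta)^2, used with the additive group."""
    return (d - theta) ** 2


def relative_quadratic_loss(theta, d):
    """Assumed relative quadratic loss ((d - theta) / theta)^2, used with the multiplicative group."""
    return ((d - theta) / theta) ** 2


theta, d = 2.0, 2.5      # a parameter value and an estimate of it
shift, scale = 3.0, 4.0  # one element of each transformation group

# Additive group: shifting parameter and estimate together leaves the loss unchanged.
additive_gap = abs(quadratic_loss(theta + shift, d + shift) - quadratic_loss(theta, d))

# Multiplicative group: rescaling parameter and estimate together leaves the loss unchanged.
multiplicative_gap = abs(
    relative_quadratic_loss(scale * theta, scale * d) - relative_quadratic_loss(theta, d)
)

print(additive_gap, multiplicative_gap)
```

The plain quadratic loss would not be invariant under rescaling, which is why a relative version is paired with the multiplicative group.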
Definition 7.
(invariant loss under a group of transformations). A loss function , where denotes the set of the parameter estimates (called decision space), is invariant under a transformation iff for any estimate , there exists a unique such that . We denote then by the bijection, from onto , such that . Hence, we have
[TABLE]
We denote by the induced group on the decision space .
Definition 8.
Assume that the estimation problem is invariant under the group . Let and be, respectively, the induced groups on the parameter space and the decision space . An estimate is said to be equivariant iff
[TABLE]
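We will see, under condition (6), that the empirical minimum divergence estimates, introduced in Broniatowski and Keziou (2012), are equivariant for the above models, using results on the existence and characterization of the projection of the distribution $P_X$ on the sets $\mathcal{M}_\theta$. First, we recall the definition of $\varphi$-divergences and some of their properties. Let $\varphi$ be a convex function from $\mathbb{R}$ onto $[0,+\infty]$ with $\varphi(1)=0$, and such that its domain, $\mathrm{dom}\,\varphi:=\{x\in\mathbb{R}:\varphi(x)<\infty\}=:(a_\varphi,b_\varphi)$, is an interval, with endpoints satisfying $a_\varphi<1<b_\varphi$, which may be bounded or unbounded, open or not. We assume that $\varphi$ is closed; the closedness of $\varphi$ means that if $a_\varphi$ or $b_\varphi$ are finite then $\varphi(x)\to\varphi(a_\varphi)$ when $x\downarrow a_\varphi$, and $\varphi(x)\to\varphi(b_\varphi)$ when $x\uparrow b_\varphi$. Note that this is equivalent to the fact that the level sets $\{x\in\mathbb{R}:\varphi(x)\leq\alpha\}$, $\alpha\in\mathbb{R}$, are closed in $\mathbb{R}$ endowed with the usual topology. For any s.f.m. $Q\in M$, the $\varphi$-divergence between $Q$ and a probability distribution $P$, when $Q$ is absolutely continuous with respect to (a.c.w.r.t.) $P$, is defined through
$$D_\varphi(Q,P):=\int_{\mathbb{R}^m}\varphi\left(\frac{dQ}{dP}(x)\right)dP(x),$$
where $\frac{dQ}{dP}(\cdot)$ is the Radon-Nikodym derivative of $Q$ w.r.t. $P$. When $Q$ is not a.c.w.r.t. $P$, we set $D_\varphi(Q,P):=+\infty$. For any probability distribution $P$, the mapping $Q\in M\mapsto D_\varphi(Q,P)$ is convex and takes nonnegative values. When $Q=P$ then $D_\varphi(Q,P)=0$. Furthermore, if the function $x\mapsto\varphi(x)$ is strictly convex on a neighborhood of $x=1$, then we have
$$D_\varphi(Q,P)=0\quad\text{if and only if}\quad Q=P.$$
All the above properties are presented in Csiszár (1963), Csiszár (1967) and in Chapter 1 of Liese and Vajda (1987), for $\varphi$-divergences defined on the set of all probability distributions. When the $\varphi$-divergences are extended to $M$, the same arguments as developed on the set of probability distributions hold. When defined on the set of probability distributions, the Kullback-Leibler ($KL$), modified Kullback-Leibler ($KL_m$), $\chi^2$, modified $\chi^2$ ($\chi^2_m$), Hellinger ($H$), and $L^1$ divergences are respectively associated to the convex functions $\varphi(x)=x\log x-x+1$, $\varphi(x)=-\log x+x-1$, $\varphi(x)=\frac{1}{2}(x-1)^2$, $\varphi(x)=\frac{1}{2}(x-1)^2/x$, $\varphi(x)=2(\sqrt{x}-1)^2$ and $\varphi(x)=|x-1|$. All these divergences, except the $L^1$ one, belong to the class of the so-called power divergences introduced in Cressie and Read (1984) (see also Liese and Vajda (1987) and Pardo (2006)). They are defined through the class of convex functions
$$x\in(0,+\infty)\mapsto\varphi_\gamma(x):=\frac{x^\gamma-\gamma x+\gamma-1}{\gamma(\gamma-1)}$$
if $\gamma\in\mathbb{R}\setminus\{0,1\}$, $\varphi_0(x):=-\log x+x-1$ and $\varphi_1(x):=x\log x-x+1$. So, the $KL$-divergence is associated to $\varphi_1$, the $KL_m$ to $\varphi_0$, the $\chi^2$ to $\varphi_2$, the $\chi^2_m$ to $\varphi_{-1}$ and the Hellinger distance to $\varphi_{1/2}$. We extend the definition of the power divergence functions $\varphi_\gamma$ onto the whole set of signed finite measures as follows. When the function $\varphi_\gamma$ is not defined on $(-\infty,0)$, or when $\varphi_\gamma$ is defined on $\mathbb{R}$ but is not convex (for instance if $\gamma=3$), we extend the definition of $\varphi_\gamma$ as follows
$$x\in\mathbb{R}\mapsto\varphi_\gamma(x)\,\mathbb{1}_{[0,+\infty)}(x)+(+\infty)\,\mathbb{1}_{(-\infty,0)}(x).$$
Note that for the $\chi^2$-divergence, the corresponding function $\varphi_2(x)=\frac{1}{2}(x-1)^2$ is convex and defined on the whole of $\mathbb{R}$. In this paper, for technical considerations, we assume that the functions $\varphi$ are strictly convex on their domain $(a_\varphi,b_\varphi)$ and twice continuously differentiable on $\mathrm{int}(\mathrm{dom}\,\varphi)$, the interior of their domain. Hence, $\varphi'(1)=0$, and for all $x\in\mathrm{int}(\mathrm{dom}\,\varphi)$, $\varphi''(x)>0$. Here, $\varphi'$ and $\varphi''$ are used to denote respectively the first and the second derivative functions of $\varphi$. Note that the above assumptions on $\varphi$ are not restrictive, and that all the power functions $\varphi_\gamma$, see (12), satisfy the above conditions, including all standard divergences.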
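As an illustration of the power divergence family, the following sketch evaluates the Cressie and Read functions on $x>0$ and the resulting divergence between two discrete distributions. It assumes the standard parametrization $\varphi_\gamma(x)=(x^\gamma-\gamma x+\gamma-1)/(\gamma(\gamma-1))$ with the logarithmic limits at $\gamma=0$ and $\gamma=1$; the names `phi_gamma` and `divergence` are hypothetical, and the extension to negative arguments is deliberately left out.

```python
import math


def phi_gamma(x, gamma):
    """Power function phi_gamma on x > 0 (standard Cressie-Read form, assumed here)."""
    if x <= 0:
        raise ValueError("this sketch only covers x > 0")
    if gamma == 0:
        return -math.log(x) + x - 1.0     # modified Kullback-Leibler
    if gamma == 1:
        return x * math.log(x) - x + 1.0  # Kullback-Leibler
    return (x ** gamma - gamma * x + gamma - 1.0) / (gamma * (gamma - 1.0))


def divergence(q, p, gamma):
    """Discrete phi-divergence D(Q, P) = sum_i p_i * phi(q_i / p_i), with all p_i > 0."""
    return sum(pi * phi_gamma(qi / pi, gamma) for qi, pi in zip(q, p))


p = [0.25, 0.25, 0.25, 0.25]
q = [0.40, 0.30, 0.20, 0.10]
for gamma in (-1, 0, 0.5, 1, 2):
    print(gamma, phi_gamma(1.0, gamma), divergence(q, p, gamma))
```

Each member vanishes at $x=1$, so the divergence of a distribution from itself is zero, while the divergence of `q` from `p` is strictly positive for every listed value of `gamma`.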
2. Minimum empirical divergence estimates
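Let $X_1,\ldots,X_n$ denote an i.i.d. sample of a random vector $X\in\mathbb{R}^m$ with probability distribution $P_X$. Let $P_n$ be the associated empirical measure, namely,
$$P_n(\cdot):=\frac{1}{n}\sum_{i=1}^n\delta_{X_i}(\cdot),$$
where $\delta_x(\cdot)$ denotes the Dirac measure at point $x$, for all $x$. For a given $\theta\in\Theta$, the “plug-in” estimate of $D_\varphi(\mathcal{M}_\theta,P_X):=\inf_{Q\in\mathcal{M}_\theta}D_\varphi(Q,P_X)$ is
$$\widehat{D}_\varphi(\mathcal{M}_\theta,P_X):=\inf_{Q\in\mathcal{M}_\theta}D_\varphi(Q,P_n)=\inf_{Q\in\mathcal{M}_\theta}\int\varphi\left(\frac{dQ}{dP_n}(x)\right)dP_n(x). \tag{13}$$
If the projection $Q_\theta^{(n)}$ of $P_n$ on $\mathcal{M}_\theta$ exists, then it is clear that $Q_\theta^{(n)}$ is a s.f.m. (or possibly a probability distribution) a.c.w.r.t. $P_n$; this means that the support of $Q_\theta^{(n)}$ must be included in the set $\{X_1,\ldots,X_n\}$. So, define the set
$$\mathcal{M}_\theta^{(n)}:=\left\{Q\in M:\ Q \text{ a.c.w.r.t. } P_n,\ \sum_{i=1}^n Q(X_i)=1 \ \text{ and } \ \sum_{i=1}^n Q(X_i)\,g(X_i,\theta)=0\right\}, \tag{14}$$
which may be seen as a subset of $\mathbb{R}^n$. Then, the plug-in estimate (13) can be written as
$$\widehat{D}_\varphi(\mathcal{M}_\theta,P_X)=\inf_{Q\in\mathcal{M}_\theta^{(n)}}\frac{1}{n}\sum_{i=1}^n\varphi\left(nQ(X_i)\right). \tag{15}$$
In the same way,
$$D_\varphi(\mathcal{M},P_X):=\inf_{\theta\in\Theta}\inf_{Q\in\mathcal{M}_\theta}D_\varphi(Q,P_X)$$
can be estimated by
$$\widehat{D}_\varphi(\mathcal{M},P_X):=\inf_{\theta\in\Theta}\inf_{Q\in\mathcal{M}_\theta^{(n)}}\frac{1}{n}\sum_{i=1}^n\varphi\left(nQ(X_i)\right). \tag{16}$$
By uniqueness of $\arg\inf_{\theta\in\Theta}D_\varphi(\mathcal{M}_\theta,P_X)$ and since the infimum is reached at $\theta=\theta_T$, we estimate $\theta_T$ through
$$\widehat{\theta}_\varphi:=\arg\inf_{\theta\in\Theta}\inf_{Q\in\mathcal{M}_\theta^{(n)}}\frac{1}{n}\sum_{i=1}^n\varphi\left(nQ(X_i)\right). \tag{17}$$
The expression of the estimate $\widehat{D}_\varphi(\mathcal{M}_\theta,P_X)$, given in (15), is the solution of a convex optimization problem over a convex constrained subset of $\mathbb{R}^n$. In order to transform this problem into an unconstrained one, we will make use of the Fenchel-Legendre transform, denoted $\varphi^*$, of the convex function $\varphi$, as well as some other duality arguments. It is defined by
$$t\in\mathbb{R}\mapsto\varphi^*(t):=\sup_{x\in\mathbb{R}}\left\{tx-\varphi(x)\right\}.$$
For convenience, we recall some properties of the convex conjugate $\varphi^*$ of $\varphi$. For the proofs we can refer to Section 26 in Rockafellar (1970). These properties will be used to determine the convex conjugates of some standard divergence functions $\varphi$; see Table 1 below. The function $\varphi^*$ in turn is convex and closed, its domain is an interval with endpoints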
[TABLE]
satisfying with . Note that the interval
[TABLE]
can be different from , the real domain of given by (19). This holds when or is finite and or is finite, respectively. For example, for the convex function
[TABLE]
we have and , and we can see that the domain of the corresponding -function is which is different from The two intervals and coincide if the function is “essentially smooth”, i.e., differentiable with
[TABLE]
The strict convexity of on its domain is equivalent to the condition that its conjugate is essentially smooth, i.e., differentiable with
[TABLE]
Conversely, is essentially smooth on its domain if and only if is strictly convex on its domain .
In all the sequel, we assume additionally that is essentially smooth. Hence, is strictly convex on its domain , and it holds that
[TABLE]
and
[TABLE]
where denotes the inverse of the derivative function of . It holds also that is twice continuously differentiable on with
[TABLE]
In particular, and . Obviously, since is assumed to be closed, we have
[TABLE]
which may be finite or infinite. Hence, by closedness of , likewise we have
[TABLE]
Finally, the first and second derivatives of in and are defined to be the limits of and when and when . The first and second derivatives of in and are defined in a similar way. In Table 1, using the above properties, we give the convex conjugates of some standard divergence functions , associated to standard divergences. We determine also their domains, respectively, and .
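The conjugates of standard divergence functions can be sanity-checked numerically. The sketch below approximates the Fenchel-Legendre transform $\sup_x\{tx-\varphi(x)\}$ by a maximum over a finite grid and compares it with two closed forms that are standard for these functions: $e^t-1$ for the Kullback-Leibler function $x\log x-x+1$, and $t^2/2+t$ for the $\chi^2$ function $\frac{1}{2}(x-1)^2$. The grid bounds and the helper names are arbitrary choices for illustration, not taken from the paper.

```python
import math


def conjugate_on_grid(phi, t, grid):
    """Approximate phi*(t) = sup_x { t*x - phi(x) } by a maximum over a finite grid."""
    return max(t * x - phi(x) for x in grid)


def phi_kl(x):
    """Kullback-Leibler function, defined here for x > 0."""
    return x * math.log(x) - x + 1.0


def phi_chi2(x):
    """Chi-square function, defined on the whole real line."""
    return 0.5 * (x - 1.0) ** 2


positive_grid = [i / 1000 for i in range(1, 20001)]   # points in (0, 20]
real_grid = [i / 1000 for i in range(-20000, 20001)]  # points in [-20, 20]

for t in (-1.0, 0.0, 0.5, 1.0):
    kl_numeric = conjugate_on_grid(phi_kl, t, positive_grid)
    chi2_numeric = conjugate_on_grid(phi_chi2, t, real_grid)
    print(t, kl_numeric, math.exp(t) - 1.0, chi2_numeric, 0.5 * t * t + t)
```

The grid maximum can only underestimate the supremum, and the gap shrinks with the grid step, so agreement to several decimals is the most that should be expected from this check.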
Using some duality arguments, see Broniatowski and Keziou (2012), we can show that, for any , if there exists in such that
[TABLE]
then
[TABLE]
with dual attainment. Conversely, if there exists some dual optimal solution
[TABLE]
such that
[TABLE]
then the equality (25) holds, and the unique optimal solution of the primal problem
[TABLE]
namely, the projection of on , is given by
[TABLE]
where is solution of the system of equations
[TABLE]
In view of the last results, using the notations
[TABLE]
and
[TABLE]
we obtain the following equivalent expressions to the estimates , and , see (13), (16) and (17),
[TABLE]
[TABLE]
and
[TABLE]
Remark 9.
The empirical likelihood estimate is obtained for the particular choice of the modified Kullback-Leibler divergence , namely, when . Moreover, straightforward computation shows that , . Therefore, can be omitted, and the above expression can be simplified to
[TABLE]
and
[TABLE]
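For a fixed parameter value and a scalar moment function, the empirical likelihood weights can be computed with a one-dimensional root search. The sketch below uses Owen's Lagrange multiplier form, $q_i=1/(n(1+\lambda g_i))$ with $\lambda$ solving $\sum_i g_i/(1+\lambda g_i)=0$; this parametrization may differ in sign convention from the dual expressions of the remark, and the function name `el_weights`, the data and the value of `theta` are hypothetical.

```python
def el_weights(g_values):
    """Empirical likelihood weights q_i = 1 / (n * (1 + lam * g_i)) for scalar g_i.

    lam is the root of the strictly decreasing function
    lam -> sum_i g_i / (1 + lam * g_i), located by bisection.
    """
    n = len(g_values)
    g_min, g_max = min(g_values), max(g_values)
    if not (g_min < 0.0 < g_max):
        raise ValueError("zero must lie strictly inside the range of the g_i")

    def score(lam):
        return sum(g / (1.0 + lam * g) for g in g_values)

    # All weights are positive exactly when lam lies in (-1/g_max, -1/g_min).
    lo, hi = -1.0 / g_max, -1.0 / g_min
    margin = 1e-10 * (hi - lo)
    lo, hi = lo + margin, hi - margin
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, [1.0 / (n * (1.0 + lam * g)) for g in g_values]


sample = [-1.5, -0.75, -0.25, 0.0, 0.5, 0.75, 1.0, 2.25]
theta = 0.1                             # hypothetical candidate value of the mean
g_values = [x - theta for x in sample]  # moment function g(x, theta) = x - theta
lam, weights = el_weights(g_values)
print(lam, sum(weights), sum(w * g for w, g in zip(weights, g_values)))
```

The two printed sums check that the weights form a probability vector satisfying the moment constraint. With a vector-valued moment function the bisection would have to be replaced by a multivariate solver.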
We will show that for any divergence , the estimate is equivariant with respect to the quadratic loss for the additive group, and equivariant with respect to the relative quadratic loss for the multiplicative group. First, we present the asymptotic counterpart of the estimates (29), (31) and (33). In particular, we give results about existence and characterization of the projection of on the model . The characterization of the projection will be of great importance in computing the minimum risk equivariant estimate. We have the following; see Theorem 1 in Broniatowski and Keziou (2006):
Proposition 10.
Let be a given value in . Assume that for all , and that there exists in such that and [footnote 1: the strict inequalities mean that ]
[TABLE]
Then, we have
[TABLE]
with dual attainment. Conversely, if there exists a dual optimal solution
[TABLE]
belonging to the interior (in ) of the set
[TABLE]
then the dual equality (35) holds, and the unique optimal solution of the primal problem , namely, the projection of on , is given by
[TABLE]
where is the solution of the system of equations
[TABLE]
Furthermore, the solution is unique if the functions are linearly independent in the sense that for all with
Remark 11.
By minimizing , upon , , we obtain the semiparametric model of densities
[TABLE]
where is the solution of the system of equations (37). For the particular case of the -divergence, namely, when , can be explicitly computed, and the obtained model is the semiparametric exponential family of probability densities
[TABLE]
where, for all , is the solution in of the system of equations
[TABLE]
or equivalently
[TABLE]
Proposition 12.
Assume that condition (6) holds. Then, the minimum empirical $\varphi$-divergence estimates are equivariant
1. to the additive group of transformations with respect to the quadratic loss;
2. to the multiplicative group of transformations with respect to the relative quadratic loss.
Moreover, in both cases, the induced group of transformations on the space of estimates is equal to , the group of transformations on the parameter space , in the sense that
[TABLE]
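The equivariance claim can be illustrated for one divergence where the inner minimization is explicit. For the $\chi^2$ divergence on signed measures, a least-squares projection gives the minimum empirical divergence between $P_n$ and the submodel at $\theta$ as $\frac{1}{2}\bar{g}^\top S^{-1}\bar{g}$, where $\bar{g}$ and $S$ are the empirical mean and centred covariance of the $g(X_i,\theta)$; this closed form is worked out for the sketch rather than quoted from the paper. The moment function below (mean $\theta$, variance known to equal one) and the search interval are hypothetical choices.

```python
def moment_function(x, theta):
    """Hypothetical moment function: mean theta, variance known to equal one."""
    u = x - theta
    return (u, u * u - 1.0)


def chi2_objective(sample, theta):
    """Minimum empirical chi-square divergence at theta: 0.5 * gbar' S^{-1} gbar."""
    n = len(sample)
    g = [moment_function(x, theta) for x in sample]
    m1 = sum(a for a, _ in g) / n
    m2 = sum(b for _, b in g) / n
    s11 = sum((a - m1) ** 2 for a, _ in g) / n
    s22 = sum((b - m2) ** 2 for _, b in g) / n
    s12 = sum((a - m1) * (b - m2) for a, b in g) / n
    det = s11 * s22 - s12 * s12
    # Quadratic form written with the explicit inverse of the 2x2 matrix S.
    return 0.5 * (s22 * m1 * m1 - 2.0 * s12 * m1 * m2 + s11 * m2 * m2) / det


def chi2_estimate(sample, half_width=1.0, iterations=80):
    """Golden-section search for the minimizer on [mean - half_width, mean + half_width]."""
    centre = sum(sample) / len(sample)
    lo, hi = centre - half_width, centre + half_width
    ratio = (5 ** 0.5 - 1) / 2
    for _ in range(iterations):
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if chi2_objective(sample, c) < chi2_objective(sample, d):
            hi = d
        else:
            lo = c
    return 0.5 * (lo + hi)


sample = [-1.5, -0.75, -0.25, 0.0, 0.5, 0.75, 1.0, 2.25]
shift = 3.0
shifted = [x + shift for x in sample]
print(chi2_estimate(sample), chi2_estimate(shifted) - shift)
```

Because this moment function depends on $x$ and $\theta$ only through $x-\theta$, the objective at $(x+a,\theta+a)$ equals the objective at $(x,\theta)$, so the two printed values agree up to numerical tolerance. The golden-section search only locates a local minimizer inside the chosen interval, which is enough for this check but not a general-purpose optimizer.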
Corollary 13.
For any estimate , the corresponding loss function is constant.
In view of the above corollary, for the additive group, in order to obtain the uniform minimum risk estimate, we can compute the risk of any estimate under the particular value , and then select the estimate that minimizes the risk. Likewise, if a multiplicative group is considered, to obtain the uniform minimum risk estimate, we can compute the risk of any estimate under the particular value , and then select the estimate that minimizes the risk. To do this, we will first characterize the equivariant estimates.
Definition 14.
A functional is “invariant” iff
[TABLE]
Definition 15.
A functional is a “maximal invariant” iff it is invariant and satisfies
[TABLE]
Remark 16.
For the additive group, we have that a functional , a function of , is maximal invariant. Likewise, for the multiplicative group, a functional , a function of , is maximal invariant.
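A common concrete choice of maximal invariant is the vector of differences to the last observation for a common shift, and the vector of ratios to the last observation for a common positive rescaling of positive data. The sketch below assumes these textbook forms; they may not coincide with the functionals intended in the remark, and the function names are hypothetical.

```python
def additive_maximal_invariant(sample):
    """Differences to the last observation: (X_1 - X_n, ..., X_{n-1} - X_n)."""
    last = sample[-1]
    return [x - last for x in sample[:-1]]


def multiplicative_maximal_invariant(sample):
    """Ratios to the last observation: (X_1 / X_n, ..., X_{n-1} / X_n), positive data assumed."""
    last = sample[-1]
    return [x / last for x in sample[:-1]]


sample = [1.0, 2.5, 4.0, 0.5]
shifted = [x + 3.0 for x in sample]  # common shift a = 3
scaled = [4.0 * x for x in sample]   # common rescaling a = 4

print(additive_maximal_invariant(sample), additive_maximal_invariant(shifted))
print(multiplicative_maximal_invariant(sample), multiplicative_maximal_invariant(scaled))
```

Both functionals are unchanged by the corresponding transformation, and two samples sharing the same differences (respectively ratios) differ only by a common shift (respectively a common positive factor), which is the maximality property.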
Proposition 17.
Assume that the estimation problem is invariant under the group . Let and be, respectively, the induced groups on the parameter space and the decision space . Let be any equivariant estimate. Then, an estimate is equivariant iff
[TABLE]
for some invariant functional , i.e.,
[TABLE]
Proposition 18.
(Hoff (2012), Theorem 3). A functional is invariant iff it is a function of a maximal invariant functional .
Combining the above results, we obtain
Proposition 19.
Let be any equivariant estimate. Then is equivariant iff
[TABLE]
where is some function of the maximal invariant functional
Remark 20.
Notice that acts additively for the additive group, and multiplicatively for the multiplicative group, i.e.,
[TABLE]
when an additive group is considered, and
[TABLE]
for the multiplicative group.
3. UMRE estimate for additive group
Let be any one of the equivariant estimates , and assume that . Consider the loss. In view of the above statements, the UMRE estimate of is then given by
[TABLE]
where , and is the conditional expectation of given , under the assumption that . We give in the following an asymptotic approximation to the conditional expectation
[TABLE]
using the result of Jurečková and Picek (2009). Straightforward calculation shows that the score function of the semiparametric exponential family (38) can be written as
[TABLE]
where and are, respectively, the derivative w.r.t. , of and the solution of the system (39). The derivative can be derived by the implicit function theorem. Denote with
[TABLE]
Let
[TABLE]
Then, by the implicit function theorem, we have
[TABLE]
Notice that, for the true value , since , we obtain the simpler expression
[TABLE]
Let
[TABLE]
Under some integrability assumptions, by dominated convergence theorem, we obtain
[TABLE]
which is the opposite of the Fisher information matrix.
Theorem 21.
Under some regularity conditions, we have
[TABLE]
which gives the following approximation of the UMRE estimate
[TABLE]
where is the empirical estimate of the Fisher information matrix , given by
[TABLE]
with
[TABLE]
is the solution of the empirical version of the system (39), i.e., the solution in of
[TABLE]
and is the gradient of at the point given by
[TABLE]
where
[TABLE]
and
[TABLE]
4. UMRE estimate for multiplicative group
Let be any one of the equivariant estimates of , and assume that . Consider the loss. In view of the above statements, the UMRE estimate of is given by
[TABLE]
where , and is the conditional expectation given , under the assumption that .
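For intuition about the additive case of Section 3, the construction "equivariant estimate minus its conditional expectation given the maximal invariant" has a classical closed form when the density $f$ of the observations is fully known: the Pitman estimator $\int u\prod_i f(X_i-u)\,du\,/\int\prod_i f(X_i-u)\,du$. The sketch below evaluates it by a Riemann sum for a standard normal density. This is a fully parametric toy case, unlike the semiparametric setting of the paper where the conditional expectation has to be approximated; the density, grid width and function names are assumptions made for illustration.

```python
import math


def pitman_location_estimate(sample, log_density, half_width=10.0, steps=4000):
    """Pitman estimator of location, computed by a Riemann sum on a grid centred at the mean."""
    centre = sum(sample) / len(sample)
    h = 2.0 * half_width / steps
    grid = [centre - half_width + k * h for k in range(steps + 1)]
    log_lik = [sum(log_density(x - u) for x in sample) for u in grid]
    top = max(log_lik)  # subtract the maximum before exponentiating, for numerical stability
    weights = [math.exp(v - top) for v in log_lik]
    return sum(u * w for u, w in zip(grid, weights)) / sum(weights)


def standard_normal_log_density(z):
    """Log-density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


sample = [-1.5, -0.75, -0.25, 0.0, 0.5, 0.75, 1.0, 2.25]
shifted = [x + 3.0 for x in sample]
print(pitman_location_estimate(sample, standard_normal_log_density))
print(pitman_location_estimate(shifted, standard_normal_log_density) - 3.0)
```

For the normal density the Pitman estimator reduces to the sample mean, which gives a simple check of the numerical integration; passing another log-density function yields a different equivariant estimate.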
5. Simulation results
Example 22.
Consider the model
[TABLE]
where . Let be a random variable with distribution with . The model is invariant to the additive group. We compare the mean square errors (MSE) of the EL estimate and the proposed UMRE estimate using the approximation (49), for the sample sizes , with runs. We can see, from Figure 1, that the proposed estimate improves on the EL one for moderate sample sizes.
References
- Broniatowski, M. and Keziou, A. (2006). Minimization of ϕ-divergences on sets of signed measures. Studia Sci. Math. Hungar., 43 (4), 403–442. arXiv:1003.5457.
- Broniatowski, M. and Keziou, A. (2012). Divergences and duality for estimation and test under moment condition models. J. Statist. Plann. Inference, 142 (9), 2554–2573.
- Chen, J. H. and Qin, J. (1993). Empirical likelihood estimation for finite populations and the effective usage of auxiliary information. Biometrika, 80 (1), 107–116.
- Cressie, N. and Read, T. R. C. (1984). Multinomial goodness-of-fit tests. J. Roy. Statist. Soc. Ser. B, 46 (3), 440–464.
- Csiszár, I. (1963). Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Magyar Tud. Akad. Mat. Kutató Int. Közl., 8, 85–108.
- Csiszár, I. (1967). On topology properties of f-divergences. Studia Sci. Math. Hungar., 2, 329–339.
- Godambe, V. P. and Thompson, M. E. (1989). An extension of quasi-likelihood estimation. J. Statist. Plann. Inference, 22 (2), 137–172. With discussion and a reply by the authors.
- Haberman, S. J. (1984). Adjustment by minimum discriminant information. Ann. Statist., 12 (3), 971–988.
