Easy Maximum Empirical Likelihood Estimation of Linear Functionals Of A Probability Measure With Infinitely Many Constraints
Shan Wang, Hanxiang Peng

TL;DR
This paper introduces a simple empirical likelihood method for efficiently estimating linear functionals of a probability measure with infinitely many constraints, applicable in various informational settings.
Contribution
It develops an easy empirical likelihood estimator that handles infinitely many constraints and different types of side information, improving estimation efficiency.
Findings
The estimator achieves semiparametric efficiency.
Simulation results show significant efficiency gains.
The method applies to known marginals, unknown identical marginals, and symmetric distributions.
Abstract
In this article, we construct semiparametrically efficient estimators of linear functionals of a probability measure in the presence of side information using an easy empirical likelihood approach. We use estimated constraint functions and allow the number of constraints to grow with the sample size. Considered are three cases of information which can be characterized by infinitely many constraints: (1) the marginal distributions are known, (2) the marginals are unknown but identical, and (3) distributional symmetry. An improved spatial depth function is defined and its asymptotic properties are studied. Simulation results on efficiency gain are reported.
Table 1. Known componentwise medians: averaged maximum eigenvalues of the asymptotic variance-covariance matrices of the sample spatial median (Sample) and of the EL-weighted spatial median (EL), and their ratio, for dimensions 2 and 3.

| n | Sample (dim=2) | EL (dim=2) | Ratio (dim=2) | Sample (dim=3) | EL (dim=3) | Ratio (dim=3) |
|---|---|---|---|---|---|---|
| **Cauchy** | | | | | | |
| 50 | 0.0641 | 0.0094 | 0.1463 | 0.0830 | 0.0111 | 0.1339 |
| 100 | 0.0330 | 0.0040 | 0.1209 | 0.0379 | 0.0048 | 0.1260 |
| 200 | 0.0157 | 0.0018 | 0.1162 | 0.0207 | 0.0024 | 0.1153 |
| 500 | 0.0060 | 0.0007 | 0.1174 | 0.0078 | 0.0009 | 0.1181 |
| **Student t (df=3)** | | | | | | |
| 50 | 0.0432 | 0.0064 | 0.1477 | 0.0609 | 0.0083 | 0.1363 |
| 100 | 0.0218 | 0.0029 | 0.1322 | 0.0281 | 0.0037 | 0.1329 |
| 200 | 0.0119 | 0.0014 | 0.1161 | 0.0145 | 0.0017 | 0.1198 |
| 500 | 0.0046 | 0.0005 | 0.1096 | 0.0055 | 0.0007 | 0.1257 |
| **Copula distribution (marginals as in the Appendix)** | | | | | | |
| 50 | 0.0523 | 0.0051 | 0.0972 | 0.0542 | 0.0082 | 0.1515 |
| 100 | 0.0262 | 0.0021 | 0.0790 | 0.0278 | 0.0036 | 0.1285 |
| 200 | 0.0131 | 0.0009 | 0.0715 | 0.0135 | 0.0017 | 0.1291 |
| 500 | 0.0054 | 0.0004 | 0.0679 | 0.0055 | 0.0007 | 0.1235 |
| **Asymmetric Laplace** | | | | | | |
| 50 | 0.0153 | 0.0021 | 0.1360 | 0.0191 | 0.0024 | 0.1248 |
| 100 | 0.0072 | 0.0009 | 0.1213 | 0.0090 | 0.0011 | 0.1209 |
| 200 | 0.0035 | 0.0004 | 0.1141 | 0.0043 | 0.0005 | 0.1155 |
| 500 | 0.0013 | 0.0001 | 0.1139 | 0.0017 | 0.0002 | 0.1072 |
Table 2. Side information: one marginal is symmetric about the origin. Averaged maximum eigenvalues for the sample (Sample) and EL-weighted (EL) spatial medians, and their ratio.

| n | Sample | EL | Ratio | Sample | EL | Ratio | Sample | EL | Ratio |
|---|---|---|---|---|---|---|---|---|---|
| **One marginal known** | | | | | | | | | |
| 50 | 0.0783 | 0.0499 | 0.6366 | 0.0842 | 0.0527 | 0.6261 | 0.0773 | 0.0525 | 0.6790 |
| 100 | 0.0399 | 0.0250 | 0.6277 | 0.0381 | 0.0228 | 0.5989 | 0.0382 | 0.0237 | 0.6213 |
| 200 | 0.0189 | 0.0119 | 0.6268 | 0.0195 | 0.0116 | 0.5976 | 0.0190 | 0.0116 | 0.6093 |
| 500 | 0.0074 | 0.0046 | 0.6184 | 0.0082 | 0.0045 | 0.5530 | 0.0075 | 0.0045 | 0.6012 |
| **One marginal unknown** | | | | | | | | | |
| 50 | 0.0776 | 0.0518 | 0.6679 | 0.0853 | 0.0533 | 0.6249 | 0.0778 | 0.0532 | 0.6841 |
| 100 | 0.0385 | 0.0234 | 0.6082 | 0.0404 | 0.0243 | 0.6016 | 0.0406 | 0.0241 | 0.5931 |
| 200 | 0.0204 | 0.0117 | 0.5720 | 0.0203 | 0.0122 | 0.6020 | 0.0197 | 0.0109 | 0.5538 |
| 500 | 0.0073 | 0.0045 | 0.6087 | 0.0079 | 0.0049 | 0.6158 | 0.0078 | 0.0044 | 0.5595 |
Table 3. Side information: one marginal is symmetric about the origin. Averaged maximum eigenvalues for the sample (Sample) and EL-weighted (EL) spatial medians, and their ratio.

| n | Sample | EL | Ratio | Sample | EL | Ratio | Sample | EL | Ratio |
|---|---|---|---|---|---|---|---|---|---|
| **One marginal known** | | | | | | | | | |
| 50 | 0.0581 | 0.0384 | 0.6602 | 0.0576 | 0.0357 | 0.6199 | 0.0602 | 0.0385 | 0.6403 |
| 100 | 0.0292 | 0.0177 | 0.6069 | 0.0262 | 0.0171 | 0.6519 | 0.0274 | 0.0169 | 0.6178 |
| 200 | 0.0149 | 0.0092 | 0.6204 | 0.0142 | 0.0088 | 0.6209 | 0.0136 | 0.0085 | 0.6239 |
| 500 | 0.0055 | 0.0036 | 0.6453 | 0.0057 | 0.0034 | 0.5973 | 0.0056 | 0.0033 | 0.5830 |
| **One marginal unknown** | | | | | | | | | |
| 50 | 0.0554 | 0.0359 | 0.6481 | 0.0561 | 0.0352 | 0.6278 | 0.0575 | 0.0373 | 0.6477 |
| 100 | 0.0291 | 0.0184 | 0.6314 | 0.0288 | 0.0171 | 0.5921 | 0.0286 | 0.0174 | 0.6072 |
| 200 | 0.0150 | 0.0096 | 0.6410 | 0.0136 | 0.0087 | 0.6344 | 0.0141 | 0.0086 | 0.6053 |
| 500 | 0.0056 | 0.0034 | 0.6112 | 0.0057 | 0.0033 | 0.5732 | 0.0057 | 0.0032 | 0.5622 |
Table 4. Side information: one marginal is symmetric about the origin. Averaged maximum eigenvalues for the sample (Sample) and EL-weighted (EL) spatial medians, and their ratio.

| n | Sample | EL | Ratio | Sample | EL | Ratio | Sample | EL | Ratio |
|---|---|---|---|---|---|---|---|---|---|
| **One marginal known** | | | | | | | | | |
| 50 | 0.0545 | 0.0366 | 0.6713 | 0.0541 | 0.0357 | 0.6601 | 0.0527 | 0.0373 | 0.7066 |
| 100 | 0.0273 | 0.0181 | 0.6626 | 0.0265 | 0.0183 | 0.6921 | 0.0269 | 0.0174 | 0.6470 |
| 200 | 0.0132 | 0.0088 | 0.6665 | 0.0142 | 0.0085 | 0.6005 | 0.0137 | 0.0082 | 0.5995 |
| 500 | 0.0054 | 0.0039 | 0.7113 | 0.0053 | 0.0033 | 0.6208 | 0.0053 | 0.0033 | 0.6162 |
| **One marginal unknown** | | | | | | | | | |
| 50 | 0.0562 | 0.0385 | 0.6841 | 0.0519 | 0.0348 | 0.6698 | 0.0543 | 0.0364 | 0.6707 |
| 100 | 0.0275 | 0.0193 | 0.7021 | 0.0272 | 0.0172 | 0.6321 | 0.0267 | 0.0172 | 0.6430 |
| 200 | 0.0127 | 0.0089 | 0.7012 | 0.0131 | 0.0085 | 0.6495 | 0.0129 | 0.0082 | 0.6324 |
| 500 | 0.0054 | 0.0035 | 0.6427 | 0.0055 | 0.0036 | 0.6451 | 0.0052 | 0.0032 | 0.6187 |
Table 5. Side information: one marginal is symmetric about the origin. Averaged maximum eigenvalues for the sample (Sample) and EL-weighted (EL) spatial medians, and their ratio.

| n | Sample | EL | Ratio | Sample | EL | Ratio | Sample | EL | Ratio |
|---|---|---|---|---|---|---|---|---|---|
| **One marginal known** | | | | | | | | | |
| 50 | 0.0189 | 0.0113 | 0.6006 | 0.0190 | 0.0114 | 0.6004 | 0.0181 | 0.0115 | 0.6335 |
| 100 | 0.0090 | 0.0053 | 0.5845 | 0.0093 | 0.0051 | 0.5464 | 0.0089 | 0.0051 | 0.5760 |
| 200 | 0.0044 | 0.0026 | 0.5962 | 0.0042 | 0.0023 | 0.5547 | 0.0042 | 0.0023 | 0.5549 |
| 500 | 0.0016 | 0.0009 | 0.5867 | 0.0016 | 0.0009 | 0.5627 | 0.0016 | 0.0009 | 0.5525 |
| **One marginal unknown** | | | | | | | | | |
| 50 | 0.0178 | 0.0118 | 0.6656 | 0.0196 | 0.0118 | 0.6023 | 0.0186 | 0.0130 | 0.6966 |
| 100 | 0.0085 | 0.0056 | 0.6565 | 0.0087 | 0.0055 | 0.6346 | 0.0083 | 0.0051 | 0.6162 |
| 200 | 0.0042 | 0.0027 | 0.6504 | 0.0043 | 0.0025 | 0.5757 | 0.0044 | 0.0025 | 0.5792 |
| 500 | 0.0017 | 0.0011 | 0.6342 | 0.0017 | 0.0010 | 0.5748 | 0.0017 | 0.0010 | 0.5539 |
Shan Wang, University of San Francisco, Department of Mathematics and Statistics, San Francisco, CA 94117, USA
Hanxiang Peng, Indiana University Purdue University Indianapolis, Department of Mathematical Sciences, Indianapolis, IN 46202-3267, USA
Keywords: Empirical likelihood; Infinitely many constraints; Maximum empirical likelihood estimator; Semiparametric efficiency; Spatial median.
AMS subject classifications: 62G05; 62G20, 62H11.
1 Introduction
Suppose that $Z_1,\dots,Z_n$ are independent and identically distributed (i.i.d.) random variables with a common distribution $Q$ taking values in a measurable space $\mathcal Z$. In this article, we are interested in efficient estimation of the linear functional $\boldsymbol\theta=\int\boldsymbol\psi\,dQ$ of $Q$ for some square-integrable function $\boldsymbol\psi$ from $\mathcal Z$ to $\mathbb R^d$ when side information is available through a vector function (constraint) $\mathbf u$ which satisfies
- (C)
$\mathbf u$ is measurable from $\mathcal Z$ to $\mathbb R^m$ such that $\int\mathbf u\,dQ=0$ and the variance-covariance matrix $\int\mathbf u\mathbf u^\top\,dQ$ is nonsingular.
The commonly used sample mean $\bar{\boldsymbol\psi}=\frac1n\sum_{j=1}^n\boldsymbol\psi(Z_j)$ of $\boldsymbol\theta=E(\boldsymbol\psi(Z))$ does not use the information and is not efficient in the sense of least dispersed regular estimators; see, e.g., Bickel, Klaassen, Ritov and Wellner (1993). Based on the criterion of maximum empirical likelihood, an improved estimator that utilizes the information is
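$$\tilde{\boldsymbol\theta}=\frac1n\sum_{j=1}^n\frac{\boldsymbol\psi(Z_j)}{1+\tilde{\boldsymbol\zeta}^\top\mathbf u(Z_j)},\tag{1.1}$$
where $\tilde{\boldsymbol\zeta}$ is the solution to the equation
$$\frac1n\sum_{j=1}^n\frac{\mathbf u(Z_j)}{1+\boldsymbol\zeta^\top\mathbf u(Z_j)}=0.\tag{1.2}$$
We shall refer to $\tilde{\boldsymbol\theta}$ as the EL-weighted estimator.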
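As a computational aside, the EL-weighted estimator above only requires solving the estimating equation for $\tilde{\boldsymbol\zeta}$. The Python/NumPy sketch below does this with a damped Newton iteration, assuming the weights take the form $1/\{n(1+\tilde{\boldsymbol\zeta}^\top\mathbf u(Z_j))\}$. The solver, the function names `el_weights` and `el_weighted_estimate`, and the toy side information $E(Z)=0$ are illustrative choices, not the authors' implementation.

```python
import numpy as np


def el_weights(U, tol=1e-10, max_iter=100):
    """EL weights for constraint values U of shape (n, m); row j is u(Z_j).

    Returns (weights, zeta), where zeta solves
    sum_j u(Z_j) / (1 + zeta^T u(Z_j)) = 0 and
    weights[j] = 1 / (n * (1 + zeta^T u(Z_j))).
    """
    n, m = U.shape
    zeta = np.zeros(m)
    for _ in range(max_iter):
        d = 1.0 + U @ zeta                      # denominators 1 + zeta^T u_j
        g = (U / d[:, None]).sum(axis=0)        # left side of the estimating equation
        if np.linalg.norm(g) < tol:
            break
        jac = -(U / d[:, None] ** 2).T @ U      # derivative of g with respect to zeta
        step = np.linalg.solve(jac, -g)         # Newton direction
        t = 1.0
        # Damping: keep every weight in (0, 1), i.e. every denominator above 1/n.
        while np.any(1.0 + U @ (zeta + t * step) <= 1.0 / n):
            t /= 2.0
        zeta = zeta + t * step
    weights = 1.0 / (n * (1.0 + U @ zeta))
    return weights, zeta


def el_weighted_estimate(psi_vals, U):
    """EL-weighted estimate sum_j w_j psi(Z_j) of theta = E psi(Z)."""
    weights, _ = el_weights(U)
    return weights @ psi_vals


# Usage: Z ~ N(0, 1) with the side information E(Z) = 0, i.e. u(z) = z.
rng = np.random.default_rng(0)
Z = rng.standard_normal(500)
U = Z[:, None]
theta_bar = np.mean(Z + Z**2)                    # sample mean, ignores the information
theta_tilde = el_weighted_estimate(Z + Z**2, U)  # EL-weighted, uses it
```

At the solution the weights are positive and sum to one, so the estimator is a reweighted sample mean that satisfies the constraint exactly.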
There is an extensive literature on empirical likelihood hypothesis testing; see, e.g., Owen (1988, 2001). Empirical likelihood was soon used to construct point estimators. Qin and Lawless (1994) studied maximum empirical likelihood estimators (MELE) and showed in their Corollary 2 that MELE are fully efficient. As a special case of MELE, estimators of the preceding easy form were studied by Zhang (1995, 1997) for M-estimation and quantile processes in the presence of auxiliary (side) information. For a fixed number of known constraint functions, the asymptotic normality (ASN) and efficiency of MELE were established.
Hjort, McKeague and Van Keilegom (2009) extended the scope of empirical likelihood hypothesis testing, developing a general theory for constraints with nuisance parameters and considering the case of infinitely many constraints. Peng and Schick (2013) generalized empirical likelihood testing to allow the number of constraints to grow with the sample size and the constraints to use estimated criteria functions. Peng and Tan (2018) extended the results of the latter to general estimating equations based on U-statistics with side information.
Parente and Smith (2011) studied generalized empirical likelihood estimators for irregular constraints. Peng and Schick (2018) presented a theory of maximum empirical likelihood estimation and empirical likelihood ratio testing with irregular and estimated constraint functions. Wang and Peng (2022) used the easy EL-weighted approach to construct improved estimators of linear functionals of a probability measure when side information is available. Motivated by the nuisance parameters common in semiparametric models and by the infinite dimension of such models, they studied estimated constraint functions whose number grows with the sample size, and applied the results to improve estimation efficiency in structural equation models.
We shall rely on the results of Wang and Peng (2022) to construct efficient estimators of linear functionals of a probability measure for several kinds of side information, each determined by infinitely many constraints. Bickel, Ritov and Wellner (1991) characterized efficient estimation of $\boldsymbol\theta$ for known $\boldsymbol\psi$ when the marginal distributions of $X$ and of $Y$ are known, and constructed an efficient estimator by minimizing a chi-square-type objective function. Peng and Schick (2005) calculated the information lower bound when the marginal distributions are unknown but identical, and constructed an efficient estimator based on a least-squares criterion. Peng and Schick (2018) constructed empirical likelihood tests of stochastic independence and distributional symmetry. Each of independence, symmetry, and known or equal marginal distributions is equivalent to infinitely many equations (constraints), and can be used to improve estimation efficiency. Here we construct the EL-weighted estimators and establish their semiparametric efficiency. Our estimators have a simple analytic form and incorporate side information easily to improve efficiency.
The efficiency criteria used are that of a least dispersed regular estimator or that of a locally asymptotically minimax estimator, and are based on the convolution theorem and on lower bounds for the local asymptotic risk in LAN and LAMN families; see, e.g., the monograph by Bickel et al. (1993).
In what follows, we summarize some results from Wang and Peng (2022) for convenience, and we also prove semiparametric efficiency. In many semiparametric models, the constraint vector function $\mathbf u$ is unknown and must be estimated by some measurable function $\hat{\mathbf u}$. Using it, we now work with the EL-weights
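$$\hat\pi_j=\frac1n\,\frac{1}{1+\hat{\boldsymbol\zeta}^\top\hat{\mathbf u}(Z_j)},\qquad j=1,\dots,n,$$
where $\hat{\boldsymbol\zeta}$ solves Eq. (1.2) with $\mathbf u$ replaced by $\hat{\mathbf u}$. A natural estimate $\hat{\boldsymbol\theta}$ of $\boldsymbol\theta$ is now
$$\hat{\boldsymbol\theta}=\sum_{j=1}^n\hat\pi_j\,\boldsymbol\psi(Z_j).$$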
We now allow the number of constraints $m$ to depend on the sample size $n$, $m=m_n$, and to tend to infinity slowly with $n$. To stress the dependence, write
[TABLE]
and $\tilde{\boldsymbol\theta}_n=\tilde{\boldsymbol\theta}$, $\hat{\boldsymbol\theta}_n=\hat{\boldsymbol\theta}$ for the corresponding estimators of $\boldsymbol\theta$, that is,
[TABLE]
where $\tilde{\boldsymbol\zeta}_n$ and $\hat{\boldsymbol\zeta}_n$ solve Eq. (1.2) with $\mathbf u_n$ and $\hat{\mathbf u}_n$, respectively.
The ASN of $\tilde{\boldsymbol\theta}_n$ and $\hat{\boldsymbol\theta}_n$ is given in Theorems 3 and 4 of Wang and Peng (2022), respectively. We now prove the semiparametric efficiency of $\tilde{\boldsymbol\theta}_n$; Theorem 4 of Wang and Peng (2022) is quoted in the Appendix for convenience. For , write the Euclidean norm. For , write the Kronecker product. Let , and let . For , write the sample average of , and the closed linear span of the components in . Let be an i.i.d. copy of . Denote by the closed linear span of in . Set
[TABLE]
Following Peng and Schick (2013), a sequence of dispersion matrices is said to be regular if
[TABLE]
Theorem 1.1.
Suppose that satisfies (C) for each such that
[TABLE]
the sequence of dispersion matrices is regular and satisfies
[TABLE]
[TABLE]
Then $\tilde{\boldsymbol\theta}_n$ is semiparametrically efficient as . Moreover,
[TABLE]
where $\varSigma_0=\mathrm{Var}(\boldsymbol\psi(Z))-\mathrm{Var}(\boldsymbol\varphi_0(Z))$ with $\boldsymbol\varphi_0=\Pi(\boldsymbol\psi\mid[\mathbf u_\infty])$.
Proof. We only need to show the efficiency. It suffices to prove that the orthogonal complement in is the tangent space. To this end, let with be a regular parametric submodel with the score function . By (C),
[TABLE]
Differentiating both sides of the equality with respect to at yields
[TABLE]
This shows . For any bounded , consider for sufficiently small . It is clear that is a density and that the submodel with this density has score function , which satisfies . Since bounded functions in are dense, the above conclusion holds for any . This shows that is the tangent space.
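For a fixed, finite set of constraints with mean zero, the projection $\boldsymbol\varphi_0$ in Theorem 1.1 reduces to the linear regression $\mathrm{Cov}(\boldsymbol\psi,\mathbf u)\mathrm{Var}(\mathbf u)^{-1}\mathbf u$, so the finite-$m$ analogue of $\varSigma_0$ has a simple plug-in estimate. The sketch below (the helper name `variance_reduction` and the toy example are hypothetical) computes it; with finitely many constraints it only approximates the limit over $[\mathbf u_\infty]$.

```python
import numpy as np


def variance_reduction(psi_vals, U):
    """Plug-in estimates of Var(psi) and Var(psi) - Var(Pi(psi | span(u))).

    psi_vals: (n, d) values psi(Z_j); U: (n, m) constraint values u(Z_j).
    The projection onto span(u) is the least-squares regression of psi on u,
    whose variance is C W^{-1} C^T with C = Cov(psi, u) and W = Var(u).
    """
    n = U.shape[0]
    psi_c = psi_vals - psi_vals.mean(axis=0)
    U_c = U - U.mean(axis=0)
    W = U_c.T @ U_c / n
    C = psi_c.T @ U_c / n
    var_psi = psi_c.T @ psi_c / n
    var_phi = C @ np.linalg.solve(W, C.T)
    return var_psi, var_psi - var_phi


# Usage: psi(z) = z + z^2 / 2 with constraint u(z) = z and Z ~ N(0, 1).
# Var(psi) = 1.5 and the projection removes Var(Z) = 1, leaving 0.5.
rng = np.random.default_rng(1)
Z = rng.standard_normal(200_000)
var_psi, sigma0 = variance_reduction((Z + Z**2 / 2)[:, None], Z[:, None])
```

In this toy case the known mean removes two thirds of the asymptotic variance of the sample mean.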
The article is organized as follows. In Section 2, the EL-weighted spatial depth function is constructed, and its ASN and efficiency are established in the presence of distributional symmetry. Section 3 proves the ASN and efficiency of the EL-weighted estimators of linear functionals when the marginal distribution functions are known, and Section 4 does so when the marginal distributions are unknown but equal. Simulation results are reported in Section 5. The Appendix contains Theorem 4 of Wang and Peng (2022).
2 The EL-weighted spatial median
In this section, we introduce the EL-weighted spatial depth function and establish its efficiency and asymptotic normality.
Statistical depth functions provide a center-outward ordering of points in with respect to a distribution. High depth values correspond to centrality and low values to outlyingness. Depth functions have robustness properties and can be used to define multivariate medians, which are robust location estimators. Common depth functions include the Tukey (halfspace) depth, the simplicial depth, the projection depth, and the spatial depth. Here we use the easy EL approach to construct improved depths, and illustrate it with the spatial depth. The (population) spatial depth function with respect to a distribution is defined as
[TABLE]
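where $\mathbb S(\mathbf x)=\mathbf x/\|\mathbf x\|$ if $\mathbf x\neq\mathbf 0$ ($\mathbb S(\mathbf 0)=\mathbf 0$) is the spatial sign function and $\mathbf X$ has the distribution function (DF) $F$, denoted by $\mathbf X\sim F$. The depth function can be estimated by the sample depth function given by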
[TABLE]
where . The sample spatial median is defined as the value which maximizes the depth function, that is,
[TABLE]
Suppose that additional information is available that can be expressed by a constraint function $\mathbf u$. While the sample depth does not use this information, the EL-weighted depth function makes use of it and is defined by
[TABLE]
where is the solution to the equation
[TABLE]
The EL-weighted spatial median is defined as the value which maximizes the EL-weighted depth function, that is,
[TABLE]
The EL-weighted estimator of $\boldsymbol\theta(\mathbf x)=E(\mathbb S_{\mathbf x}(\mathbf X))$ is given by
[TABLE]
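Both spatial medians can be computed by minimizing a weighted sum of distances $\sum_i w_i\|\mathbf m-\mathbf X_i\|$, whose stationarity condition $\sum_i w_i\mathbb S(\mathbf m-\mathbf X_i)=\mathbf 0$ makes the weighted spatial depth equal to one. The sketch below assumes the usual form $1-\|\sum_i w_i\mathbb S(\mathbf x-\mathbf X_i)\|$ of the spatial depth, with $w_i=1/n$ for the sample version and EL-weights for the EL-weighted version, and uses the classical Weiszfeld iteration (our choice of algorithm). The function names are hypothetical.

```python
import numpy as np


def spatial_sign(v):
    """Row-wise spatial sign S(v) = v / ||v||, with S(0) = 0."""
    norms = np.linalg.norm(v, axis=-1, keepdims=True)
    return np.divide(v, norms, out=np.zeros_like(v), where=norms > 0)


def spatial_depth(x, X, w=None):
    """Weighted spatial depth 1 - ||sum_i w_i S(x - X_i)||; w defaults to 1/n."""
    n = X.shape[0]
    w = np.full(n, 1.0 / n) if w is None else w
    return 1.0 - np.linalg.norm(w @ spatial_sign(x - X))


def spatial_median(X, w=None, tol=1e-10, max_iter=1000):
    """Weighted spatial median by the Weiszfeld fixed-point iteration."""
    n = X.shape[0]
    w = np.full(n, 1.0 / n) if w is None else w
    m = w @ X / w.sum()                        # start at the weighted mean
    for _ in range(max_iter):
        dist = np.maximum(np.linalg.norm(X - m, axis=1), 1e-12)  # avoid 0-division
        coef = w / dist
        m_new = coef @ X / coef.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m


rng = np.random.default_rng(2)
X = rng.standard_cauchy((500, 2))              # heavy tails: the sample mean is unstable
m_hat = spatial_median(X)                      # sample version (w_i = 1/n)
depth_at_median = spatial_depth(m_hat, X)      # close to 1 at the spatial median
```

Passing a vector of EL-weights as `w` gives the EL-weighted depth and median with the same code.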
Remark 2.1.
The sample spatial median is robust, with breakdown point . The EL-weighted spatial median improves efficiency but reduces robustness, a consequence of EL-weights that can be (near) zero. One can robustify it by truncating the EL-weights from below by a fixed constant. Truncation is commonly used in inverse probability weighting. Naturally, truncation entails some loss of efficiency.
Known marginal medians. In our simulation study, we considered the side information that the bivariate random vector has known marginal medians and . That is, the componentwise median is known. In this case, . Our motivation is as follows. It is well known that the spatial median is a better location estimator than the componentwise median, because the former takes into account the correlation of the components while the latter ignores it; see Chen, Dang, Peng and Bart (2009). We gauge how much information is lost when the componentwise median is used by measuring the efficiency gain of the EL-weighted spatial median (marginal medians known) over the sample spatial median (marginal medians unknown).
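For the known-median case, one natural constraint (an assumption here, since its exact form is not displayed) is $\mathbf u(\mathbf x)=(\mathbf 1[x_1\le\mu_1]-1/2,\ \mathbf 1[x_2\le\mu_2]-1/2)^\top$, where $\mu_1,\mu_2$ denote the known marginal medians; for a continuous distribution it has mean zero exactly when $\mu_1,\mu_2$ are the marginal medians. The sketch below computes the corresponding EL-weights with a compact damped Newton solver and checks that the reweighted sample has the known median in each coordinate. `median_constraints`, `el_weights`, and the simulated t-distributed data are illustrative.

```python
import numpy as np


def median_constraints(X, mu):
    """u(x) = (1[x_1 <= mu_1] - 1/2, ..., 1[x_d <= mu_d] - 1/2) for each row of X."""
    return (X <= mu).astype(float) - 0.5


def el_weights(U, tol=1e-10, max_iter=100):
    """EL weights 1 / (n (1 + zeta^T u_j)), with zeta from damped Newton steps."""
    n, m = U.shape
    zeta = np.zeros(m)
    for _ in range(max_iter):
        d = 1.0 + U @ zeta
        g = (U / d[:, None]).sum(axis=0)
        if np.linalg.norm(g) < tol:
            break
        step = np.linalg.solve(-(U / d[:, None] ** 2).T @ U, -g)
        t = 1.0
        while np.any(1.0 + U @ (zeta + t * step) <= 1.0 / n):
            t /= 2.0                            # keep all weights in (0, 1)
        zeta = zeta + t * step
    return 1.0 / (n * (1.0 + U @ zeta))


rng = np.random.default_rng(3)
# Correlated bivariate sample whose marginal medians are known to be (0, 0).
L = np.array([[1.0, 0.0], [0.8, 0.6]])
X = rng.standard_t(df=3, size=(400, 2)) @ L.T
w = el_weights(median_constraints(X, np.zeros(2)))
# Weighted proportion of observations below the known medians, per coordinate.
below = w @ (X <= 0.0).astype(float)
```

The resulting `w` can be passed to a weighted spatial median routine to obtain the EL-weighted spatial median.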
Growing number of constraints. Suppose that there exists some constant vector such that is symmetric about some known value . Let . Then the ’s are i.i.d. random variables that are symmetric about zero. Let be an i.i.d. copy of the ’s, and let be the distribution function of . Let be the subspace of consisting of the odd functions. Symmetry of about zero implies
[TABLE]
Let be the orthonormal trigonometric basis. Define . Then is an odd function in , and form a basis of the space.
In this case, the constraints are , where we allow to grow to infinity slowly with . The EL-weighted depth function is calculated by (2.1) with and $\tilde{\boldsymbol\zeta}=\tilde{\boldsymbol\zeta}_n$, which solves Eq. (2.2) with . The EL-weighted estimator of $\boldsymbol\theta(\mathbf x)=E(\mathbb S_{\mathbf x}(\mathbf X))$ then is
[TABLE]
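To make the construction concrete: if $F$ is symmetric about zero, then $F(-t)=1-F(t)$, so any function $b$ on $[0,1]$ with $b(1-s)=-b(s)$ yields an odd function $b(F(\cdot))$. One such choice, assumed here rather than taken from the displays above, is $b_k(s)=\sqrt2\cos((2k-1)\pi s)$, $k=1,\dots,m$. The sketch evaluates these constraints for a standard normal $F$ and checks oddness, mean zero and identity dispersion numerically; the helper names are hypothetical.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def normal_cdf(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + _erf(np.asarray(t) / math.sqrt(2.0)))


def odd_constraints(eps, m, cdf=normal_cdf):
    """Columns sqrt(2) cos((2k - 1) pi F(eps)), k = 1..m; odd in eps if F is symmetric."""
    k = np.arange(1, m + 1)
    return math.sqrt(2.0) * np.cos((2 * k - 1) * math.pi * cdf(eps)[:, None])


rng = np.random.default_rng(4)
eps = rng.standard_normal(100_000)
U = odd_constraints(eps, m=4)
# Oddness u(-eps) = -u(eps); under symmetry the columns have mean 0 and identity dispersion.
odd_gap = np.max(np.abs(odd_constraints(-eps, 4) + U))
dispersion = U.T @ U / len(eps)
```

Because these functions are orthonormal, the dispersion matrix of the constraints is the identity, which is what makes the regularity condition easy to verify.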
Theorem 2.1.
Suppose that is continuous. Then for arbitrary but fixed , as such that , $\tilde{\boldsymbol\theta}_n(\mathbf x)$ in (2.5) satisfies
[TABLE]
where $\boldsymbol\varphi_{\mathbf x0}=\Pi(\mathbb S_{\mathbf x}(\mathbf X)\mid L_{2,0}(F,\mathrm{odd}))$ is the projection of onto . As a consequence, if $\varSigma_0(\mathbf x)=\mathrm{Var}(\mathbb S_{\mathbf x}(\mathbf X))-\mathrm{Var}(\boldsymbol\varphi_{\mathbf x0}(\mathbf X))$ is nonsingular,
[TABLE]
Proof of Theorem 2.1. We shall apply Theorem 1.1 to prove the result. Since is the identity matrix, it follows that (C) holds and is regular. As for each and , (1.6) is satisfied, while (1.7) holds in view of the inequalities
[TABLE]
Let be the left-hand side of (1.8). Then (1.8) follows from
[TABLE]
We now apply Theorem 1.1 to complete the proof.
Efficiency gain and ASN for . By the properties of empirical likelihood, is a valid depth function, at least for large , since all $1+\mathbf u(\mathbf X_i)^\top\tilde{\boldsymbol\zeta}>0$. Fix , and let be the projection of onto the closed linear span . Then . Clearly,
[TABLE]
Let $\mathbb S_2(\mathbf x)=\mathbb S\big(E(\mathbb S_{\mathbf x}(\mathbf X))\big)$. If is nonsingular, then by Theorem 2.1, for fixed ,
[TABLE]
Note that the sample depth satisfies
[TABLE]
where . Thus the reduction in the asymptotic variance-covariance of the EL-weighted depth is
[TABLE]
We now use the delta method to derive the ASN of the EL-weighted spatial median . To this end, we need some results from Chaudhuri (1992) in the case of , for which the spatial median corresponds to his multivariate Hodges-Lehmann type location estimate. The following is his Assumption 3.1.
- (PC)
, , are i.i.d. random vectors in with an absolutely continuous (with respect to Lebesgue measure) distribution having a density that is bounded on every bounded subset of .
Assume (PC) and . Let if and . Note that and are the first- and second-order partial derivatives of . Under (PC), the underlying distribution is absolutely continuous with respect to Lebesgue measure on ; hence the (population) spatial median exists uniquely and satisfies the equation . The spatial median satisfies
[TABLE]
Let $\mathbf J=E\big((\mathbb S\mathbb S^\top)(\mathbf m_0-\mathbf X)\big)$ and $\mathbf K=E\big(\mathbf H(\mathbf m_0-\mathbf X)\big)$. Chaudhuri (1992) showed in his Theorem 3.3 and its corollary that if (PC) holds, then the matrices $\mathbf J$ and $\mathbf K$ are positive definite and the sample spatial median satisfies
[TABLE]
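The matrices $\mathbf J$ and $\mathbf K$ have natural plug-in estimates. The sketch below assumes, as is standard for the spatial median, that $\mathbf H(\mathbf x)=(\mathbf I-\mathbb S(\mathbf x)\mathbb S(\mathbf x)^\top)/\|\mathbf x\|$ is the Hessian of $\|\mathbf x\|$ and that the sample spatial median has asymptotic covariance $\mathbf K^{-1}\mathbf J\mathbf K^{-1}$. The function name is hypothetical, and $\mathbf m_0$ is passed in (in practice one would plug in the sample spatial median). For the standard bivariate normal the limit is $(4/\pi)\mathbf I$, which the usage lines reproduce approximately.

```python
import numpy as np


def spatial_median_asymptotic_cov(X, m0):
    """Plug-in K^{-1} J K^{-1} with J = E[S S^T(m0 - X)] and K = E[H(m0 - X)]."""
    n, d = X.shape
    V = m0 - X
    r = np.linalg.norm(V, axis=1)
    keep = r > 0                                 # S and H are undefined at r = 0
    S = V[keep] / r[keep, None]
    J = S.T @ S / n
    # Average of H(v) = (I - S S^T) / ||v|| over the sample.
    K = (np.eye(d) * np.sum(1.0 / r[keep]) - (S / r[keep, None]).T @ S) / n
    K_inv = np.linalg.inv(K)
    return K_inv @ J @ K_inv


rng = np.random.default_rng(5)
X = rng.standard_normal((200_000, 2))
cov = spatial_median_asymptotic_cov(X, np.zeros(2))
spectral_norm = np.linalg.eigvalsh(cov)[-1]      # approximately 4 / pi = 1.273
```

The largest eigenvalue of such a matrix is the kind of summary (the spectral norm) compared across estimators in the simulations.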
Note that the EL-weighted spatial median satisfies the equation,
[TABLE]
Using the delta method, we derive, with ,
[TABLE]
where is calculated by (2.6).
Growing number of estimated constraints. For unknown , we estimate it by the symmetrized empirical distribution function,
[TABLE]
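Writing $\varepsilon_1,\dots,\varepsilon_n$ for the symmetric variables (our notation), a common form of the symmetrized empirical distribution function, assumed in the sketch below, is $\hat F(t)=\frac1{2n}\sum_{i=1}^n(\mathbf 1[\varepsilon_i\le t]+\mathbf 1[-\varepsilon_i\le t])$, the EDF of the sample augmented with its reflections. It satisfies $\hat F(-t)=1-\hat F(t)$ away from the data points, so plugging it into odd basis functions preserves oddness. `symmetrized_edf` is a hypothetical name.

```python
import numpy as np


def symmetrized_edf(eps):
    """Return t -> (1 / 2n) sum_i (1[eps_i <= t] + 1[-eps_i <= t])."""
    pooled = np.sort(np.concatenate([eps, -eps]))

    def F_hat(t):
        # Number of pooled points <= t, divided by 2n.
        return np.searchsorted(pooled, t, side="right") / pooled.size

    return F_hat


rng = np.random.default_rng(6)
eps = rng.standard_t(df=3, size=300)
F_hat = symmetrized_edf(eps)
t = np.linspace(-5.0, 5.0, 7) + 0.0123           # grid points off the data
symmetry_gap = np.max(np.abs(F_hat(t) + F_hat(-t) - 1.0))
```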
Let . We thus obtain computable functions . Write for , and estimate it by . The EL-weighted estimator of $\boldsymbol\theta(\mathbf x)=E(\mathbb S_{\mathbf x}(\mathbf X))$ is now given by
[TABLE]
where $\hat{\boldsymbol\zeta}_n$ solves Eq. (2.2) with . We have
Theorem 2.2.
Suppose that is continuous. Then $\hat{\boldsymbol\theta}_n$ defined in (2.7) satisfies the conclusions of Theorem 2.1 as such that .
Proof of Theorem 2.2. We use Theorem 6.1 in the Appendix (Theorem 4 of Wang and Peng (2022)). First, (C) is satisfied with regular as . Next, (6.1) follows from and . Let
[TABLE]
Then follows from and
[TABLE]
Let . It is easy to see
[TABLE]
Thus (6.2) follows from , to be shown next. To this end, let . Then . One verifies that $\|\boldsymbol\psi_n'(t)\|\le a\,m_n^{3/2}$ for some constant $a$. Therefore, follows from and , in view of
[TABLE]
Denoting $\boldsymbol\psi(\mathbf y)=\mathbb S_{\mathbf x}(\mathbf y)$, we decompose
[TABLE]
[TABLE]
By the Cauchy-Schwarz inequality,
[TABLE]
as . We now bound the variance by the second moment to get
[TABLE]
as . Taken together, these prove (6.3) and (6.4).
We now show that (6.5) holds with . To this end, using a Taylor expansion, we write
[TABLE]
[TABLE]
where lies between and . It thus follows that
[TABLE]
as . This shows . One has $\|\boldsymbol\psi''(t)\|=O_p(m_n^{5/2})$. Using this, we get
[TABLE]
as . This yields . Taken together, the desired (6.5) follows. We now apply Theorem 6.1 to finish the proof.
3 Efficient estimation of linear functionals with known marginals
Suppose that the marginal distributions and of are known. This can be characterized by
[TABLE]
Bickel et al. (1991) and Peng and Schick (2002) constructed efficient estimators of the linear functional and proved their ASN under the assumption
- (K)
There exists such that for arbitrary measurable sets and ,
[TABLE]
Bickel et al. (1991) showed that the projection of onto the sum space exists uniquely. They demonstrated that the asymptotic variance of the efficient estimator of can be substantially smaller than that of the empirical estimator . For example, they showed that the empirical DF of (taking ) has three times the asymptotic variance of the efficient estimator of in the case where and are uniform distributions over and are independent.
Here we propose an efficient estimator based on maximum empirical likelihood. Employing a basis of and of , we can reduce the uncountably many characterizing equations to countably many:
[TABLE]
Suppose that and are continuous. This allows us to take and , where is the trigonometric basis,
[TABLE]
That is, and are bases of and , respectively. Using the first terms as constraints, the EL-weighted estimator of is
[TABLE]
where with . Using Theorem 1.1, we prove
Theorem 3.1.
Suppose that and are continuous. Assume (K). Then, as such that ,
[TABLE]
where is the projection of onto the sum space . Hence,
[TABLE]
where .
Remark 3.1.
By Bickel et al. (1991, pp. 1328–1329), the estimator in (3.3) of is semiparametrically efficient.
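The constraint vector here stacks the basis functions evaluated at $F(X)$ and at $G(Y)$. Assuming the cosine basis $b_k(s)=\sqrt2\cos(k\pi s)$, $k=1,\dots,m$ (one standard orthonormal trigonometric basis of the mean-zero functions on $[0,1]$; that it matches the basis in (3.2) is an assumption), the sketch below builds the $n\times 2m$ constraint matrix for uniform marginals and displays the block structure of its dispersion matrix used in the proof below: identity diagonal blocks and an off-diagonal block carrying the dependence between $X$ and $Y$. The names and the toy dependence model are hypothetical.

```python
import numpy as np


def cosine_basis(s, m):
    """b_k(s) = sqrt(2) cos(k pi s), k = 1..m, evaluated for s in [0, 1]."""
    k = np.arange(1, m + 1)
    return np.sqrt(2.0) * np.cos(np.pi * np.outer(s, k))


def known_marginal_constraints(x, y, F, G, m):
    """Stack b_m(F(x)) and b_m(G(y)) into an (n, 2m) constraint matrix."""
    return np.hstack([cosine_basis(F(x), m), cosine_basis(G(y), m)])


rng = np.random.default_rng(7)
n, m = 200_000, 3
x = rng.uniform(size=n)
# Y equals X with probability 1/2 and is an independent uniform otherwise,
# so both marginals are Uniform(0, 1) (F = G = identity) but X, Y are dependent.
y = np.where(rng.uniform(size=n) < 0.5, x, rng.uniform(size=n))
U = known_marginal_constraints(x, y, lambda t: t, lambda t: t, m)
W = U.T @ U / n   # (1,1), (2,2) blocks ~ identity; (1,2) block ~ 0.5 * identity
```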
Proof of Theorem 3.1. We shall rely on Theorem 1.1. Since and , it follows that (1.6) holds. Thus
[TABLE]
as . This shows (1.7). Let
[TABLE]
It follows from that (1.8) holds in view of
[TABLE]
We are now left to prove the regularity of . Since are the first terms of the orthonormal basis , it follows that . The same holds for . Let . Then is the dispersion matrix whose (1,1)- and (2,2)-blocks are equal to and whose (1,2)-block is equal to . For with , set . We have
[TABLE]
By the Cauchy-Schwarz inequality,
[TABLE]
It thus follows from (3.5) that uniformly in and the above . For and , (K) implies
[TABLE]
Thus
[TABLE]
Replacing with yields
[TABLE]
Taking and and noticing
[TABLE]
we derive
[TABLE]
By (3.5), we thus arrive at
[TABLE]
Taken together, these results prove the regularity of , and Theorem 1.1 completes the proof.
4 Efficient estimation of linear functionals with equal marginals
Suppose that the marginal distributions and of and are equal but unknown. This is equivalent to the assertion that
[TABLE]
where is an orthonormal basis of with . Assume that and are continuous. This allows us to take under the assumption , where is the trigonometric basis in (3.2). As is unknown, we estimate it by the pooled empirical distribution function,
[TABLE]
This gives us computable functions . Let . This is unknown and can be estimated by . Using the first terms as constraints, the EL-weighted estimator of is given by
[TABLE]
where $\hat{\boldsymbol\zeta}_n$ is the solution to Eq. (1.2) with .
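The estimated constraints for equal marginals can be sketched as follows, assuming the pooled EDF $\hat H(t)=\frac1{2n}\sum_{j=1}^n(\mathbf 1[X_j\le t]+\mathbf 1[Y_j\le t])$ and the constraint $\mathbf b(\hat H(X))-\mathbf b(\hat H(Y))$ built from the cosine basis $\sqrt2\cos(k\pi s)$. Both forms are assumptions here, and the function names are hypothetical. Evaluated at the data, $\hat H$ is simply the pooled rank divided by $2n$.

```python
import numpy as np


def pooled_edf(x, y):
    """Return t -> (1 / 2n) sum_j (1[x_j <= t] + 1[y_j <= t])."""
    pooled = np.sort(np.concatenate([x, y]))

    def H_hat(t):
        return np.searchsorted(pooled, t, side="right") / pooled.size

    return H_hat


def cosine_basis(s, m):
    """b_k(s) = sqrt(2) cos(k pi s), k = 1..m (assumed trigonometric basis)."""
    return np.sqrt(2.0) * np.cos(np.pi * np.outer(s, np.arange(1, m + 1)))


def equal_marginal_constraints(x, y, m):
    """Rows b_m(H_hat(x_j)) - b_m(H_hat(y_j)), one row per observation."""
    H_hat = pooled_edf(x, y)
    return cosine_basis(H_hat(x), m) - cosine_basis(H_hat(y), m)


rng = np.random.default_rng(8)
z = rng.standard_normal((1000, 2)) @ np.array([[1.0, 0.0], [0.6, 0.8]]).T
x, y = z[:, 0], z[:, 1]                  # both marginals are N(0, 1)
U_hat = equal_marginal_constraints(x, y, m=4)
```

The rows of `U_hat` can then be fed to an EL-weight solver to form the estimator above.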
Peng and Schick (2005) constructed efficient estimators of linear functionals of a bivariate distribution with equal marginals under the condition,
[TABLE]
where is the unit sphere in . They showed that the asymptotic variance of an efficient estimator of is about one third of that of the empirical estimator, or smaller.
Applying Theorem 6.1, we show that is efficient.
Theorem 4.1.
Suppose that the distribution functions and are equal and continuous. Assume (4.3). Then, as such that , given in (4.2) satisfies
[TABLE]
where is the projection of onto . Thus
[TABLE]
where .
Remark 4.1.
By Theorem 3 of Peng and Schick (2005), the estimator given in (4.2) of is semiparametrically efficient.
Proof of Theorem 4.1. We shall apply Theorem 6.1. Recalling the trigonometric basis in (3.2), one readily verifies that has the properties,
[TABLE]
where and denote the first- and second-order derivatives of .
Recalling and , one gets by the first inequality in (4.4) that
[TABLE]
Hence (6.1) holds as . Noting , one has by (4.3) that
[TABLE]
uniformly in and $\|\boldsymbol\lambda\|=1$, as both $\boldsymbol\lambda^\top\mathbf b_n(H(X))$ and $\boldsymbol\lambda^\top\mathbf b_n(H(Y))$ live in . Moreover,
[TABLE]
Thus is regular. Let
[TABLE]
Then by the first equality in (4.5),
[TABLE]
Hence as . One can verify that
[TABLE]
where . Thus (6.2) is implied by
[TABLE]
Using the second inequality in (4.4), we derive
[TABLE]
Hence and (4.6) holds as . We decompose
[TABLE]
where
[TABLE]
By the Cauchy-Schwarz inequality,
[TABLE]
where the last equality holds as . We now bound the variance by the second moment and by the first equality in (4.5) to get
[TABLE]
as . Taken together, (6.3) follows. We now show that (6.5) holds with . Using a Taylor expansion, we write
[TABLE]
where
[TABLE]
where lies between and . Using the second inequality in (4.4), we get
[TABLE]
as . This shows . Using the third inequality in (4.4), one has as that
[TABLE]
This yields . Taken together, these prove (6.5). This and (4.6) imply (6.4) as . Clearly, satisfies . Peng and Schick (2005) showed that the projection of any onto exists uniquely under assumption (4.3). Moreover, is a basis of , so that . We now apply Theorem 6.1 to complete the proof.
5 Simulations
We ran a simulation study to compare the efficiency of the EL-weighted spatial median with that of the sample spatial median in the presence of various kinds of side information. Tables 1–5 report the maximum eigenvalues of the asymptotic variance-covariance matrices and their ratios. Random samples were generated from 2- and 3-dimensional Cauchy distributions, Student t distributions with 3 degrees of freedom (df), copula distributions (see the Appendix for details) and asymmetric Laplace distributions, for sample sizes $n=50,100,200,500$. Based on repetitions , we calculated the averages of the maximum eigenvalues and (i.e., the spectral norms) of the asymptotic variance-covariance matrices of and , and the ratio . A ratio less than one indicates that the norm of the variance-covariance matrix of the EL-weighted spatial median is smaller than that of the sample spatial median.
For Table 1, the side information is that the componentwise medians are known. For Tables 2–5, the information is that one marginal is symmetric about the origin ( constraints considered); we considered both a known marginal and an unknown marginal estimated by the symmetrized EDF.
For known componentwise medians, the efficiency gain of the EL-weighted spatial median over the sample spatial median exceeded 80%; for a known or estimated symmetric marginal, the gain was about 30% or more. All the ratios are substantially smaller than one, indicating substantial efficiency gains of the EL-weighted spatial median over the sample spatial median. The simulation results also indicate that the componentwise median is less efficient than the spatial median, but not by much in the cases considered.
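The summary statistic in the tables is the spectral norm (largest eigenvalue) of each estimated asymptotic covariance matrix, averaged over repetitions, together with a ratio. A minimal sketch of that bookkeeping, assuming the per-repetition covariance estimates are already available; the function name and toy matrices are hypothetical, and averaging the per-repetition ratios (rather than dividing the two averages) is one reading of the description.

```python
import numpy as np


def spectral_norm_summary(covs_sample, covs_el):
    """Average largest eigenvalues of two lists of covariance matrices and their ratio."""
    lam_sample = np.array([np.linalg.eigvalsh(c)[-1] for c in covs_sample])
    lam_el = np.array([np.linalg.eigvalsh(c)[-1] for c in covs_el])
    return lam_sample.mean(), lam_el.mean(), np.mean(lam_el / lam_sample)


# Toy input: two repetitions with hand-made 2 x 2 covariance matrices.
covs_sample = [np.diag([2.0, 1.0]), np.array([[1.0, 0.5], [0.5, 1.0]])]
covs_el = [np.diag([0.5, 0.2]), np.array([[0.3, 0.0], [0.0, 0.1]])]
avg_sample, avg_el, avg_ratio = spectral_norm_summary(covs_sample, covs_el)
```

Here the averages are 1.75 and 0.4, and the averaged ratio is 0.225.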
6 Declaration of interest statement
The authors report there are no competing interests to declare.
Appendix
The details of the copula distributions. The 2-dimensional copula distribution has and marginals with correlation coefficient . The 3-dimensional copula has two marginals with correlation coefficient and one marginal which is correlated with each of with correlation . The copula has the joint cumulative distribution function with uniform marginals, where each uniform marginal is obtained by applying the probability integral transform to the cumulative distribution functions of two and one , respectively.
We cite Theorem 4 of Wang and Peng (2022) below for convenience.
Theorem 6.1.
Suppose satisfies (C) for each . Let be an estimator of such that
[TABLE]
[TABLE]
for which the sequence of dispersion matrices is regular,
[TABLE]
there exists some measurable function from into such that (C) is met for every , the dispersion matrix satisfies ,
[TABLE]
[TABLE]
Then $\hat{\boldsymbol\theta}$ satisfies, as tends to infinity, the stochastic expansion
[TABLE]
where $\boldsymbol\varphi=\Pi(\boldsymbol\psi\mid[\mathbf v_\infty])$ is the projection of onto the closed linear span . Thus
[TABLE]
where $\varSigma=\mathrm{Var}(\boldsymbol\psi(Z))-\mathrm{Var}(\boldsymbol\varphi(Z))$.
- [1] Bickel, P. J., Ritov, Y. and Wellner, J. A. (1991). Efficient estimation of linear functionals of a probability measure P with known marginal distributions. Ann. Statist. 19: 1316–1346.
- [2] Bickel, P. J., Klaassen, C. A. J., Ritov, Y. and Wellner, J. A. (1993). Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins Univ. Press, Baltimore.
- [3] Chaudhuri, P. (1992). Multivariate location estimation using extension of R-estimates through U-statistics type approach. Ann. Statist. 20: 897–916.
- [4] Chen, Y., Dang, X., Peng, H. and Bart, Jr., H. L. (2009). Outlier detection with the kernelized spatial depth function. IEEE Transactions on Pattern Analysis and Machine Intelligence 31: 288–305.
- [5] Hjort, N. L., McKeague, I. W. and Van Keilegom, I. (2009). Extending the scope of empirical likelihood. Ann. Statist. 37: 1079–1111.
- [6] Owen, A. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75: 237–249.
- [7] Owen, A. (2001). Empirical Likelihood. Chapman & Hall/CRC, London.
- [8] Parente, P. M. D. C. and Smith, R. J. (2011). GEL methods for nonsmooth moment indicators. Econometric Theory 27: 74–113. MR 2771012.
