Efficient estimation with incomplete data via generalised ANOVA decompositions
Thomas B. Berrett

TL;DR
This paper develops an estimator for linear functionals of incomplete multivariate data whose risk nearly attains the semiparametric efficiency bound. The bound is characterised through generalised ANOVA decompositions, and the estimator is built from iterated nonparametric regression.
Contribution
It characterises the asymptotic minimal mean squared error for estimating linear functionals from incomplete data and develops an estimator whose risk approximately attains this bound, with extensions to biased sampling and non-linear functionals.
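As a worked special case, consider a monotone design with two datasets: n_0 complete draws of X = (X_1, X_2) and n_1 further draws of X_1 alone, with π_j = n_j / n and n = n_0 + n_1. The derivation below is our own illustration, not the paper's general result. It assumes that candidate estimators are sums of per-dataset sample means of functions whose sum reproduces f, so that the rescaled variance is a quadratic in one unknown function h of X_1. Writing m(x_1) = E[f(X) | X_1 = x_1]:

```latex
\begin{aligned}
V^\ast &= \min_{h}\Big\{ \tfrac{1}{\pi_0}\operatorname{Var}\big(f(X) - h(X_1)\big)
          + \tfrac{1}{\pi_1}\operatorname{Var}\big(h(X_1)\big) \Big\} \\
       &= \tfrac{1}{\pi_0}\operatorname{Var}\big(f(X) - m(X_1)\big)
          + \min_{h}\Big\{ \tfrac{1}{\pi_0}\operatorname{Var}\big(m(X_1) - h(X_1)\big)
          + \tfrac{1}{\pi_1}\operatorname{Var}\big(h(X_1)\big) \Big\} \\
       &= \tfrac{1}{\pi_0}\operatorname{Var}\big(f(X) - m(X_1)\big)
          + \operatorname{Var}\big(m(X_1)\big),
          \qquad \text{attained at } h = \pi_1\, m .
\end{aligned}
```

The second line uses the orthogonality of f(X) - m(X_1) to every function of X_1; the third uses π_0 + π_1 = 1. Dividing by n, the variance is Var(f(X) - m(X_1))/n_0 + Var(m(X_1))/n. The part of f explained by X_1 is thus estimated from all n observations, while only the residual part is limited to the complete sample.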
Findings
The estimator's risk approximately matches the asymptotic lower bound.
The efficient rescaled variance equals the minimal value of a quadratic optimisation problem over a function space.
The proposed method provides asymptotically valid confidence intervals.
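The quadratic optimisation problem mentioned above can be illustrated on a toy non-monotone design. This is a minimal sketch under our own simplifying assumptions, not the paper's construction. It assumes three datasets: complete observations of a discrete pair (X1, X2), plus samples of X1 alone and of X2 alone, in proportions pi = (pi0, pi1, pi2). If an estimator is a sum of per-dataset sample means of functions g_S that add up to f, its rescaled variance is Var(f - g1 - g2)/pi0 + Var(g1)/pi1 + Var(g2)/pi2. The sketch minimises this with block-coordinate (backfitting-style) updates, each a shrunken conditional expectation. This is a swapped-in stand-in for the gradient-descent scheme described in the abstract. The pmf P, the table F and all function names are hypothetical.

```python
import numpy as np


def weighted_var(values, probs):
    """Variance of a discrete random variable taking `values` with `probs`."""
    mean = np.sum(probs * values)
    return np.sum(probs * (values - mean) ** 2)


def cond_mean_x1(P, A):
    """E[A(X1, X2) | X1] under the joint pmf P, as a vector indexed by X1."""
    return (P * A).sum(axis=1) / P.sum(axis=1)


def cond_mean_x2(P, A):
    """E[A(X1, X2) | X2] under the joint pmf P, as a vector indexed by X2."""
    return (P * A).sum(axis=0) / P.sum(axis=0)


def objective(P, F, g1, g2, pi):
    """Rescaled variance of the decomposition F = (F - g1 - g2) + g1 + g2,
    where the three terms are averaged over datasets with proportions pi."""
    resid = F - g1[:, None] - g2[None, :]
    return (weighted_var(resid, P) / pi[0]
            + weighted_var(g1, P.sum(axis=1)) / pi[1]
            + weighted_var(g2, P.sum(axis=0)) / pi[2])


def minimise_decomposition(P, F, pi, n_iter=200):
    """Block-coordinate descent: each update is the exact minimiser over one
    block, a shrunken conditional expectation (i.e. a regression)."""
    g1, g2 = np.zeros(P.shape[0]), np.zeros(P.shape[1])
    history = [objective(P, F, g1, g2, pi)]
    for _ in range(n_iter):
        g1 = pi[1] / (pi[0] + pi[1]) * cond_mean_x1(P, F - g2[None, :])
        g2 = pi[2] / (pi[0] + pi[2]) * cond_mean_x2(P, F - g1[:, None])
        history.append(objective(P, F, g1, g2, pi))
    return g1, g2, history


# Hypothetical joint pmf of (X1, X2), target function f, dataset proportions.
P = np.array([[0.20, 0.10],
              [0.10, 0.30],
              [0.05, 0.25]])
F = np.array([[1.0, 3.0],
              [0.0, 2.0],
              [4.0, -1.0]])
pi = (0.5, 0.3, 0.2)

g1, g2, history = minimise_decomposition(P, F, pi)
complete_case = weighted_var(F, P) / pi[0]  # use only the complete data (g = 0)
print(f"complete-case rescaled variance: {complete_case:.4f}")
print(f"minimised rescaled variance:     {history[-1]:.4f}")
```

Each block update minimises the objective exactly, so the recorded values never increase. Dividing the final value by the total sample size gives the variance of the best estimator within this restricted class.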
Abstract
We study the semiparametric efficient estimation of a class of linear functionals in settings where a complete multivariate dataset is supplemented by additional datasets recording subsets of the variables of interest. These datasets are allowed to have a general, in particular non-monotonic, structure. Our main contribution is to characterise the asymptotic minimal mean squared error for these problems and to introduce an estimator whose risk approximately matches this lower bound. We show that the efficient rescaled variance can be expressed as the minimal value of a quadratic optimisation problem over a function space, thus establishing a fundamental link between these estimation problems and the theory of generalised ANOVA decompositions. Our estimation procedure uses iterated nonparametric regression to mimic an approximate influence function derived through gradient descent. We…
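The recipe in the abstract, iterated nonparametric regression followed by averaging, can be mimicked on simulated data. This is a minimal sketch that assumes the same toy three-dataset design with discrete covariates. Under that assumption each "nonparametric regression" is simply a table of group means fitted on the complete sample. The interval is a plain Wald interval built from per-dataset sample variances. Function names, the data-generating distribution and the sample sizes are hypothetical. The sketch also omits refinements such as sample splitting that a careful implementation might use.

```python
import numpy as np


def group_mean(labels, values, k):
    """Empirical E[values | label] for labels in {0, ..., k-1}; for a discrete
    covariate this table of group means is a nonparametric regression."""
    counts = np.bincount(labels, minlength=k)
    sums = np.bincount(labels, weights=values, minlength=k)
    return sums / counts


def estimate_mean(F, x_full, x1_only, x2_only, n_iter=50, z=1.96):
    """Estimate E[F[X1, X2]] and a Wald interval from three independent samples:
    complete pairs x_full (shape (n0, 2)), X1-only draws and X2-only draws."""
    k1, k2 = F.shape
    n0, n1, n2 = len(x_full), len(x1_only), len(x2_only)
    a1, a2 = x_full[:, 0], x_full[:, 1]
    f_full = F[a1, a2]
    # Shrinkage factors pi_1 / (pi_0 + pi_1) and pi_2 / (pi_0 + pi_2).
    c1, c2 = n1 / (n0 + n1), n2 / (n0 + n2)
    g1, g2 = np.zeros(k1), np.zeros(k2)
    for _ in range(n_iter):
        # Alternate regressions of the current residual on X1, then on X2.
        g1 = c1 * group_mean(a1, f_full - g2[a2], k1)
        g2 = c2 * group_mean(a2, f_full - g1[a1], k2)
    resid = f_full - g1[a1] - g2[a2]
    h1, h2 = g1[x1_only], g2[x2_only]
    theta = resid.mean() + h1.mean() + h2.mean()
    var = resid.var(ddof=1) / n0 + h1.var(ddof=1) / n1 + h2.var(ddof=1) / n2
    half = z * np.sqrt(var)
    return theta, (theta - half, theta + half)


# Usage on simulated data from a hypothetical joint distribution P.
rng = np.random.default_rng(0)
P = np.array([[0.20, 0.10],
              [0.10, 0.30],
              [0.05, 0.25]])
F = np.array([[1.0, 3.0],
              [0.0, 2.0],
              [4.0, -1.0]])


def draw(n):
    """n i.i.d. draws of (X1, X2) ~ P, returned as an (n, 2) integer array."""
    idx = rng.choice(P.size, size=n, p=P.ravel())
    return np.column_stack(np.unravel_index(idx, P.shape))


theta_hat, ci = estimate_mean(F, draw(2000), draw(3000)[:, 0], draw(1500)[:, 1])
print("true value:", np.sum(P * F))
print("estimate:", theta_hat, "95% interval:", ci)
```

The three datasets are independent, so the variance estimate is the sum of each dataset's sample variance divided by its size. With continuous covariates, `group_mean` would be replaced by a smoother such as a kernel or spline regression.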
Taxonomy
Topics: Control Systems and Identification · Advanced Statistical Methods and Models · Fault Detection and Control Systems
