Efficient Semiparametric Inference for Distributed Data with Blockwise Missingness
Jingyue Huang, Huiyuan Wang, Yuqing Lei, Yong Chen

TL;DR
This paper introduces an efficient, communication-friendly semiparametric inference method for distributed data with blockwise missingness, leveraging transfer functions to incorporate external information without compromising internal data efficiency.
Contribution
It proposes a novel augmented one-step estimator that is communication-efficient, do-no-harm, statistically optimal, and scalable for distributed data with blockwise missingness.
Findings
The method attains the semiparametric efficiency bound, provided the transfer function is suitable.
It requires only one round of summary statistic communication.
Simulation studies confirm efficiency and scalability.
Abstract
We consider statistical inference for a finite-dimensional parameter in a regular semiparametric model under a distributed setting with blockwise missingness, where entire blocks of variables are unavailable at certain sites and sharing individual-level data is not allowed. To improve efficiency of the internal study, we propose a class of augmented one-step estimators that incorporate information from external sites through "transfer functions." The proposed approach has several advantages. First, it is communication-efficient, requiring only one-round communication of summary-level statistics. Second, it satisfies a do-no-harm property in the sense that the augmented estimator is no less efficient than the original one based solely on the internal data. Third, it is statistically optimal, achieving the semiparametric efficiency bound when the transfer function is appropriately…
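The augmentation idea in the abstract can be illustrated with a toy example. The sketch below is not the paper's estimator: it swaps in a classical control-variate adjustment for estimating a mean E[Y], where the internal site observes both X and Y, the external site observes only X (the Y block is missing), and X is assumed to have the same distribution at both sites. The names `transfer`, `external_summary`, and `augmented_mean` are hypothetical, and the identity transfer function is chosen only for illustration.

```python
from statistics import fmean


def transfer(x):
    # Hypothetical transfer function h(X); the identity is an illustrative choice.
    return x


def external_summary(x_ext):
    # The only quantities the external site shares in its single round:
    # its sample size and the sample mean of h(X). No individual-level data leave the site.
    h = [transfer(x) for x in x_ext]
    return len(h), fmean(h)


def augmented_mean(x_int, y_int, n_ext, h_bar_ext):
    # Internal-only estimator of E[Y] and the internal mean of h(X).
    n = len(y_int)
    h_int = [transfer(x) for x in x_int]
    y_bar = fmean(y_int)
    h_bar_int = fmean(h_int)

    # Estimated control-variate coefficient: Cov(Y, h) / Var(h), from internal data.
    cov_yh = sum((y - y_bar) * (h - h_bar_int) for y, h in zip(y_int, h_int)) / (n - 1)
    var_h = sum((h - h_bar_int) ** 2 for h in h_int) / (n - 1)
    gamma = cov_yh / var_h

    # Pool the internal and external means of h(X), then shift the internal
    # estimator by the discrepancy between the internal and pooled means.
    h_bar_pooled = (n * h_bar_int + n_ext * h_bar_ext) / (n + n_ext)
    return y_bar - gamma * (h_bar_int - h_bar_pooled)


if __name__ == "__main__":
    n_ext, h_bar_ext = external_summary([2.0, 3.0, 4.0, 5.0])
    print(augmented_mean([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0], n_ext, h_bar_ext))
```

Under these assumptions, and with the population value of the coefficient, the adjusted estimator has variance Var(Y)/n · (1 − ρ² · m/(n + m)), where ρ is the correlation between Y and h(X), n is the internal sample size, and m is the external one. This is never larger than the internal-only variance Var(Y)/n, which mirrors the do-no-harm property in this simple setting.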
