Distributed Estimation and Inference with Statistical Guarantees
Heather Battey, Jianqing Fan, Han Liu, Junwei Lu, Ziwei Zhu

TL;DR
This paper develops a divide and conquer framework for hypothesis testing and parameter estimation, providing statistical guarantees and optimal choices of the number of subsamples to maintain efficiency.
Contribution
It introduces new likelihood-based test statistics and estimators for distributed data, with theoretical bounds on the number of subsamples to ensure minimal information loss.
Findings
Theoretical upper bound on the number of subsamples k for negligible information loss
When k stays within this bound, the estimators achieve the same inferential efficiency and estimation rates as an oracle with access to the full sample
Numerical results validate the theoretical guarantees
Abstract
This paper studies hypothesis testing and parameter estimation in the context of the divide and conquer algorithm. In a unified likelihood based framework, we propose new test statistics and point estimators obtained by aggregating various statistics from k subsamples of size n/k, where n is the sample size. In both low dimensional and high dimensional settings, we address the important question of how to choose k as n grows large, providing a theoretical upper bound on k such that the information loss due to the divide and conquer algorithm is negligible. In other words, the resulting estimators have the same inferential efficiencies and estimation rates as a practically infeasible oracle with access to the full sample. Thorough numerical results are provided to back up the theory.
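To make the aggregation idea in the abstract concrete, here is a minimal sketch in Python. It assumes the simplest possible setting, a low dimensional linear model fitted by ordinary least squares, and it aggregates by plain averaging of the k subsample estimates. The function names dc_average_ols and full_sample_ols are hypothetical, chosen for illustration, and the code is not taken from the paper; the paper's own estimators and test statistics, especially in the high dimensional setting, may differ.

```python
import numpy as np


def dc_average_ols(X, y, k):
    """Average the OLS estimates computed on k disjoint subsamples.

    Illustrative assumption: simple averaging of per-subsample OLS fits.
    """
    n = X.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Split the row indices into k disjoint blocks of size roughly n/k.
    blocks = np.array_split(np.arange(n), k)
    estimates = []
    for idx in blocks:
        # Least-squares fit that uses only the rows in this block.
        beta_j, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        estimates.append(beta_j)
    # Aggregate the k subsample estimates by averaging them.
    return np.mean(estimates, axis=0)


def full_sample_ols(X, y):
    """Full-sample OLS, playing the role of the 'oracle' benchmark."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k = 10_000, 5, 20
    beta_true = np.arange(1.0, d + 1.0)
    X = rng.normal(size=(n, d))
    y = X @ beta_true + rng.normal(size=n)
    # Compare the aggregated estimate with the full-sample benchmark.
    print("divide and conquer:", dc_average_ols(X, y, k))
    print("full sample:       ", full_sample_ols(X, y))
```

Rerunning the script with larger values of k for a fixed n gives a rough feel for how the aggregated estimate drifts away from the full-sample fit as the subsamples shrink, which is the trade-off the upper bound on k is meant to control.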
Taxonomy
Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Statistical Methods and Bayesian Inference
