Distributed Estimation and Inference with Statistical Guarantees

Heather Battey; Jianqing Fan; Han Liu; Junwei Lu; Ziwei Zhu

arXiv:1509.05457·math.ST·September 21, 2015·61 cites

Open Access

TL;DR

This paper develops a divide and conquer framework for hypothesis testing and parameter estimation, providing statistical guarantees and optimal choices of the number of subsamples to maintain efficiency.

Contribution

It introduces new likelihood-based test statistics and estimators for distributed data, with theoretical bounds on the number of subsamples to ensure minimal information loss.

Findings

01 Theoretical upper bound on the number of subsamples k below which the information loss from divide and conquer is negligible

02 With k below this bound, the estimators achieve the same inferential efficiency and estimation rates as full-sample methods

03 Numerical results support the theoretical guarantees

Abstract

This paper studies hypothesis testing and parameter estimation in the context of the divide and conquer algorithm. In a unified likelihood-based framework, we propose new test statistics and point estimators obtained by aggregating various statistics from $k$ subsamples of size $n/k$, where $n$ is the sample size. In both low dimensional and high dimensional settings, we address the important question of how to choose $k$ as $n$ grows large, providing a theoretical upper bound on $k$ such that the information loss due to the divide and conquer algorithm is negligible. In other words, the resulting estimators have the same inferential efficiencies and estimation rates as a practically infeasible oracle with access to the full sample. Thorough numerical results are provided to back up the theory.
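
The aggregation idea in the abstract can be illustrated with a toy example. The sketch below is not the paper's estimator: it assumes a one-parameter exponential model, uses plain averaging of the $k$ subsample maximum likelihood estimates as the aggregation rule, and the names `mle_exponential_rate` and `divide_and_conquer_estimate` are hypothetical. It only shows how the gap between the aggregated estimate and the full-sample estimate can be inspected as $k$ grows for fixed $n$.

```python
import random
import statistics


def mle_exponential_rate(sample):
    """MLE of the rate of an exponential distribution: 1 / sample mean."""
    return 1.0 / statistics.fmean(sample)


def divide_and_conquer_estimate(data, k, estimator):
    """Split data into k subsamples of size n / k, apply the estimator
    to each subsample, and average the k results.

    Simple averaging is an assumption made for this illustration.
    """
    n = len(data)
    if k < 1 or n % k != 0:
        raise ValueError("k must be a positive divisor of n")
    size = n // k
    estimates = [estimator(data[j * size:(j + 1) * size]) for j in range(k)]
    return statistics.fmean(estimates)


# Simulated data: n draws from an exponential distribution with a known rate.
rng = random.Random(0)
true_rate = 2.0
n = 12000
data = [rng.expovariate(true_rate) for _ in range(n)]

# "Oracle" estimate that uses the full sample at once.
full = mle_exponential_rate(data)

# Compare the aggregated estimate with the full-sample one for several k.
for k in (1, 10, 100, 1000):
    dc = divide_and_conquer_estimate(data, k, mle_exponential_rate)
    print(f"k={k:5d}  estimate={dc:.4f}  gap_to_full={abs(dc - full):.4f}")
```

Because the estimator is a nonlinear function of the subsample mean, each subsample estimate carries a small-sample bias that averaging does not remove, so the gap to the full-sample estimate tends to widen once the subsamples become small. This is the kind of trade-off that an upper bound on $k$ is meant to control.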

Taxonomy

Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Statistical Methods and Bayesian Inference