Hypothesis Testing of One-Sample Mean Vector in Distributed Frameworks

Bin Du; Junlong Zhao

arXiv:2110.02588 · stat.ME · October 7, 2021 · 1 citation

Open Access

TL;DR

This paper develops distributed hypothesis tests for the mean vector in large-scale data settings, balancing communication costs and statistical power, and extends classical tests to distributed frameworks for both low and high dimensions.

Contribution

The paper introduces distributed test statistics for mean-vector hypotheses that reduce communication costs, and it analyzes how their power compares with that of centralized tests.

Findings

01

Distributed tests significantly reduce communication costs.

02

A tradeoff exists between test power and communication efficiency.

03

Numerical results validate theoretical insights.

Abstract

Distributed frameworks are widely used to handle massive data, where the sample size $n$ is very large and the data are often stored on $k$ different machines. For a random vector $X \in \mathbb{R}^{p}$ with expectation $\mu$, testing the mean vector $H_{0} : \mu = \mu_{0}$ vs $H_{1} : \mu \neq \mu_{0}$ for a given vector $\mu_{0}$ is a basic problem in statistics. The centralized test statistics incur heavy communication costs, which can be a burden when $p$ or $k$ is large. To reduce the communication cost, this paper proposes distributed test statistics for this problem based on the divide and conquer technique, a commonly used approach for distributed statistical inference. Specifically, we extend two commonly used centralized test statistics to distributed ones, which apply to the low and high dimensional cases, respectively. Comparing the power of centralized test statistics and the distributed…
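To make the divide and conquer idea concrete, here is a minimal Python sketch for the low-dimensional case. It is an illustration under stated assumptions, not the statistic proposed in the paper, because the abstract does not give the exact form. The assumption is that each of the $k$ machines computes a local Hotelling-type statistic $T_j = n_j (\bar{X}_j - \mu_0)^\top S_j^{-1} (\bar{X}_j - \mu_0)$ and transmits only that scalar, and that the scalars are then summed. For fixed $p$ and large local sample sizes, each $T_j$ is approximately $\chi^2_p$ under $H_0$, so the sum can be compared with a $\chi^2_{kp}$ quantile. The function names `solve`, `local_t2`, and `distributed_stat` are hypothetical.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def local_t2(rows, mu0):
    """Assumed local statistic n_j * d' S_j^{-1} d, computed on one machine."""
    n, p = len(rows), len(mu0)
    mean = [sum(r[i] for r in rows) / n for i in range(p)]
    d = [mean[i] - mu0[i] for i in range(p)]
    # Unbiased sample covariance of the local data block.
    S = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
          for j in range(p)] for i in range(p)]
    w = solve(S, d)  # w = S^{-1} d without forming the inverse
    return n * sum(d[i] * w[i] for i in range(p))


def distributed_stat(blocks, mu0):
    """Sum the k local scalars; return (statistic, degrees of freedom k*p)."""
    stats = [local_t2(block, mu0) for block in blocks]  # one scalar per machine
    return sum(stats), len(blocks) * len(mu0)


# Simulated example: k = 3 machines, p = 2, 200 observations each, true mean 0.
random.seed(0)
blocks = [[[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
          for _ in range(3)]
stat_null, df = distributed_stat(blocks, [0.0, 0.0])  # H0 is true here
stat_alt, _ = distributed_stat(blocks, [5.0, 5.0])    # H0 is badly wrong here
```

In this sketch each machine sends one number, whereas pooling the data to form a centralized Hotelling-type statistic would require each machine to send its local mean and second-moment matrix, on the order of $p^2$ numbers. The price is a reference distribution with $kp$ rather than $p$ degrees of freedom, which is one simple way to see the power versus communication tradeoff the abstract refers to.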

Taxonomy

Topics: Statistical Methods and Inference · Distributed Sensor Networks and Detection Algorithms · Markov Chains and Monte Carlo Methods