Selective Inference with Distributed Data
Sifan Liu, Snigdha Panigrahi

TL;DR
This paper introduces a distributed selective inference method for sparse regression that combines low-dimensional summaries from multiple machines to perform approximately valid hypothesis tests with little communication overhead.
Contribution
It proposes a novel procedure for conducting approximate selective inference in distributed data settings, leveraging low-dimensional summaries for higher power and low communication costs.
Findings
Achieves higher statistical power compared to existing methods.
Maintains approximate validity of inference with minimal communication.
Applicable to repeated model selection scenarios.
Abstract
As datasets grow larger, they are often distributed across multiple machines that compute in parallel and communicate with a central machine through short messages. In this paper, we focus on sparse regression and propose a new procedure for conducting selective inference with distributed data. Although many distributed procedures exist for point estimation in the sparse setting, few options are available for estimating uncertainties or conducting hypothesis tests based on the estimated sparsity. We solve a generalized linear regression on each machine, which then communicates a selected set of predictors to the central machine. The central machine uses these selected predictors to form a generalized linear model (GLM). To conduct inference in the selected GLM, our proposed procedure bases approximately-valid selective inference on an asymptotic likelihood. The proposal seeks only…
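The selection and communication step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a Gaussian linear model (the simplest GLM), a plain coordinate-descent lasso, simulated data, and a union rule for aggregating the selected sets. The names `lasso_select` and `simulate_machine`, the penalty `lam=0.5`, and the union rule are all hypothetical choices made for this example. The sketch stops at forming the selected set and does not implement the selective inference adjustment itself.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used in lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_select(X, y, lam, n_iter=100):
    """Return indices of nonzero lasso coefficients (Gaussian model, no intercept).

    Minimizes (1/2n) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of predictor j with the partial residual
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return {j for j in range(p) if beta[j] != 0.0}

def simulate_machine(rng, n, p, beta_true):
    """Simulated local data for one machine (hypothetical, for illustration only)."""
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [sum(x * b for x, b in zip(row, beta_true)) + rng.gauss(0, 1) for row in X]
    return X, y

rng = random.Random(0)
p = 8
beta_true = [3.0, -3.0] + [0.0] * (p - 2)
machines = [simulate_machine(rng, 100, p, beta_true) for _ in range(3)]

# Each machine sends only a short message: its set of selected predictor indices.
local_sets = [lasso_select(X, y, lam=0.5) for X, y in machines]

# Central machine: aggregate the messages (union rule is an assumption here).
selected = sorted(set().union(*local_sets))
print("selected predictors:", selected)
```

Each machine's message is a set of at most p indices, which is what keeps communication cheap. Refitting a model on `selected` and reporting ordinary p-values would ignore that the same data chose the predictors; accounting for that selection is the problem the paper's procedure addresses.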
Taxonomy
Topics: Statistical Methods and Inference · Gaussian Processes and Bayesian Inference · Stochastic Gradient Optimization Techniques
