Selective Inference with Distributed Data

Sifan Liu; Snigdha Panigrahi

arXiv:2301.06162 · stat.ME · March 14, 2023 · J. Mach. Learn. Res. · 1 cites

TL;DR

This paper introduces a new distributed selective inference method for sparse regression that efficiently combines low-dimensional summaries from multiple machines to perform valid hypothesis testing with minimal communication overhead.

Contribution

It proposes a novel procedure for conducting approximate selective inference in distributed data settings, leveraging low-dimensional summaries for higher power and low communication costs.

Findings

01. Achieves higher statistical power compared to existing methods.

02. Maintains validity of inference with minimal communication.

03. Applicable to repeated model selection scenarios.

Abstract

As datasets grow larger, they are often distributed across multiple machines that compute in parallel and communicate with a central machine through short messages. In this paper, we focus on sparse regression and propose a new procedure for conducting selective inference with distributed data. Although many distributed procedures exist for point estimation in the sparse setting, few options are available for estimating uncertainties or conducting hypothesis tests based on the estimated sparsity. We solve a generalized linear regression on each machine, which then communicates a selected set of predictors to the central machine. The central machine uses these selected predictors to form a generalized linear model (GLM). To conduct inference in the selected GLM, our proposed procedure bases approximately-valid selective inference on an asymptotic likelihood. The proposal seeks only…
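The selection-and-communication step described in the abstract can be illustrated with a minimal sketch. It covers only the part where each machine fits a sparse regression and sends its selected predictors to the central machine; it does not implement the paper's selective inference procedure. Several choices here are assumptions made for illustration and are not taken from the paper: the Gaussian linear model stands in for a general GLM, the lasso is solved by a hand-written coordinate descent, the central machine combines the messages by taking their union, and the names `lasso_cd`, `local_selection`, and `simulate`, along with the penalty `lam=0.3`, are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # residual y - X @ beta, kept up to date in place
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual (beta_j removed).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def local_selection(X, y, lam):
    """Run on one machine: return the indices of predictors with nonzero lasso fit."""
    return set(np.flatnonzero(lasso_cd(X, y, lam)).tolist())


def simulate(seed=0, n_machines=3, n=200, p=20):
    """Hypothetical data: three strong signals, identical model on every machine."""
    rng = np.random.default_rng(seed)
    beta_true = np.zeros(p)
    beta_true[:3] = [3.0, -3.0, 3.0]
    data = []
    for _ in range(n_machines):
        X = rng.standard_normal((n, p))
        y = X @ beta_true + rng.standard_normal(n)
        data.append((X, y))
    return data


data = simulate()

# Each machine sends only a short message: its set of selected predictor indices.
messages = [local_selection(X, y, lam=0.3) for X, y in data]

# Assumption: the central machine aggregates the messages by taking their union.
selected = sorted(set().union(*messages))
print("messages:", messages)
print("selected model:", selected)
```

Each message is a set of at most `p` indices, so the communication cost does not grow with the local sample size. Naive p-values computed in the model `selected` would ignore the fact that the same data chose the model; correcting for that is the role of the selective inference procedure the abstract describes, which this sketch leaves out.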

Taxonomy

Topics: Statistical Methods and Inference · Gaussian Processes and Bayesian Inference · Stochastic Gradient Optimization Techniques