Minimax and Communication-Efficient Distributed Best Subset Selection with Oracle Property

Jingguo Lan; Hongmei Lin; Xueqin Wang

arXiv:2408.17276·stat.ML·September 2, 2024

Open Access

TL;DR

This paper introduces a novel distributed best subset selection algorithm that achieves true sparsity, maintains the oracle property, and significantly reduces communication costs in high-dimensional data analysis.

Contribution

The paper proposes a two-stage distributed algorithm that combines a new splicing technique with a generalized information criterion (GIC) for adaptive parameter selection, improving sparsity recovery and efficiency.

Findings

01

Correctly identifies true sparsity pattern

02

Achieves minimax $\ell_2$ error bound

03

Reduces communication costs significantly
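
For orientation on the minimax $\ell_2$ bound in finding 02: in the standard $s$-sparse linear model with $p$ predictors, total sample size $N$, and noise variance $\sigma^2$, the well-known minimax rate for squared $\ell_2$ estimation error is shown below, under suitable conditions on the design. The symbols $s$, $p$, $N$, and $\sigma^2$ are notation assumed here for illustration; the paper's exact statement, constants, and conditions are not reproduced on this page.

$$
\inf_{\hat\beta}\ \sup_{\|\beta^*\|_0 \le s}\ \mathbb{E}\,\|\hat\beta - \beta^*\|_2^2 \;\asymp\; \frac{\sigma^2\, s \log(p/s)}{N}
$$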

Abstract

The explosion of large-scale data in fields such as finance, e-commerce, and social media has outstripped the processing capabilities of single-machine systems, driving the need for distributed statistical inference methods. Traditional approaches to distributed inference often struggle with achieving true sparsity in high-dimensional datasets and involve high computational costs. We propose a novel, two-stage, distributed best subset selection algorithm to address these issues. Our approach starts by efficiently estimating the active set while adhering to the $ℓ_{0}$ norm-constrained surrogate likelihood function, effectively reducing dimensionality and isolating key variables. A refined estimation within the active set follows, ensuring sparse estimates and matching the minimax $ℓ_{2}$ error bound. We introduce a new splicing technique for adaptive parameter selection to tackle…
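
The two-stage idea in the abstract (estimate an active set under an $\ell_0$ constraint, then refit within it) can be illustrated on a single machine with the minimal sketch below. It is not the authors' method: iterative hard thresholding stands in for their splicing technique, plain least squares replaces the surrogate likelihood, the sparsity level `s` is fixed rather than chosen by GIC, and there is no distribution across workers. All function names and the simulated data are hypothetical.

```python
import numpy as np

def hard_threshold(beta, s):
    """Keep the s largest-magnitude entries of beta and zero the rest."""
    out = np.zeros_like(beta)
    keep = np.argsort(np.abs(beta))[-s:]
    out[keep] = beta[keep]
    return out

def estimate_active_set(X, y, s, n_iter=200):
    """Stage 1 stand-in (assumption): iterative hard thresholding
    for least squares subject to ||beta||_0 <= s."""
    p = X.shape[1]
    # Step size 1/L, where L is the largest eigenvalue of X^T X.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)
        beta = hard_threshold(beta - step * grad, s)
    return np.sort(np.flatnonzero(beta))

def refit_on_active_set(X, y, active):
    """Stage 2: ordinary least squares restricted to the active set."""
    beta = np.zeros(X.shape[1])
    coef, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
    beta[active] = coef
    return beta

# Simulated data (hypothetical): 3 true signals among 50 predictors.
rng = np.random.default_rng(0)
n, p, s = 400, 50, 3
beta_true = np.zeros(p)
beta_true[[3, 17, 41]] = [2.0, -3.0, 1.5]
X = rng.standard_normal((n, p))
y = X @ beta_true + 0.1 * rng.standard_normal(n)

active = estimate_active_set(X, y, s)
beta_hat = refit_on_active_set(X, y, active)
print("active set:", active)
print("l2 error:", np.linalg.norm(beta_hat - beta_true))
```

Because the refit touches only the selected columns, every coefficient outside the active set is exactly zero, which is the sense in which such an estimate is truly sparse rather than merely shrunk toward zero.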

Taxonomy

Topics: Fuzzy Logic and Control Systems · Advanced Algebra and Logic · Machine Learning and Algorithms

Methods: Sparse Evolutionary Training