Minimax and Communication-Efficient Distributed Best Subset Selection with Oracle Property
Jingguo Lan, Hongmei Lin, Xueqin Wang

TL;DR
This paper introduces a novel distributed best subset selection algorithm that achieves true sparsity, maintains the oracle property, and significantly reduces communication costs in high-dimensional data analysis.
Contribution
It proposes a two-stage distributed algorithm that uses a new splicing technique and a generalized information criterion (GIC) for adaptive parameter selection, improving sparsity recovery and efficiency.
Findings
Correctly identifies true sparsity pattern
Achieves minimax $\ell_2$ error bound
Reduces communication costs significantly
Abstract
The explosion of large-scale data in fields such as finance, e-commerce, and social media has outstripped the processing capabilities of single-machine systems, driving the need for distributed statistical inference methods. Traditional approaches to distributed inference often struggle with achieving true sparsity in high-dimensional datasets and involve high computational costs. We propose a novel, two-stage, distributed best subset selection algorithm to address these issues. Our approach starts by efficiently estimating the active set while adhering to the $\ell_0$-norm-constrained surrogate likelihood function, effectively reducing dimensionality and isolating key variables. A refined estimation within the active set follows, ensuring sparse estimates and matching the minimax error bound. We introduce a new splicing technique for adaptive parameter selection to tackle…
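The two-stage idea in the abstract (first estimate an active set, then refit only within it) can be illustrated with a minimal sketch. This is not the paper's algorithm: the surrogate likelihood, the splicing step, and the adaptive choice of sparsity level are replaced here by simple marginal screening with a fixed, user-supplied sparsity level `s`, and the linear model, the function name `two_stage_sparse_fit`, and the simulated data are all assumptions made for illustration. Each simulated machine communicates only a length-`p` vector in stage 1 and an `s`-by-`s` matrix plus a length-`s` vector in stage 2.

```python
import numpy as np

def two_stage_sparse_fit(X_parts, y_parts, s):
    """Illustrative two-stage sparse fit on data split across machines.

    Stage 1 uses marginal screening as a stand-in for the paper's
    active-set estimation (an assumption, not the authors' method).
    Stage 2 refits least squares restricted to the selected support.
    """
    n_total = sum(len(y) for y in y_parts)
    p = X_parts[0].shape[1]

    # Stage 1: every machine sends the p-vector X^T y; the center
    # averages them and keeps the s largest entries in absolute value.
    score = sum(X.T @ y for X, y in zip(X_parts, y_parts)) / n_total
    support = np.sort(np.argsort(-np.abs(score))[:s])

    # Stage 2: every machine sends an s x s Gram matrix and an s-vector
    # restricted to the support; the center solves the normal equations.
    gram = sum(X[:, support].T @ X[:, support] for X in X_parts)
    xty = sum(X[:, support].T @ y for X, y in zip(X_parts, y_parts))

    beta = np.zeros(p)  # exactly sparse: zero outside the support
    beta[support] = np.linalg.solve(gram, xty)
    return support, beta

# Hypothetical simulated data: 4 machines, 200 samples each, p = 50.
rng = np.random.default_rng(0)
p, n_local, n_machines = 50, 200, 4
beta_true = np.zeros(p)
beta_true[[3, 17, 42]] = [3.0, -2.0, 2.5]
X_parts = [rng.standard_normal((n_local, p)) for _ in range(n_machines)]
y_parts = [X @ beta_true + 0.1 * rng.standard_normal(n_local) for X in X_parts]

support, beta_hat = two_stage_sparse_fit(X_parts, y_parts, s=3)
print("selected support:", support.tolist())
print("l2 error:", float(np.linalg.norm(beta_hat - beta_true)))
```

Marginal screening only works well here because the simulated design has independent columns and strong signals; with correlated predictors it can miss true variables, which is the kind of difficulty a best subset approach is meant to address.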
Taxonomy
Topics: Fuzzy Logic and Control Systems · Advanced Algebra and Logic · Machine Learning and Algorithms
Methods: Sparse Evolutionary Training
