Sparse High-Dimensional Regression: Exact Scalable Algorithms and Phase Transitions
Dimitris Bertsimas, Bart Van Parys

TL;DR
This paper introduces an exact, scalable algorithm for sparse high-dimensional regression. It also reports phase transition phenomena: as the sample size increases, the problem becomes easier, and the method solves it faster than Lasso.
Contribution
The authors develop a binary convex reformulation of the sparse regression problem and a cutting plane method that solves it to provable optimality at sample sizes and regressor counts two orders of magnitude beyond the previous state of the art.
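To make the reformulation and cutting plane idea concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a ridge-regularized sparse regression objective, which for a binary support vector s reduces to c(s) = 1/2 y'(I + gamma X_s X_s')^{-1} y, a function that is convex in s. The names `objective_and_gradient` and `cutting_plane`, the parameter `gamma`, and the synthetic data are all illustrative assumptions. The master problem is solved here by enumerating every k-sparse support, which only works for tiny p; a scalable version would hand the same cuts to a mixed-integer solver.

```python
import itertools
import numpy as np


def objective_and_gradient(X, y, s, gamma):
    """Return c(s) = 0.5 * y' (I + gamma * X_s X_s')^{-1} y and its gradient in s."""
    n = X.shape[0]
    Xs = X[:, s.astype(bool)]              # columns selected by the binary vector s
    K = np.eye(n) + gamma * (Xs @ Xs.T)
    alpha = np.linalg.solve(K, y)
    value = 0.5 * float(y @ alpha)
    grad = -0.5 * gamma * (X.T @ alpha) ** 2   # d c / d s_j = -gamma/2 * (X_j' alpha)^2
    return value, grad


def cutting_plane(X, y, k, gamma, tol=1e-6, max_iter=1000):
    """Outer approximation over k-sparse supports (toy master problem by enumeration)."""
    p = X.shape[1]
    supports = list(itertools.combinations(range(p), k))
    candidates = np.zeros((len(supports), p))
    for i, idx in enumerate(supports):
        candidates[i, list(idx)] = 1.0

    s = candidates[0]
    best_value, best_s = np.inf, s
    offsets, grads = [], []
    for _ in range(max_iter):
        value, grad = objective_and_gradient(X, y, s, gamma)
        if value < best_value:                 # incumbent gives the upper bound
            best_value, best_s = value, s
        # Tangent cut at s: c(s') >= value + grad' (s' - s) for every s'
        offsets.append(value - grad @ s)
        grads.append(grad)
        # Piecewise-linear lower model evaluated at every candidate support
        model = (np.array(grads) @ candidates.T + np.array(offsets)[:, None]).max(axis=0)
        i = int(np.argmin(model))
        if model[i] >= best_value - tol:       # lower bound meets upper bound
            break
        s = candidates[i]
    return np.flatnonzero(best_s), best_value


# Small synthetic example (hypothetical data, chosen only for illustration)
rng = np.random.default_rng(0)
n, p, k, gamma = 40, 8, 2, 10.0
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[[1, 5]] = [2.0, -3.0]
y = X @ w_true + 0.1 * rng.standard_normal(n)

support, value = cutting_plane(X, y, k, gamma)
print("selected support:", support, "objective:", value)
```

Each iteration adds one tangent plane to a lower model of c(s). The loop stops when the smallest value of that model is no lower than the best objective found so far, which certifies that the incumbent support is optimal within `tol`.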
Findings
Solves sparse regression problems with hundreds of thousands of samples and regressors in seconds.
Identifies phase transition phenomena: as the number of samples increases, the problem becomes easier and the solution recovers 100% of the true signal.
Demonstrates that the approach outperforms Lasso in both speed and statistical accuracy.
Abstract
We present a novel binary convex reformulation of the sparse regression problem that constitutes a new duality perspective. We devise a new cutting plane method and provide evidence that it can solve to provable optimality the sparse regression problem for sample sizes n and number of regressors p in the 100,000s, that is two orders of magnitude better than the current state of the art, in seconds. The ability to solve the problem for very high dimensions allows us to observe new phase transition phenomena. Contrary to traditional complexity theory which suggests that the difficulty of a problem increases with problem size, the sparse regression problem has the property that as the number of samples increases the problem becomes easier in that the solution recovers 100% of the true signal, and our approach solves the problem extremely fast (in fact faster than Lasso), while for…
