Variable Selection is Hard
Dean Foster, Howard Karloff, and Justin Thaler

TL;DR
Under a standard complexity hypothesis, this paper proves that no polynomial-time algorithm can solve sparse linear regression, even when the algorithm may use more nonzero coefficients than the true solution (k' > k) and incur a positive squared error, and even when an exact k-sparse solution is promised to exist.
Contribution
To the authors' knowledge, it establishes the first hardness results for sparse regression that hold when the algorithm is simultaneously allowed a sparsity level k' > k and a positive approximation error h(m,p) > 0.
Findings
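Assuming a standard complexity hypothesis, no polynomial-time algorithm can find a k'-sparse x with ||Bx-y||^2 <= h(m,p), where k' = k*2^{log^{1-delta} p} and h(m,p) <= p^(C_1)*m^(1-C_2).
The hardness holds even under the promise that an unknown k-sparse vector x^* satisfies Bx^* = y exactly.
A similar result holds for a statistical version of the problem in which the data are corrupted by noise.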
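To get a feel for how much extra sparsity the result still rules out, the relaxed level stated in the abstract, k' = k*2^{log^{1-delta} p}, can be evaluated numerically. The sketch below is illustrative only: the function name `relaxed_sparsity` is hypothetical, and taking the logarithm base 2 is an assumption, since this page does not state the base.

```python
import math


def relaxed_sparsity(k: int, p: int, delta: float) -> float:
    """Evaluate k' = k * 2^(log^(1-delta) p).

    Assumption: the logarithm is base 2; the summary does not specify it.
    """
    return k * 2.0 ** (math.log2(p) ** (1.0 - delta))


# Compare the blow-up factor k'/k against p itself for a fixed delta.
for p in (2**10, 2**20, 2**40):
    factor = relaxed_sparsity(k=1, p=p, delta=0.5)
    print(f"p = {p}: k'/k = {factor:.1f}")
```

For any fixed delta > 0 the exponent log^{1-delta} p grows more slowly than log p, so the allowed factor k'/k stays well below p even though it keeps growing with p.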
Abstract
Variable selection for sparse linear regression is the problem of finding, given an m x p matrix B and a target vector y, a sparse vector x such that Bx approximately equals y. Assuming a standard complexity hypothesis, we show that no polynomial-time algorithm can find a k'-sparse x with ||Bx-y||^2<=h(m,p), where k'=k*2^{log^{1-delta} p} and h(m,p)<=p^(C_1)*m^(1-C_2), where delta>0, C_1>0,C_2>0 are arbitrary. This is true even under the promise that there is an unknown k-sparse vector x^* satisfying Bx^*=y. We prove a similar result for a statistical version of the problem in which the data are corrupted by noise. To the authors' knowledge, these are the first hardness results for sparse regression that apply when the algorithm simultaneously has k'>k and h(m,p)>0.
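The problem in the abstract can be made concrete with a small exhaustive search. The sketch below is not the paper's construction or any algorithm from it; it is a brute-force baseline, with the hypothetical name `best_k_sparse`, that tries every support of size k and solves least squares on each. It visits C(p, k) supports, so its running time is exponential in k, which is the kind of cost the hardness result suggests cannot be avoided in general.

```python
from itertools import combinations

import numpy as np


def best_k_sparse(B: np.ndarray, y: np.ndarray, k: int):
    """Return (x, squared_error) minimizing ||Bx - y||^2 over k-sparse x.

    Exhaustive search over all size-k supports; feasible only for tiny p and k.
    """
    m, p = B.shape
    best_x, best_err = np.zeros(p), float(y @ y)  # start from x = 0
    for support in combinations(range(p), k):
        cols = list(support)
        # Least-squares fit restricted to the chosen columns.
        coef, *_ = np.linalg.lstsq(B[:, cols], y, rcond=None)
        resid = B[:, cols] @ coef - y
        err = float(resid @ resid)
        if err < best_err:
            x = np.zeros(p)
            x[cols] = coef
            best_x, best_err = x, err
    return best_x, best_err


# Promise instance: y is generated by a hidden k-sparse x_star, so Bx_star = y.
rng = np.random.default_rng(0)
m, p, k = 8, 10, 2
B = rng.standard_normal((m, p))
x_star = np.zeros(p)
x_star[[1, 7]] = [2.0, -3.0]
y = B @ x_star

x_hat, err = best_k_sparse(B, y, k)
print("support found:", np.flatnonzero(x_hat), "squared error:", err)
```

On this toy instance the search recovers the hidden support, but only because p and k are tiny; the matrix sizes and the planted coefficients are arbitrary choices made for illustration.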
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms · Complexity and Algorithms in Graphs
