Sparse regression and marginal testing using cluster prototypes
Stephen Reid, Robert Tibshirani

TL;DR
This paper introduces a method for sparse regression and marginal testing on data with correlated features. It clusters the features, selects one prototype per cluster, and uses post-selection inference and knockoffs to provide exact p-values, confidence intervals, and finite-sample FDR control.
Contribution
It combines feature clustering with post-selection inference and knockoff methods to improve sparse regression and testing with correlated features.
Findings
Provides exact p-values and confidence intervals that account for the selection of prototypes.
Achieves exact finite-sample FDR control for the regression procedure using knockoffs.
Illustrates the proposals on both real and simulated data.
Abstract
We propose a new approach for sparse regression and marginal testing, for data with correlated features. Our procedure first clusters the features, and then chooses as the cluster prototype the most informative feature in that cluster. Then we apply either sparse regression (lasso) or marginal significance testing to these prototypes. While this kind of strategy is not entirely new, a key feature of our proposal is its use of the post-selection inference theory of Taylor et al. (2014) and Lee et al. (2014) to compute exact p-values and confidence intervals that properly account for the selection of prototypes. We also apply the recent "knockoff" idea of Barber and Candès to provide exact finite sample control of the FDR of our regression procedure. We illustrate our proposals on both real and simulated data.
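The first two steps described in the abstract (cluster the features, then pick one prototype per cluster) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the clustering rule (linking features whose absolute correlation exceeds a threshold), the choice of "most informative" as the largest absolute marginal correlation with the response, and the names `pearson`, `cluster_features`, `select_prototypes`, and `threshold` are all hypothetical. The sketch stops before the lasso or marginal testing step, and it does not perform the post-selection or knockoff adjustments.

```python
import math
import random


def pearson(u, v):
    """Sample Pearson correlation of two equal-length, non-constant sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def cluster_features(columns, threshold=0.7):
    """Group features into connected components of the graph that links
    features i and j when |corr(i, j)| >= threshold (an assumed rule)."""
    p = len(columns)
    parent = list(range(p))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(p):
        for j in range(i + 1, p):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(p):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def select_prototypes(columns, y, clusters):
    """Assumed criterion: in each cluster, keep the feature with the
    largest absolute marginal correlation with the response y."""
    return [max(c, key=lambda j: abs(pearson(columns[j], y))) for c in clusters]


# Synthetic data: features 0-2 follow latent factor z1, features 3-4 follow z2.
random.seed(0)
n = 200
z1 = [random.gauss(0, 1) for _ in range(n)]
z2 = [random.gauss(0, 1) for _ in range(n)]
columns = [[z + random.gauss(0, 0.1) for z in z1] for _ in range(3)]
columns += [[z + random.gauss(0, 0.1) for z in z2] for _ in range(2)]
y = [z + random.gauss(0, 0.5) for z in z1]

clusters = cluster_features(columns)
prototypes = select_prototypes(columns, y, clusters)
print(clusters, prototypes)
```

Because the prototypes are chosen by looking at the response, p-values from a naive lasso fit or marginal test on them would not be valid as they stand. This is the selection effect that the post-selection inference cited in the abstract is meant to account for.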
Taxonomy
Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Statistical Methods and Bayesian Inference
