Selective Sequential Model Selection
William Fithian, Jonathan Taylor, Robert Tibshirani, and Ryan Tibshirani

TL;DR
This paper develops a framework for choosing the least complex model not falsified by the data from a sequence of models generated by a data-driven procedure. It does so using new p-values that improve power and extend beyond linear regression to other model types.
Contribution
It introduces new p-values for each step of a model selection path that account for the data-driven choice of the path. These p-values improve power and extend beyond linear regression to a broad class of parametric and nonparametric models.
Findings
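The proposed max-t test (forward stepwise regression) and next-entry test (lasso) improve power over the saturated-model test of Tibshirani et al. (2014), sometimes dramatically.
The framework controls the false discovery rate (FDR) and the familywise error rate (FWER) by applying sequential stopping rules to the p-values.
The authors derive conditions under which the p-values are independent, ensuring valid error-rate control.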
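To illustrate how a sequential stopping rule turns step-wise p-values into a model choice, here is a minimal sketch of one well-known rule, ForwardStop (G'Sell et al., 2016). It is an assumption that this particular rule stands in for "sequential stopping rules" here, and this is not the paper's implementation. ForwardStop targets FDR control when null p-values are independent and uniform on (0, 1). It picks the largest k with -(1/k) * sum_{i<=k} log(1 - p_i) <= alpha, and the selected model is the one reached at step k. The function name `forward_stop` is hypothetical.

```python
import math
from typing import Sequence


def forward_stop(p_values: Sequence[float], alpha: float) -> int:
    """Hypothetical helper: ForwardStop stopping rule (G'Sell et al., 2016).

    Returns the largest k such that -(1/k) * sum_{i=1..k} log(1 - p_i) <= alpha,
    i.e. how many leading steps of the path to accept. Returns 0 if no k qualifies.
    Assumes p_values are ordered by step along the model path.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    running_total = 0.0  # running sum of -log(1 - p_i)
    k_hat = 0
    for k, p in enumerate(p_values, start=1):
        if not 0 <= p <= 1:
            raise ValueError("p-values must lie in [0, 1]")
        # -log(1 - p) is +inf at p == 1; log1p(-1) would raise, so handle it directly.
        running_total += -math.log1p(-p) if p < 1 else math.inf
        if running_total / k <= alpha:
            k_hat = k  # keep the largest qualifying k, not the first failure
    return k_hat


# Example: strong evidence at the first three steps, weak evidence afterwards.
steps_selected = forward_stop([0.001, 0.002, 0.01, 0.6, 0.8], alpha=0.1)
print(steps_selected)  # 3: select the model reached after step 3
```

Because the rule takes the largest qualifying k rather than stopping at the first large p-value, an isolated moderate p-value can still be included if earlier steps carry enough evidence.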
Abstract
Many model selection algorithms produce a path of fits specifying a sequence of increasingly complex models. Given such a sequence and the data used to produce them, we consider the problem of choosing the least complex model that is not falsified by the data. Extending the selected-model tests of Fithian et al. (2014), we construct p-values for each step in the path which account for the adaptive selection of the model path using the data. In the case of linear regression, we propose two specific tests, the max-t test for forward stepwise regression (generalizing a proposal of Buja and Brown (2014)), and the next-entry test for the lasso. These tests improve on the power of the saturated-model test of Tibshirani et al. (2014), sometimes dramatically. In addition, our framework extends beyond linear regression to a much more general class of parametric and nonparametric model selection…
Taxonomy
Topics: Machine Learning and Algorithms · Statistical Methods and Inference · Imbalanced Data Classification Techniques
Methods: Linear Regression
