Analysis of Testing-Based Forward Model Selection
Damian Kozbur

TL;DR
This paper proposes Testing-based forward model selection (TBFMS), a method that sequentially adds covariates based on hypothesis tests, with theoretical guarantees on prediction error and covariate selection in linear regression.
Contribution
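It introduces TBFMS, a covariate selection procedure with probabilistic bounds on prediction error and on the number of selected covariates. The bounds are specialized to heteroskedastic data, where the resulting convergence rates match those of established high-dimensional estimators such as Lasso.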
Findings
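Probabilistic bounds on prediction error and on the number of selected covariates, which depend on the quality of the tests.
Tests specialized to heteroskedastic data, constructed with Huber-Eicker-White standard errors.
Estimation convergence rates matching those of Lasso and other common high-dimensional estimators, under the assumed regularity conditions.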
Abstract
This paper introduces and analyzes a procedure called Testing-based forward model selection (TBFMS) in linear regression problems. This procedure inductively selects covariates that add predictive power into a working statistical model before estimating a final regression. The criterion for deciding which covariate to include next and when to stop including covariates is derived from a profile of traditional statistical hypothesis tests. This paper proves probabilistic bounds, which depend on the quality of the tests, for prediction error and the number of selected covariates. As an example, the bounds are then specialized to a case with heteroskedastic data, with tests constructed with the help of Huber-Eicker-White standard errors. Under the assumed regularity conditions, these tests lead to estimation convergence rates matching other common high-dimensional estimators including Lasso.
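The procedure described in the abstract can be illustrated with a minimal sketch. This is not the paper's exact algorithm: it assumes centered data with no intercept, uses HC0 (Huber-Eicker-White) t-statistics as the tests, picks a Bonferroni-style normal threshold, and breaks ties among passing candidates by residual sum of squares. The names `robust_t_stat` and `tbfms_sketch`, the `alpha` parameter, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np
from statistics import NormalDist


def robust_t_stat(X, y, col):
    """Fit OLS of y on X; return the HC0 t-statistic for column `col` and the RSS."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    bread = np.linalg.pinv(X.T @ X)
    # HC0 "meat": sum over observations of e_i^2 * x_i x_i'
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = bread @ meat @ bread
    se = np.sqrt(cov[col, col])
    return beta[col] / se, float(resid @ resid)


def tbfms_sketch(X, y, alpha=0.05, max_steps=None):
    """Forward selection with robust tests (illustrative assumptions, see text)."""
    n, p = X.shape
    # Assumed threshold: two-sided normal quantile with a Bonferroni correction.
    threshold = NormalDist().inv_cdf(1 - alpha / (2 * p))
    max_steps = min(n - 1, p) if max_steps is None else max_steps
    selected = []
    while len(selected) < max_steps:
        best_j, best_rss = None, np.inf
        for j in range(p):
            if j in selected:
                continue
            # The candidate is always the last column of the working design.
            t, rss = robust_t_stat(X[:, selected + [j]], y, col=len(selected))
            # Keep the passing candidate that reduces squared error the most.
            if abs(t) > threshold and rss < best_rss:
                best_j, best_rss = j, rss
        if best_j is None:  # no candidate passes its test: stop
            break
        selected.append(best_j)
    # Final regression on the selected covariates only.
    if selected:
        beta, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
    else:
        beta = np.zeros(0)
    return selected, beta


# Simulated heteroskedastic data: only columns 3 and 7 carry signal.
rng = np.random.default_rng(0)
n, p = 500, 20
X = rng.standard_normal((n, p))
noise = rng.standard_normal(n) * (0.5 + np.abs(X[:, 0]))
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + noise
selected, beta = tbfms_sketch(X, y)
print(selected, beta)
```

The stopping rule is what distinguishes this from plain greedy forward selection: a covariate enters only if its test rejects, so the number of selected covariates is controlled by the tests rather than by a fixed model size.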
