Slope heuristics and V-Fold model selection in heteroscedastic regression using strongly localized bases
Fabien Navarro, Adrien Saumard

TL;DR
This paper analyzes the effectiveness of the slope heuristics, V-fold cross-validation, and V-fold penalization for model selection in heteroscedastic regression with strongly localized bases. It establishes the asymptotic optimality of the slope heuristics and V-fold penalization (but not of V-fold cross-validation with fixed V) and compares the practical performance of the three methods.
Contribution
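It introduces a new class of linear models, called strongly localized bases, that generalize histograms, piecewise polynomials and compactly supported wavelets. It then proves the asymptotic optimality of the slope heuristics and V-fold penalization in this setting.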
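Histograms are the simplest example of the strongly localized bases mentioned above. Each basis function is the indicator of one cell of a partition, so each least-squares coefficient depends only on the observations falling in that cell. The sketch below assumes a regular partition of [0, 1] into D cells and uses hypothetical helper names (`histogram_fit`, `histogram_predict`, `empirical_risk`). It illustrates the localization property only and does not reproduce the paper's general definition.

```python
from typing import List, Sequence


def histogram_fit(xs: Sequence[float], ys: Sequence[float], n_bins: int) -> List[float]:
    """Least-squares fit on the histogram model with n_bins equal-width cells of [0, 1].

    For an indicator basis, the least-squares coefficient of each cell is the
    mean response of the points in that cell (set to 0.0 here for empty cells).
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        j = min(int(x * n_bins), n_bins - 1)  # cell index; x == 1.0 goes to the last cell
        sums[j] += y
        counts[j] += 1
    return [s / c if c > 0 else 0.0 for s, c in zip(sums, counts)]


def histogram_predict(coefs: Sequence[float], x: float) -> float:
    """Evaluate the fitted piecewise-constant function at x in [0, 1]."""
    n_bins = len(coefs)
    return coefs[min(int(x * n_bins), n_bins - 1)]


def empirical_risk(coefs: Sequence[float], xs: Sequence[float], ys: Sequence[float]) -> float:
    """Mean squared error of the fitted histogram on the sample (xs, ys)."""
    return sum((y - histogram_predict(coefs, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Each coefficient only "sees" the points in its own cell.
coefs = histogram_fit([0.1, 0.2, 0.6, 0.9], [1.0, 3.0, 5.0, 7.0], n_bins=2)
print(coefs)  # [2.0, 6.0]
print(empirical_risk(coefs, [0.1, 0.2, 0.6, 0.9], [1.0, 3.0, 5.0, 7.0]))  # 1.0
```

Because the cells do not overlap, refitting after changing the data in one cell only changes that cell's coefficient. This locality is what the more general wavelet and piecewise-polynomial cases approximate.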
Findings
Slope heuristics are asymptotically optimal when the penalty shape is known.
V-fold cross-validation appears suboptimal for fixed V. Asymptotically, it recovers the oracle learned from only a fraction 1 − 1/V of the data.
V-fold penalization performs comparably to V-fold cross-validation in practice.
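The slope heuristics from the findings above can be sketched under the assumption that the penalty shape is known and supplied as input. For large models, the empirical risk is expected to decrease roughly linearly in the penalty shape, with slope equal to minus the constant of the minimal penalty. The sketch estimates that constant by least squares over the largest models and then penalizes with twice the estimated minimal penalty. The function name, the `frac_large` parameter and the regression-based slope estimate are illustrative choices, not the paper's exact procedure; a "dimension jump" variant is also common.

```python
from typing import Dict, Tuple


def slope_heuristics_select(
    emp_risk: Dict[int, float],
    pen_shape: Dict[int, float],
    frac_large: float = 0.5,
) -> Tuple[int, float]:
    """Select a model with the slope heuristics, given a known penalty shape.

    emp_risk[m]  : empirical risk of the least-squares estimator on model m
    pen_shape[m] : known penalty shape of model m (assumed given)
    frac_large   : fraction of models (largest shapes) used to fit the slope
    Returns the selected model and the estimated minimal-penalty constant kappa.
    """
    models = sorted(pen_shape, key=pen_shape.get)
    k = max(2, int(len(models) * frac_large))
    large = models[-k:]

    # Least-squares slope of empirical risk against penalty shape on the large models.
    xs = [pen_shape[m] for m in large]
    ys = [emp_risk[m] for m in large]
    x_bar, y_bar = sum(xs) / k, sum(ys) / k
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    kappa = max(-sxy / sxx, 0.0)  # minimal penalty is kappa * pen_shape

    # Final penalty: twice the estimated minimal penalty.
    crit = {m: emp_risk[m] + 2.0 * kappa * pen_shape[m] for m in models}
    return min(crit, key=crit.get), kappa


# Toy input: large models have empirical risk exactly 1.0 - 0.5 * shape,
# while small models carry an extra bias term 1/D.
dims = range(1, 11)
shape = {d: d / 10 for d in dims}
risk = {d: 1.0 - 0.5 * shape[d] + (1.0 / d if d <= 3 else 0.0) for d in dims}
best, kappa = slope_heuristics_select(risk, shape)
print(best, round(kappa, 6))  # 4 0.5
```

The factor 2 encodes the heuristic that the optimal penalty is about twice the minimal one. If the assumed shape is wrong, the estimated slope and the selected model can both be off, which matches the "when the penalty shape is known" caveat.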
Abstract
We investigate the optimality for model selection of the so-called slope heuristics, V-fold cross-validation and V-fold penalization in a heteroscedastic regression context with random design. We consider a new class of linear models that we call strongly localized bases and that generalize histograms, piecewise polynomials and compactly supported wavelets. We derive sharp oracle inequalities that prove the asymptotic optimality of the slope heuristics (when the optimal penalty shape is known) and of V-fold penalization. Furthermore, V-fold cross-validation seems to be suboptimal for a fixed value of V, since it asymptotically recovers the oracle learned from a sample size equal to a fraction 1 − 1/V of the original amount of data. Our results are based on genuine concentration inequalities for the true and empirical excess risks that are of independent interest. We show in our…
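The following sketch makes the contrast between V-fold cross-validation and V-fold penalization concrete, using histogram models on [0, 1] with heteroscedastic noise. Both criteria reuse the same V training sets, each holding a fraction 1 − 1/V of the data (for example 400 of 500 points when V = 5). This is why cross-validation on its own targets the oracle for that smaller sample size. The V-fold penalty is computed over the folds: for each training set, take the risk of its fit on the full sample minus its risk on the training set itself. These differences are summed, multiplied by C/V, and added to the full-sample empirical risk. The constant C = V − 1, the contiguous fold split, the simulated data and all function names are assumptions made for illustration.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def fit(xs: Sequence[float], ys: Sequence[float], d: int) -> List[float]:
    """Least-squares histogram fit: mean response in each of d equal-width cells of [0, 1]."""
    sums, counts = [0.0] * d, [0] * d
    for x, y in zip(xs, ys):
        j = min(int(x * d), d - 1)
        sums[j] += y
        counts[j] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def risk(coefs: Sequence[float], xs: Sequence[float], ys: Sequence[float]) -> float:
    """Empirical least-squares risk of a histogram fit on the sample (xs, ys)."""
    d = len(coefs)
    return sum((y - coefs[min(int(x * d), d - 1)]) ** 2 for x, y in zip(xs, ys)) / len(xs)


def folds(n: int, v: int) -> List[List[int]]:
    """Split indices 0..n-1 into v contiguous blocks (data assumed i.i.d., so no shuffle)."""
    return [list(range(j * n // v, (j + 1) * n // v)) for j in range(v)]


def vfold_criteria(xs, ys, dims, v) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Return (V-fold CV risk, V-fold penalized criterion) for each number of cells d."""
    n = len(xs)
    cv, pen_crit = {}, {}
    for d in dims:
        cv_sum = pen_sum = 0.0
        for block in folds(n, v):
            held_out = set(block)
            train = [i for i in range(n) if i not in held_out]
            xt, yt = [xs[i] for i in train], [ys[i] for i in train]
            coefs = fit(xt, yt, d)  # trained on a fraction 1 - 1/v of the data
            cv_sum += risk(coefs, [xs[i] for i in block], [ys[i] for i in block])
            pen_sum += risk(coefs, xs, ys) - risk(coefs, xt, yt)
        cv[d] = cv_sum / v
        # Assumed constant C = v - 1 in the (C / v) * sum penalty.
        pen_crit[d] = risk(fit(xs, ys, d), xs, ys) + (v - 1) / v * pen_sum
    return cv, pen_crit


def make_data(n: int, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Simulated heteroscedastic data: noise standard deviation grows with x."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + (0.1 + 0.5 * x) * rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


xs, ys = make_data(500)
cv, pen = vfold_criteria(xs, ys, dims=range(1, 41), v=5)
print("V-fold CV selects", min(cv, key=cv.get), "cells")
print("V-fold penalization selects", min(pen, key=pen.get), "cells")
```

The only difference between the two criteria is how the fold-trained fits are used. Cross-validation scores them directly on held-out data. V-fold penalization uses them only to estimate a penalty for the estimator trained on all n points.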
