Data-driven calibration of penalties for least-squares regression
Sylvain Arlot (LM-Orsay, INRIA Futurs), Pascal Massart (LM-Orsay, INRIA Futurs)

TL;DR
This paper introduces a data-driven method for calibrating penalties in least-squares regression, extending the minimal penalty concept to heteroscedastic and non-Gaussian data, with initial results on regressogram bin-widths.
Contribution
It generalizes the minimal penalty approach to broader data settings and proposes the slope heuristics for data-driven penalty calibration in least-squares regression.
Findings
The slope heuristics work for heteroscedastic non-Gaussian data.
The minimal penalty can be estimated from the data themselves, which yields a data-driven estimate of an optimal penalty that can be used in practice.
Initial mathematical results are proved for regressogram bin-width selection.
Abstract
Penalization procedures often suffer from their dependence on multiplying factors, whose optimal values are either unknown or hard to estimate from the data. We propose a completely data-driven calibration algorithm for this parameter in the least-squares regression framework, without assuming a particular shape for the penalty. Our algorithm relies on the concept of minimal penalty, recently introduced by Birgé and Massart (2007) in the context of penalized least squares for Gaussian homoscedastic regression. On the positive side, the minimal penalty can be evaluated from the data themselves, leading to a data-driven estimation of an optimal penalty which can be used in practice; on the negative side, their approach heavily relies on the homoscedastic Gaussian nature of their stochastic framework. The purpose of this paper is twofold: stating a more general heuristics for designing a…
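As a rough illustration of the calibration idea described in the abstract, the sketch below selects a regressogram bin count on simulated heteroscedastic data. It is not the authors' algorithm as published. It assumes a penalty shape proportional to D/n, locates the minimal constant at the largest drop in selected dimension over a grid of constants, and then uses twice that constant as the final penalty (the usual statement of the slope heuristics, which the truncated abstract above does not spell out). The function names, the grid, and the simulated data are all hypothetical.

```python
import numpy as np

def regressogram_risks(x, y, dims):
    """Empirical least-squares risk of the regressogram with D equal-width bins on [0, 1]."""
    risks = []
    for d in dims:
        bins = np.minimum((x * d).astype(int), d - 1)  # bin index of each point
        sums = np.bincount(bins, weights=y, minlength=d)
        counts = np.bincount(bins, minlength=d)
        # Bin-wise mean of y; empty bins are set to 0 and never used below
        means = np.divide(sums, counts, out=np.zeros(d), where=counts > 0)
        risks.append(np.mean((y - means[bins]) ** 2))
    return np.array(risks)

def selected_dim(risks, dims, n, k):
    """Dimension minimising empirical risk + k * D / n (assumed penalty shape)."""
    return dims[np.argmin(risks + k * dims / n)]

def slope_heuristics(x, y, dims, k_grid):
    """Return (estimated minimal constant, dimension selected with twice that constant)."""
    n = len(y)
    risks = regressogram_risks(x, y, dims)
    chosen = np.array([selected_dim(risks, dims, n, k) for k in k_grid])
    # Assumption: the minimal constant sits where the selected dimension drops the most
    jumps = chosen[:-1] - chosen[1:]
    k_min = k_grid[np.argmax(jumps) + 1]
    return k_min, selected_dim(risks, dims, n, 2 * k_min)

# Simulated heteroscedastic data (hypothetical example, not from the paper)
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(size=n)
y = np.sin(2 * np.pi * x) + (0.2 + 0.8 * x) * rng.standard_normal(n)

dims = np.arange(1, 101)              # candidate numbers of bins
k_grid = np.linspace(0.01, 3.0, 300)  # increasing grid of penalty constants
k_min, d_hat = slope_heuristics(x, y, dims, k_grid)
print(k_min, d_hat)
```

The grid search is chosen for readability. A finer grid or an exact computation of the breakpoints of the selected dimension as a function of the constant would locate the jump more precisely.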
Taxonomy
Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Soil Geostatistics and Mapping
