# An analysis of the cost of hyper-parameter selection via split-sample validation, with applications to penalized regression

**Authors:** Jean Feng, Noah Simon

arXiv: 1903.12297 · 2019-04-01

## TL;DR

This paper investigates how the generalization error of a model selected by split-sample validation grows with the number of hyper-parameters being tuned. It gives finite-sample oracle inequalities and specializes them to penalized regression with multiple penalty parameters.

## Contribution

The paper establishes finite-sample oracle inequalities for hyper-parameter selection based on a single training/test split and on cross-validation, and specializes them to penalized regression problems with multiple penalty parameters.

## Key findings

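- When the model-estimation procedure is smoothly parameterized by the hyper-parameters, the error incurred from tuning them shrinks at a nearly parametric rate, so it is negligible for semi- and non-parametric procedures with a fixed number of hyper-parameters.
- For parametric model-estimation procedures, adding a hyper-parameter is roughly equivalent to adding a parameter to the model itself.
- Fitted penalized regression models are Lipschitz in the penalty parameters, so the oracle inequalities apply to problems with multiple penalty parameters.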
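The Lipschitz finding can be probed numerically. The following is a minimal sketch, not the paper's construction: it assumes a ridge-type regression with two penalty parameters (one per coefficient group), simulated Gaussian data, and a penalty range bounded away from zero. The function name `fit_two_penalty_ridge` and all constants are hypothetical choices made for illustration. The sketch compares how far the fitted values move against how far the penalty vector moves.

```python
import numpy as np

def fit_two_penalty_ridge(X, y, lam1, lam2, n_first):
    """Minimize 0.5*||y - X b||^2 + 0.5*(lam1*||b[:n_first]||^2 + lam2*||b[n_first:]||^2).

    Hypothetical helper for this sketch: one penalty per coefficient group.
    """
    p = X.shape[1]
    penalties = np.concatenate([np.full(n_first, lam1), np.full(p - n_first, lam2)])
    # Normal equations of the penalized least-squares problem.
    return np.linalg.solve(X.T @ X + np.diag(penalties), X.T @ y)

rng = np.random.default_rng(0)
n, p, n_first = 80, 10, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

ratios = []
for _ in range(500):
    # Draw two penalty vectors from a box bounded away from zero (an assumption).
    lam_a = rng.uniform(0.5, 5.0, size=2)
    lam_b = rng.uniform(0.5, 5.0, size=2)
    gap = np.linalg.norm(lam_a - lam_b)
    if gap < 1e-8:
        continue  # skip near-identical pairs to avoid dividing by ~0
    fit_a = X @ fit_two_penalty_ridge(X, y, lam_a[0], lam_a[1], n_first)
    fit_b = X @ fit_two_penalty_ridge(X, y, lam_b[0], lam_b[1], n_first)
    # Change in fitted values relative to change in penalty parameters.
    ratios.append(np.linalg.norm(fit_a - fit_b) / gap)

max_ratio = max(ratios)
print(f"largest observed ratio: {max_ratio:.4f}")
```

A bounded largest ratio over many random pairs is consistent with Lipschitz behavior on this penalty range, though a finite sample of pairs is only an empirical check and not a proof.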

## Abstract

In the regression setting, given a set of hyper-parameters, a model-estimation procedure constructs a model from training data. The optimal hyper-parameters that minimize generalization error of the model are usually unknown. In practice they are often estimated using split-sample validation. Up to now, there is an open question regarding how the generalization error of the selected model grows with the number of hyper-parameters to be estimated. To answer this question, we establish finite-sample oracle inequalities for selection based on a single training/test split and based on cross-validation. We show that if the model-estimation procedures are smoothly parameterized by the hyper-parameters, the error incurred from tuning hyper-parameters shrinks at nearly a parametric rate. Hence for semi- and non-parametric model-estimation procedures with a fixed number of hyper-parameters, this additional error is negligible. For parametric model-estimation procedures, adding a hyper-parameter is roughly equivalent to adding a parameter to the model itself. In addition, we specialize these ideas for penalized regression problems with multiple penalty parameters. We establish that the fitted models are Lipschitz in the penalty parameters and thus our oracle inequalities apply. This result encourages development of regularization methods with many penalty parameters.
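The selection procedure described in the abstract can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' code: it uses a two-penalty ridge regression, simulated data, a fixed grid of candidate penalties, and a single half/half training/validation split. The names `fit_two_penalty_ridge`, `grid`, and `n_train` are hypothetical.

```python
import itertools
import numpy as np

def fit_two_penalty_ridge(X, y, lam1, lam2, n_first):
    """Ridge fit with penalty lam1 on the first n_first coefficients and lam2 on the rest."""
    p = X.shape[1]
    penalties = np.concatenate([np.full(n_first, lam1), np.full(p - n_first, lam2)])
    return np.linalg.solve(X.T @ X + np.diag(penalties), X.T @ y)

rng = np.random.default_rng(1)
n, p, n_first = 200, 10, 5
X = rng.normal(size=(n, p))
# Only the first coefficient group carries signal in this simulated example.
beta_true = np.concatenate([rng.normal(size=n_first), np.zeros(p - n_first)])
y = X @ beta_true + rng.normal(size=n)

# Single training/validation split.
n_train = n // 2
X_tr, y_tr = X[:n_train], y[:n_train]
X_va, y_va = X[n_train:], y[n_train:]

# Candidate values for each penalty parameter (an arbitrary grid).
grid = [0.01, 0.1, 1.0, 10.0, 100.0]

val_losses = {}
for lam1, lam2 in itertools.product(grid, grid):
    beta = fit_two_penalty_ridge(X_tr, y_tr, lam1, lam2, n_first)
    # Validation loss: mean squared error on the held-out half.
    val_losses[(lam1, lam2)] = float(np.mean((y_va - X_va @ beta) ** 2))

# Selected hyper-parameters minimize the validation loss.
(lam1_hat, lam2_hat), best_loss = min(val_losses.items(), key=lambda kv: kv[1])
print(f"selected penalties: ({lam1_hat}, {lam2_hat}), validation MSE: {best_loss:.4f}")
```

A grid search scales poorly as the number of penalty parameters grows; it is used here only because it makes the selection step explicit.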

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.12297/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1903.12297/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/1903.12297/full.md

---
Source: https://tomesphere.com/paper/1903.12297