Optimizing for Generalization in Machine Learning with Cross-Validation Gradients
Shane Barratt, Rishi Sharma

TL;DR
This paper shows that, for many common learning algorithms, the cross-validation risk is differentiable with respect to both the hyperparameters and the training data, which makes gradient-based optimization methods usable for hyperparameter tuning.
Contribution
It introduces a novel cross-validation gradient method (CVGM) that optimizes hyperparameters efficiently by taking gradient steps on the cross-validation risk.
Findings
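Cross-validation risk is differentiable with respect to hyperparameters and training data for many common algorithms, including logistic regression, elastic-net regression, and support vector machines.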
CVGM enables efficient hyperparameter optimization in high-dimensional spaces.
The method improves model selection by directly optimizing the cross-validation risk, a surrogate for generalization performance.
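One standard way to see why the cross-validation risk can be differentiated is the implicit function theorem. The sketch below assumes that each fold's training objective is smooth with a unique minimizer and an invertible Hessian; the paper's own derivation, particularly for non-smooth losses such as the SVM hinge loss, may differ. The notation ($K$ folds, training objective $f_k$ on all folds except $k$, validation loss $\ell_k$ on fold $k$, hyperparameters $\theta$) is ours, not the paper's.

```latex
\mathcal{L}_{\mathrm{cv}}(\theta) = \frac{1}{K}\sum_{k=1}^{K} \ell_k\big(w_k(\theta)\big),
\qquad
w_k(\theta) = \arg\min_{w} f_k(w, \theta).

% Since \nabla_w f_k(w_k(\theta), \theta) = 0 for every \theta, differentiating this identity gives
\frac{\partial w_k}{\partial \theta} = -\big[\nabla^2_{ww} f_k\big]^{-1} \nabla^2_{w\theta} f_k,
\qquad
\nabla_\theta \mathcal{L}_{\mathrm{cv}}
  = \frac{1}{K}\sum_{k=1}^{K} \Big(\frac{\partial w_k}{\partial \theta}\Big)^{\!\top} \nabla_w \ell_k\big(w_k(\theta)\big).
```

The same argument, with $\theta$ replaced by a training example, yields gradients with respect to the training data. In practice, each fold needs one linear solve against the Hessian; the inverse is never formed explicitly.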
Abstract
Cross-validation is the workhorse of modern applied statistics and machine learning, as it provides a principled framework for selecting the model that maximizes generalization performance. In this paper, we show that the cross-validation risk is differentiable with respect to the hyperparameters and training data for many common machine learning algorithms, including logistic regression, elastic-net regression, and support vector machines. Leveraging this differentiability, we propose a cross-validation gradient method (CVGM) for hyperparameter optimization. Our method enables efficient optimization of the cross-validation risk, the best surrogate of the true generalization ability of our learning algorithm, in high-dimensional hyperparameter spaces.
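The idea of descending the cross-validation risk can be made concrete with a small sketch. It assumes ridge regression with one penalty per feature ($\lambda_j = e^{\theta_j}$), so a 10-feature problem already has a 10-dimensional hyperparameter space, and it uses squared-error K-fold CV risk. Ridge is a swapped-in example, chosen because it has a closed-form fit. This is not the paper's code, and the paper's handling of logistic regression, elastic-net, and SVMs may compute gradients differently. The names `ridge_fit`, `cv_risk_and_grad`, and `cvgm` are hypothetical. The gradient comes from differentiating $A w = X^\top y$ with $A = X^\top X + \mathrm{diag}(\lambda)$, which gives $\partial w / \partial \lambda_j = -A^{-1} e_j w_j$.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Fit ridge with a separate penalty per feature: w = (X^T X + diag(lam))^{-1} X^T y."""
    A = X.T @ X + np.diag(lam)
    return np.linalg.solve(A, X.T @ y), A


def cv_risk_and_grad(theta, X, y, n_folds=5):
    """K-fold CV mean squared error and its gradient w.r.t. theta = log(lam).

    Folds are contiguous index blocks (no shuffling), which is enough for a sketch.
    """
    lam = np.exp(theta)
    folds = np.array_split(np.arange(len(y)), n_folds)
    risk, grad = 0.0, np.zeros_like(lam)
    for val_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[val_idx] = False
        w, A = ridge_fit(X[train], y[train], lam)
        Xv, yv = X[val_idx], y[val_idx]
        r = Xv @ w - yv                              # validation residuals
        risk += np.mean(r ** 2)
        g = (2.0 / len(val_idx)) * (Xv.T @ r)        # d(val MSE)/dw
        # Differentiating A w = X^T y gives dw/dlam_j = -A^{-1} e_j w_j, so
        # d(val MSE)/dlam_j = -(A^{-1} g)_j * w_j (A is symmetric).
        u = np.linalg.solve(A, g)
        grad += -u * w * lam                         # chain rule: dlam/dtheta = lam
    return risk / n_folds, grad / n_folds


def cvgm(X, y, theta0, lr=100.0, steps=100, n_folds=5):
    """Gradient descent on the CV risk in log-penalty space with backtracking."""
    theta = np.asarray(theta0, dtype=float).copy()
    risk, grad = cv_risk_and_grad(theta, X, y, n_folds)
    for _ in range(steps):
        step = lr
        while step > 1e-8:
            cand = theta - step * grad
            cand_risk, cand_grad = cv_risk_and_grad(cand, X, y, n_folds)
            if cand_risk < risk:                     # accept only strict improvements
                theta, risk, grad = cand, cand_risk, cand_grad
                break
            step /= 2
        else:
            break                                    # no improving step found: stop
    return theta, risk


# Synthetic data: only the first 3 of 10 features matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X @ np.r_[np.ones(3), np.zeros(7)] + 0.5 * rng.normal(size=100)
theta, risk = cvgm(X, y, np.zeros(10))
print(np.round(np.exp(theta), 2), round(risk, 4))
```

Working in $\theta = \log \lambda$ keeps every penalty positive without explicit constraints. The backtracking loop accepts only steps that lower the CV risk, so the returned risk never exceeds the starting value. A finite-difference check on `cv_risk_and_grad` is a cheap way to validate the analytic gradient before trusting the optimizer.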
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Machine Learning and Data Classification · Advanced Multi-Objective Optimization Algorithms
