Optimizing for Generalization in Machine Learning with Cross-Validation Gradients

Shane Barratt; Rishi Sharma

arXiv:1805.07072 · stat.ML · May 21, 2018 · 1 citation

TL;DR

This paper demonstrates that cross-validation risk is differentiable with respect to hyperparameters and training data for many common learning algorithms, which makes gradient-based optimization methods usable for hyperparameter tuning in machine learning.
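One standard way to see why the cross-validation risk can be differentiated is implicit differentiation of each fold's training optimality condition. The derivation below is a sketch under stated assumptions, not necessarily the paper's exact argument: it assumes each fold's training objective L_train^(k) is twice differentiable and strongly convex in the model parameters θ, so its Hessian is invertible. Non-smooth objectives, such as the elastic net's ℓ1 term or the SVM hinge loss, need extra care. The notation (K folds, fitted parameters θ_k^⋆(λ), validation loss ℓ_val^(k)) is introduced here for illustration.

```latex
% K-fold CV risk as a function of the hyperparameters \lambda
\[
R_{\mathrm{cv}}(\lambda) = \frac{1}{K}\sum_{k=1}^{K} \ell^{(k)}_{\mathrm{val}}\bigl(\theta_k^\star(\lambda)\bigr),
\qquad
\theta_k^\star(\lambda) = \operatorname*{arg\,min}_{\theta}\, L^{(k)}_{\mathrm{train}}(\theta, \lambda)
\]
% Differentiate the optimality condition (implicit function theorem)
\[
\nabla_\theta L^{(k)}_{\mathrm{train}}\bigl(\theta_k^\star(\lambda), \lambda\bigr) = 0
\;\Longrightarrow\;
\frac{\partial \theta_k^\star}{\partial \lambda}
  = -\Bigl[\nabla^2_{\theta\theta} L^{(k)}_{\mathrm{train}}\Bigr]^{-1}
     \nabla^2_{\theta\lambda} L^{(k)}_{\mathrm{train}}
\]
% Chain rule through every fold (the Hessian is symmetric)
\[
\nabla_\lambda R_{\mathrm{cv}}(\lambda)
  = -\frac{1}{K}\sum_{k=1}^{K}
     \Bigl(\nabla^2_{\theta\lambda} L^{(k)}_{\mathrm{train}}\Bigr)^{\!\top}
     \Bigl[\nabla^2_{\theta\theta} L^{(k)}_{\mathrm{train}}\Bigr]^{-1}
     \nabla_\theta \ell^{(k)}_{\mathrm{val}}\bigl(\theta_k^\star(\lambda)\bigr)
\]
```

All second derivatives are evaluated at (θ_k^⋆(λ), λ). Applying the inverse Hessian to the validation gradient first means each fold costs a single linear solve, however many entries λ has.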

Contribution

The paper introduces the cross-validation gradient method (CVGM), which optimizes hyperparameters efficiently by following the gradient of the cross-validation risk, exploiting the fact that this risk is differentiable.
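A minimal CVGM-style loop, sketched under assumptions of our own rather than taken from the authors' code: the model is ridge regression (chosen because each fold's weights have a closed form), the folds are 5 contiguous unshuffled blocks, and the optimizer is plain gradient descent on log λ with a backtracking step. The names `cv_risk_and_grad` and `cvgm_sketch` are hypothetical. The derivative of each fold's weights with respect to λ comes from differentiating the ridge optimality condition (X_trᵀX_tr + λI)w = X_trᵀy_tr.

```python
import numpy as np


def cv_risk_and_grad(log_lam, X, y, k=5):
    """K-fold CV risk of ridge regression and its derivative w.r.t. log(lambda).

    Per fold, training minimizes ||X_tr w - y_tr||^2 + lam * ||w||^2 and the
    validation loss is the mean squared error on the held-out fold.
    (Illustrative sketch; not the authors' implementation.)
    """
    lam = np.exp(log_lam)
    n, d = X.shape
    risk, grad = 0.0, 0.0
    # Contiguous, unshuffled folds keep the example deterministic.
    for val_idx in np.array_split(np.arange(n), k):
        tr = np.ones(n, dtype=bool)
        tr[val_idx] = False
        X_tr, y_tr, X_va, y_va = X[tr], y[tr], X[val_idx], y[val_idx]
        A = X_tr.T @ X_tr + lam * np.eye(d)      # optimality: A w = X_tr^T y_tr
        w = np.linalg.solve(A, X_tr.T @ y_tr)
        r = X_va @ w - y_va
        risk += np.mean(r ** 2)
        # Differentiating A w = X_tr^T y_tr in lam gives w + A dw/dlam = 0.
        dw_dlam = -np.linalg.solve(A, w)
        dval_dw = 2.0 * X_va.T @ r / len(val_idx)
        grad += (dval_dw @ dw_dlam) * lam        # chain rule: d/dlog(lam) = lam * d/dlam
    return risk / k, grad / k


def cvgm_sketch(X, y, log_lam=0.0, step=1.0, iters=50, k=5):
    """Gradient descent on log(lambda) with backtracking, driven by the CV gradient."""
    risk, g = cv_risk_and_grad(log_lam, X, y, k)
    for _ in range(iters):
        t = step
        while t > 1e-8:
            cand = log_lam - t * g
            cand_risk, cand_g = cv_risk_and_grad(cand, X, y, k)
            if cand_risk < risk:                 # accept only steps that lower CV risk
                log_lam, risk, g = cand, cand_risk, cand_g
                break
            t *= 0.5
        else:
            break                                # no decrease found: stop
    return log_lam, risk


# Toy data: noisy linear model (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=2.0, size=60)

risk0, _ = cv_risk_and_grad(0.0, X, y)
log_lam, risk = cvgm_sketch(X, y)
print(f"lambda={np.exp(log_lam):.3g}  CV risk {risk0:.4f} -> {risk:.4f}")
```

Optimizing log λ rather than λ keeps the regularization strength positive without explicit constraints. The backtracking rule only guarantees that the CV risk never increases; it does not guarantee a global minimum.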

Findings

01

Cross-validation risk is differentiable for common algorithms.

02

CVGM enables efficient optimization over high-dimensional hyperparameter spaces.

03

The method improves model selection by directly optimizing generalization performance.
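Findings 01 and 02 can be illustrated together with L2-regularized logistic regression (one of the algorithms named in the abstract) using one penalty λ_j = exp(η_j) per feature, so the hyperparameter vector has as many entries as there are features. The following are assumptions of this sketch, not the paper's: labels in {-1, +1}, a Newton solver for each fold, 4 contiguous folds, and the hypothetical helper names `fit_logreg` and `cv_risk_and_grad`. Implicit differentiation yields the whole gradient with one extra linear solve per fold, whereas finite differences would need two CV evaluations per hyperparameter.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logreg(X, y, lam, iters=50, tol=1e-12):
    """Newton's method for sum_i log(1 + exp(-y_i x_i.w)) + 0.5 * sum_j lam_j w_j^2.

    Labels y are in {-1, +1}; lam holds one penalty per feature. No line search,
    which is adequate for the small, well-regularized toy problem below.
    """
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = -X.T @ (y * sigmoid(-y * (X @ w))) + lam * w
        if np.linalg.norm(grad) < tol:
            break
        s = sigmoid(X @ w)
        H = X.T @ (X * (s * (1.0 - s))[:, None]) + np.diag(lam)
        w -= np.linalg.solve(H, grad)
    return w


def cv_risk_and_grad(eta, X, y, k=4):
    """K-fold CV log-loss and its gradient w.r.t. per-feature log-penalties eta."""
    lam = np.exp(eta)
    n = X.shape[0]
    risk, grad = 0.0, np.zeros_like(eta)
    for val_idx in np.array_split(np.arange(n), k):
        tr = np.ones(n, dtype=bool)
        tr[val_idx] = False
        Xt, yt, Xv, yv = X[tr], y[tr], X[val_idx], y[val_idx]
        w = fit_logreg(Xt, yt, lam)
        mv = yv * (Xv @ w)
        risk += np.mean(np.logaddexp(0.0, -mv))              # validation log-loss
        g_val = -Xv.T @ (yv * sigmoid(-mv)) / len(val_idx)   # its gradient in w
        s = sigmoid(Xt @ w)
        H = Xt.T @ (Xt * (s * (1.0 - s))[:, None]) + np.diag(lam)
        # Implicit function theorem: dw/deta = -H^{-1} diag(lam * w), hence
        # dval/deta = -(lam * w) * (H^{-1} g_val): one solve covers every eta_j.
        grad -= (lam * w) * np.linalg.solve(H, g_val)
    return risk / k, grad / k


# Toy data from a logistic model (illustrative only).
rng = np.random.default_rng(1)
n, d = 80, 5
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = np.where(X @ w_true + rng.logistic(size=n) > 0, 1.0, -1.0)

eta0 = np.zeros(d)                 # lambda_j = 1 for every feature
risk0, grad0 = cv_risk_and_grad(eta0, X, y)
print(risk0, grad0)
```

In this sketch, adding features grows the hyperparameter vector but not the number of linear solves per fold, which is what makes gradient descent over many hyperparameters practical.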

Abstract

Cross-validation is the workhorse of modern applied statistics and machine learning, as it provides a principled framework for selecting the model that maximizes generalization performance. In this paper, we show that the cross-validation risk is differentiable with respect to the hyperparameters and training data for many common machine learning algorithms, including logistic regression, elastic-net regression, and support vector machines. Leveraging this differentiability, we propose a cross-validation gradient method (CVGM) for hyperparameter optimization. Our method enables efficient optimization of the cross-validation risk, the best surrogate of the true generalization ability of our learning algorithm, in high-dimensional hyperparameter spaces.
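The abstract also states that the cross-validation risk is differentiable with respect to the training data. Here is a small sketch of that gradient, assuming ridge regression, a single train/validation split (one fold of K-fold CV) for brevity, and differentiation with respect to the training labels only; `val_loss_and_label_grad` is a hypothetical name. Entry i of the returned gradient gives the rate at which the validation loss changes as training label y_i changes.

```python
import numpy as np


def val_loss_and_label_grad(X_tr, y_tr, X_va, y_va, lam):
    """Validation MSE of ridge regression and its gradient w.r.t. the training labels.

    Ridge weights: w = A^{-1} X_tr^T y_tr with A = X_tr^T X_tr + lam * I,
    so dw/dy_tr = A^{-1} X_tr^T and d(val)/dy_tr = X_tr A^{-1} d(val)/dw.
    (Illustrative sketch under the assumptions stated above.)
    """
    A = X_tr.T @ X_tr + lam * np.eye(X_tr.shape[1])
    w = np.linalg.solve(A, X_tr.T @ y_tr)
    r = X_va @ w - y_va
    loss = np.mean(r ** 2)
    dval_dw = 2.0 * X_va.T @ r / len(y_va)
    # A is symmetric, so A^{-T} = A^{-1}.
    dval_dy = X_tr @ np.linalg.solve(A, dval_dw)
    return loss, dval_dy


# Toy data split into one training and one validation fold (illustrative only).
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.3 * rng.normal(size=50)
X_tr, y_tr, X_va, y_va = X[:40], y[:40], X[40:], y[40:]

loss, g_y = val_loss_and_label_grad(X_tr, y_tr, X_va, y_va, lam=1.0)
# Training points whose labels the validation loss is most sensitive to.
print(loss, np.argsort(-np.abs(g_y))[:3])
```

Because the ridge weights are linear in y_tr and the validation loss is quadratic in the weights, this gradient is exact here. For models without a closed-form fit, the same quantity can be obtained by implicitly differentiating the training optimality condition with respect to the data.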

Code & Models

Repositories

sbarratt/crossval (official)

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Machine Learning and Data Classification · Advanced Multi-Objective Optimization Algorithms