Learning Hyperparameters via a Data-Emphasized Variational Objective
Ethan Harvey, Mikhail Petrov, Michael C. Hughes

TL;DR
This paper introduces a data-emphasized variational objective for hyperparameter learning that removes the need for a validation set and substantially reduces tuning time while maintaining accuracy.
Contribution
It proposes a data-emphasized ELBO that upweights the likelihood relative to the prior. This makes gradient-based hyperparameter learning in Bayesian models more efficient, especially for large models trained on limited data.
Findings
Reduces hyperparameter tuning time from over 88 hours to under 3 hours.
Achieves accuracy comparable to traditional grid search methods.
Enables efficient Gaussian process approximations with learnable kernels.
Abstract
When training large models on limited data, avoiding overfitting is paramount. Common grid search or smarter search methods rely on expensive separate runs for each candidate hyperparameter, while carving out a validation set that reduces available training data. In this paper, we study gradient-based learning of hyperparameters via the evidence lower bound (ELBO) objective from Bayesian variational methods. This avoids the need for any validation set. We focus on scenarios where the model is over-parameterized for flexibility and the approximate posterior is chosen to be Gaussian with isotropic covariance for tractability, even though it cannot match the true posterior. In such scenarios, we find the ELBO prioritizes posteriors that match the prior, leading to severe underfitting. Instead, we recommend a data-emphasized ELBO that upweights the likelihood but not the prior. In Bayesian…
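The toy example below illustrates what "upweighting the likelihood but not the prior" means. It is a minimal sketch, not the authors' implementation, and it makes several assumptions. The model is Bayesian linear regression, chosen because it keeps everything in closed form. The approximate posterior is isotropic, q(w) = N(m, s2·I). The prior is N(0, lam·I), and its variance lam is the hyperparameter being learned. The objective is kappa · E_q[log p(y | X, w)] − KL(q ‖ prior), and kappa = 1 recovers the standard ELBO. Here kappa is treated as a free, user-chosen weight, because the value the paper uses is not given in this excerpt. The abstract describes gradient-based learning, but this sketch swaps in exact closed-form coordinate ascent. That shortcut works only because the toy model is linear-Gaussian. All function names (`expected_log_lik`, `kl_isotropic`, `de_elbo`, `fit`) are hypothetical.

```python
import numpy as np

# Hypothetical toy setup (an assumption, not the paper's model):
#   likelihood  y ~ N(X w, sigma2 I)
#   prior       w ~ N(0, lam I), lam is the hyperparameter to learn
#   posterior   q(w) = N(m, s2 I), isotropic as described in the abstract


def expected_log_lik(X, y, m, s2, sigma2):
    """Closed-form E_q[log N(y | X w, sigma2 I)] for q(w) = N(m, s2 I)."""
    n = X.shape[0]
    resid = y - X @ m
    # E_q ||y - X w||^2 = ||y - X m||^2 + s2 * ||X||_F^2
    sq = resid @ resid + s2 * np.sum(X * X)
    return -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * sq / sigma2


def kl_isotropic(m, s2, lam):
    """KL( N(m, s2 I) || N(0, lam I) ) in D dimensions."""
    d = m.shape[0]
    return 0.5 * (d * s2 / lam + (m @ m) / lam - d + d * np.log(lam / s2))


def de_elbo(X, y, m, s2, lam, kappa, sigma2):
    """Data-emphasized ELBO: only the likelihood term is scaled by kappa."""
    return kappa * expected_log_lik(X, y, m, s2, sigma2) - kl_isotropic(m, s2, lam)


def fit(X, y, kappa=1.0, sigma2=1.0, lam=1.0, iters=50):
    """Coordinate ascent on (m, s2, lam); each step exactly maximizes one block,
    so the objective recorded after each sweep never decreases."""
    d = X.shape[1]
    XtX, Xty = X.T @ X, X.T @ y
    frob = np.trace(XtX)  # equals ||X||_F^2
    m, s2 = np.zeros(d), lam
    history = []
    for _ in range(iters):
        # Mean: solve (kappa/sigma2 * X^T X + I/lam) m = kappa/sigma2 * X^T y
        A = (kappa / sigma2) * XtX + np.eye(d) / lam
        m = np.linalg.solve(A, (kappa / sigma2) * Xty)
        # Isotropic variance: 1/s2 = kappa * ||X||_F^2 / (sigma2 * D) + 1/lam
        s2 = 1.0 / (kappa * frob / (sigma2 * d) + 1.0 / lam)
        # Hyperparameter (prior variance) maximizing -KL: lam = (D*s2 + ||m||^2) / D
        lam = (d * s2 + m @ m) / d
        history.append(de_elbo(X, y, m, s2, lam, kappa, sigma2))
    return m, s2, lam, history


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 100))  # over-parameterized: D = 100 > N = 20
    y = X @ (0.3 * rng.normal(size=100)) + 0.1 * rng.normal(size=20)
    for kappa in (1.0, 5.0):
        m, s2, lam, _ = fit(X, y, kappa=kappa, sigma2=0.01)
        print(f"kappa={kappa}: learned lam={lam:.4f}, "
              f"train MSE={np.mean((y - X @ m) ** 2):.4f}")
```

Raising kappa puts more weight on fitting the data relative to the KL term, which pulls the prior toward the data. The learned `lam` and the training error for different kappa values can be compared directly. No validation split is used at any point: the hyperparameter is read off the objective itself.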
