Understanding the Generalization of Bilevel Programming in Hyperparameter Optimization: A Tale of Bias-Variance Decomposition
Yubo Zhou, Jun Shu, Junmin Liu, Deyu Meng

TL;DR
This paper analyzes the bias and variance in hypergradient estimation for bilevel hyperparameter optimization, introduces a variance reduction method, and demonstrates improved performance across various tasks.
Contribution
It provides a bias-variance decomposition for hypergradient errors, introduces an ensemble variance reduction strategy, and offers theoretical insights into hypergradient estimation errors.
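The standard identity behind a bias-variance decomposition of an estimation error can be written as below. The notation is assumed for illustration and is not taken from the paper: $\hat g(\lambda)$ is a hypergradient estimate computed from a random finite sample $S$, and $\nabla F(\lambda)$ is the ground-truth hypergradient of the population validation loss. The paper's own statement and bounds may be organized differently. The cross term vanishes because $\mathbb{E}_S[\hat g - \mathbb{E}_S \hat g] = 0$. The second line shows the generic effect of averaging $K$ i.i.d. estimates: the bias is unchanged and the variance shrinks by $1/K$.

```latex
\mathbb{E}_S\big\|\hat g(\lambda)-\nabla F(\lambda)\big\|^2
  = \underbrace{\big\|\mathbb{E}_S[\hat g(\lambda)]-\nabla F(\lambda)\big\|^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_S\big\|\hat g(\lambda)-\mathbb{E}_S[\hat g(\lambda)]\big\|^2}_{\text{variance}}

\bar g(\lambda) = \frac{1}{K}\sum_{k=1}^{K}\hat g_k(\lambda),\ \hat g_k \text{ i.i.d.}
  \;\Longrightarrow\;
\mathbb{E}\big\|\bar g(\lambda)-\nabla F(\lambda)\big\|^2
  = \big\|\mathbb{E}[\hat g(\lambda)]-\nabla F(\lambda)\big\|^2
  + \frac{1}{K}\,\mathbb{E}\big\|\hat g(\lambda)-\mathbb{E}[\hat g(\lambda)]\big\|^2
```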
Findings
Variance reduction improves hypergradient accuracy
Ensemble strategy enhances hyperparameter optimization performance
Theoretical analysis explains overfitting phenomena in HPO
Abstract
Gradient-based hyperparameter optimization (HPO) has emerged recently, leveraging bilevel programming techniques to optimize hyperparameters by estimating the hypergradient w.r.t. the validation loss. Nevertheless, previous theoretical works mainly focus on reducing the gap between the estimate and the ground truth (i.e., the bias), while ignoring the error due to the data distribution (i.e., the variance), which degrades performance. To address this issue, we conduct a bias-variance decomposition of the hypergradient estimation error and provide a supplemental, detailed analysis of the variance term ignored by previous works. We also present a comprehensive analysis of the error bounds for hypergradient estimation. This facilitates an easy explanation of some phenomena commonly observed in practice, like overfitting to the validation set. Inspired by the derived theories, we propose an ensemble…
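The two error sources named in the abstract can be seen on a toy problem. This is a minimal sketch, not the authors' method or their ensemble strategy. It assumes a one-feature ridge regression as the inner problem, gradient descent truncated after T steps (differentiated in forward mode) as the hypergradient estimator, and y = θx + noise as the data distribution. All names (`THETA`, `LAM`, `truncated_inner`, `run`, and so on) are hypothetical. Truncating the inner loop introduces bias. Drawing a small, fresh validation sample introduces variance. Averaging K independent estimates, a generic form of ensembling, reduces only the variance.

```python
import random

# Hypothetical toy setup (not from the paper): y = THETA * x + noise, x ~ N(0, 1).
THETA, SIGMA, LAM = 2.0, 1.0, 50.0
rng = random.Random(0)


def make_data(n):
    """Draw n (x, y) pairs from the assumed data distribution."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [THETA * x + rng.gauss(0.0, SIGMA) for x in xs]
    return xs, ys


# Fixed training set; inner loss f(w) = 0.5*sum((x*w - y)^2) + 0.5*LAM*w^2.
xs_tr, ys_tr = make_data(50)
SXX = sum(x * x for x in xs_tr)
SXY = sum(x * y for x, y in zip(xs_tr, ys_tr))
LR = 0.1 / (SXX + LAM)  # inner step size; error contracts by 0.9 per step

# Exact inner solution w*(lam) and dw*/dlam, by differentiating the closed form.
W_STAR = SXY / (SXX + LAM)
DW_STAR = -W_STAR / (SXX + LAM)
# Ground truth: hypergradient of the population validation loss
# E[0.5*(x*w - y)^2], whose derivative in w is (w - THETA) since E[x^2] = 1.
G_TRUE = (W_STAR - THETA) * DW_STAR


def truncated_inner(steps):
    """Gradient descent from w = 0 for `steps` iterations, carrying dw/dlam
    forward (forward-mode differentiation through the unrolled updates)."""
    w, dw = 0.0, 0.0
    for _ in range(steps):
        # Both updates use the previous (w, dw) thanks to tuple assignment.
        w, dw = (w - LR * ((SXX + LAM) * w - SXY),
                 dw - LR * ((SXX + LAM) * dw + w))
    return w, dw


def hypergrad_estimate(w, dw, xs_val, ys_val):
    """Chain rule dL_val/dlam = dL_val/dw * dw/dlam on a finite validation sample."""
    grad_w = sum(x * (x * w - y) for x, y in zip(xs_val, ys_val)) / len(xs_val)
    return grad_w * dw


def decompose(estimates, target):
    """Empirical (bias^2, variance, MSE); MSE equals bias^2 + variance."""
    n = len(estimates)
    mean = sum(estimates) / n
    bias2 = (mean - target) ** 2
    var = sum((g - mean) ** 2 for g in estimates) / n
    mse = sum((g - target) ** 2 for g in estimates) / n
    return bias2, var, mse


def run(steps, val_size=20, trials=1000, ensemble=1):
    """Repeat the estimate over fresh validation samples, averaging
    `ensemble` independent estimates per trial."""
    w, dw = truncated_inner(steps)
    estimates = []
    for _ in range(trials):
        gs = [hypergrad_estimate(w, dw, *make_data(val_size)) for _ in range(ensemble)]
        estimates.append(sum(gs) / ensemble)
    return decompose(estimates, G_TRUE)


short_run = run(steps=5)               # truncated inner loop: biased
long_run = run(steps=200)              # near-exact inner loop: mostly variance
ens_run = run(steps=200, ensemble=10)  # averaging K=10 estimates cuts variance

for name, (b2, v, m) in [("T=5", short_run), ("T=200", long_run), ("T=200, K=10", ens_run)]:
    print(f"{name:12s} bias^2={b2:.2e}  variance={v:.2e}  mse={m:.2e}")
```

In this toy, averaging over K fresh validation samples is equivalent to using one validation set K times larger. The sketch therefore only illustrates why lowering the variance lowers the total error. It does not show how the paper constructs its ensemble. Averaging leaves the truncation bias untouched in expectation.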
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Optimization Algorithms Research · Advanced Bandit Algorithms Research
