Understanding the Generalization of Bilevel Programming in Hyperparameter Optimization: A Tale of Bias-Variance Decomposition

Yubo Zhou; Jun Shu; Junmin Liu; Deyu Meng

arXiv:2602.17947 · cs.LG · February 23, 2026

Open Access

TL;DR

This paper analyzes the bias and variance in hypergradient estimation for bilevel hyperparameter optimization, introduces a variance reduction method, and demonstrates improved performance across various tasks.

Contribution

It provides a bias-variance decomposition for hypergradient errors, introduces an ensemble variance reduction strategy, and offers theoretical insights into hypergradient estimation errors.

Findings

01 Variance reduction improves hypergradient accuracy

02 Ensemble strategy enhances hyperparameter optimization performance

03 Theoretical analysis explains overfitting phenomena in HPO

Abstract

Gradient-based hyperparameter optimization (HPO) methods have emerged recently, leveraging bilevel programming techniques to optimize hyperparameters by estimating the hypergradient w.r.t. the validation loss. Nevertheless, previous theoretical works mainly focus on reducing the gap between the estimate and the ground truth (i.e., the bias), while ignoring the error due to the data distribution (i.e., the variance), which degrades performance. To address this issue, we conduct a bias-variance decomposition of the hypergradient estimation error and provide a detailed supplementary analysis of the variance term ignored by previous works. We also present a comprehensive analysis of the error bounds for hypergradient estimation. This makes it easy to explain some phenomena commonly observed in practice, such as overfitting to the validation set. Inspired by the derived theories, we propose an ensemble…
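The bias-variance split mentioned in the abstract can be illustrated with a toy scalar estimator. The sketch below is not the paper's method or notation: `noisy_estimate`, `decompose`, `ensemble_estimate`, and all numeric values are hypothetical stand-ins, assuming an estimate equal to the true value plus a fixed bias plus zero-mean Gaussian noise. It checks the standard identity that mean squared error equals squared bias plus variance, and shows that averaging several independent estimates shrinks the variance while leaving the bias unchanged.

```python
import random
import statistics


def noisy_estimate(rng, true_value=1.0, bias=0.2, noise_std=0.5):
    """Hypothetical stand-in for a hypergradient estimate:
    true value + fixed bias + zero-mean Gaussian noise."""
    return true_value + bias + rng.gauss(0.0, noise_std)


def decompose(estimates, true_value):
    """Return (mse, squared_bias, variance) of scalar estimates around true_value."""
    mean_est = statistics.fmean(estimates)
    mse = statistics.fmean((e - true_value) ** 2 for e in estimates)
    sq_bias = (mean_est - true_value) ** 2
    # Population variance around the sample mean, so mse == sq_bias + variance.
    variance = statistics.pvariance(estimates, mu=mean_est)
    return mse, sq_bias, variance


def ensemble_estimate(rng, k):
    """Average k independent toy estimates (a generic ensemble, assumed for illustration)."""
    return statistics.fmean(noisy_estimate(rng) for _ in range(k))


rng = random.Random(0)  # fixed seed so the run is repeatable
single = [noisy_estimate(rng) for _ in range(20000)]
averaged = [ensemble_estimate(rng, k=8) for _ in range(20000)]

mse_1, bias_1, var_1 = decompose(single, 1.0)
mse_8, bias_8, var_8 = decompose(averaged, 1.0)

print(f"single:   mse={mse_1:.4f}  bias^2={bias_1:.4f}  var={var_1:.4f}")
print(f"ensemble: mse={mse_8:.4f}  bias^2={bias_8:.4f}  var={var_8:.4f}")
```

In this toy setting the averaged estimator keeps roughly the same squared bias but has a much smaller variance, which is why methods that only reduce bias can leave a sizeable error term untouched.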

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Optimization Algorithms Research · Advanced Bandit Algorithms Research