Value Function Based Difference-of-Convex Algorithm for Bilevel Hyperparameter Selection Problems
Lucy Gao, Jane J. Ye, Haian Yin, Shangzhi Zeng, Jin Zhang

TL;DR
This paper introduces VF-iDCA, an algorithm that reaches stationary solutions of bilevel hyperparameter tuning problems without assuming that the lower-level problem is strongly convex or smooth, and that outperforms existing tuning methods in the reported experiments.
Contribution
The paper develops a sequentially convergent Value Function based Difference-of-Convex Algorithm with inexactness (VF-iDCA) for a broad class of hyperparameter tuning problems, without assuming lower-level strong convexity (LLSC) or lower-level smoothness (LLS).
Findings
VF-iDCA achieves stationary solutions without the lower-level strong convexity (LLSC) and lower-level smoothness (LLS) assumptions.
Experimental results show VF-iDCA outperforms existing hyperparameter tuning methods.
Theoretical analysis confirms convergence properties of VF-iDCA.
Abstract
Gradient-based optimization methods for hyperparameter tuning guarantee theoretical convergence to stationary solutions when, for fixed upper-level variable values, the lower level of the bilevel program is strongly convex (LLSC) and smooth (LLS). This condition is not satisfied for bilevel programs arising from tuning hyperparameters in many machine learning algorithms. In this work, we develop a sequentially convergent Value Function based Difference-of-Convex Algorithm with inexactness (VF-iDCA). We show that this algorithm achieves stationary solutions without LLSC and LLS assumptions for bilevel programs from a broad class of hyperparameter tuning applications. Our extensive experiments confirm our theoretical findings and show that the proposed VF-iDCA yields superior performance when applied to tune hyperparameters.
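For orientation, the value-function reformulation that gives the method its name can be sketched as follows. The symbols $F$, $f$, $\lambda$, $w$, and $v$ are notation assumed here for illustration; they do not appear in the abstract, and the problem class treated in the paper may be stated differently.

```latex
% Requires amsmath. Generic sketch; all symbols are illustrative assumptions.
\begin{align*}
\text{Bilevel program:}\quad
  & \min_{\lambda,\, w} \; F(w, \lambda)
    \quad \text{s.t.} \quad w \in \operatorname*{arg\,min}_{w'} f(w', \lambda), \\
\text{Value function:}\quad
  & v(\lambda) := \min_{w'} f(w', \lambda), \\
\text{Reformulation:}\quad
  & \min_{\lambda,\, w} \; F(w, \lambda)
    \quad \text{s.t.} \quad f(w, \lambda) - v(\lambda) \le 0 .
\end{align*}
```

Under the additional assumption that $f$ is jointly convex in $(w, \lambda)$, the value function $v$ is convex, so the constraint $f - v \le 0$ is a difference of two convex functions. This step uses neither strong convexity nor smoothness of $f$ in $w$, which is consistent with the abstract's claim that LLSC and LLS are not required.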
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Risk and Portfolio Optimization
