Learning Theory for Kernel Bilevel Optimization

Fares El Khoury; Edouard Pauwels; Samuel Vaiter; Michael Arbel

arXiv:2502.08457 · cs.LG · November 18, 2025

TL;DR

This paper establishes a theoretical foundation for kernel-based bilevel optimization in machine learning, deriving generalization bounds and analyzing gradient methods in a nonparametric setting.

Contribution

It introduces the first learning-theoretic analysis of Kernel Bilevel Optimization, providing finite-sample bounds and insights into gradient-based methods in this nonparametric framework.

Findings

01 Derived novel finite-sample generalization bounds for Kernel Bilevel Optimization (KBO).

02 Assessed the statistical accuracy of gradient-based methods in KBO.

03 Numerical experiments on synthetic data support the theoretical results.
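
To make the notion of a gradient-based method for a kernel bilevel problem concrete, here is a minimal sketch on a toy instance. Everything in it is an illustrative assumption and not the paper's setup or algorithm: the inner problem is kernel ridge regression with a Gaussian kernel, the outer variable `omega` is the log of the regularization strength, the outer objective is a validation mean squared error, and the gradient is obtained by implicit differentiation of the closed-form inner solution. The names `gaussian_kernel` and `outer_loss_and_grad` are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=0.5):
    """Gaussian kernel matrix between 1-D sample arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))

def outer_loss_and_grad(omega, x_tr, y_tr, x_val, y_val):
    """Validation loss and its gradient with respect to omega = log(lambda).

    Inner problem (assumed here): kernel ridge regression,
        min_h (1/n) * sum_i (h(x_i) - y_i)^2 + lambda * ||h||_H^2,
    whose minimizer is h = sum_i alpha_i k(x_i, .) with
        alpha = (K + n * lambda * I)^{-1} y.
    """
    n, m = len(x_tr), len(x_val)
    lam = np.exp(omega)
    A = gaussian_kernel(x_tr, x_tr) + n * lam * np.eye(n)
    alpha = np.linalg.solve(A, y_tr)            # inner solution
    K_val = gaussian_kernel(x_val, x_tr)
    resid = K_val @ alpha - y_val
    loss = np.mean(resid ** 2)                  # outer objective
    grad_alpha = (2.0 / m) * (K_val.T @ resid)  # d loss / d alpha
    # Implicit differentiation: d alpha / d lambda = -n * A^{-1} alpha,
    # and d lambda / d omega = lambda by the chain rule.
    grad_omega = -n * lam * (grad_alpha @ np.linalg.solve(A, alpha))
    return loss, grad_omega

# Synthetic data (an arbitrary choice for illustration).
rng = np.random.default_rng(0)
x_tr, x_val = rng.uniform(-1, 1, 30), rng.uniform(-1, 1, 20)
y_tr = np.sin(3 * x_tr) + 0.1 * rng.standard_normal(30)
y_val = np.sin(3 * x_val) + 0.1 * rng.standard_normal(20)

# Plain gradient descent on the outer variable.
omega = -3.0
for _ in range(20):
    loss, grad = outer_loss_and_grad(omega, x_tr, y_tr, x_val, y_val)
    omega -= 0.5 * grad
```

Parameterizing the regularization strength as `exp(omega)` keeps it positive without constraints. The step size and iteration count are untuned placeholders.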

Abstract

Bilevel optimization has emerged as a technique for addressing a wide range of machine learning problems that involve an outer objective implicitly determined by the minimizer of an inner problem. While prior works have primarily focused on the parametric setting, a learning-theoretic foundation for bilevel optimization in the nonparametric case remains relatively unexplored. In this paper, we take a first step toward bridging this gap by studying Kernel Bilevel Optimization (KBO), where the inner objective is optimized over a reproducing kernel Hilbert space. This setting enables rich function approximation while providing a foundation for rigorous theoretical analysis. In this context, we derive novel finite-sample generalization bounds for KBO, leveraging tools from empirical process theory. These bounds further allow us to assess the statistical accuracy of gradient-based methods…
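
For orientation, a bilevel problem of the kind the abstract describes can be written schematically as follows. The notation is an assumption chosen here for illustration and may differ from the paper's: $\omega$ is the outer variable, $\mathcal{H}$ is a reproducing kernel Hilbert space, and $L_{\mathrm{out}}$ and $L_{\mathrm{in}}$ stand for the outer and inner objectives.

```latex
\[
\min_{\omega \in \Omega} \; \mathcal{F}(\omega) := L_{\mathrm{out}}\bigl(\omega, h^{\star}_{\omega}\bigr)
\quad \text{subject to} \quad
h^{\star}_{\omega} = \operatorname*{arg\,min}_{h \in \mathcal{H}} \; L_{\mathrm{in}}(\omega, h).
\]
```

The outer objective depends on $\omega$ both directly and through the inner minimizer $h^{\star}_{\omega}$, which is what "implicitly determined" refers to.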

Videos

Learning Theory for Kernel Bilevel Optimization · SlidesLive