A Fully First-Order Method for Stochastic Bilevel Optimization

Jeongyeol Kwon; Dohyun Kwon; Stephen Wright; Robert Nowak

arXiv:2301.10945 · math.OC · January 27, 2023

TL;DR

This paper introduces a fully first-order stochastic method for bilevel optimization that achieves rigorous convergence guarantees and outperforms second-order methods in practical experiments.

Contribution

The paper proposes the F2SA algorithm, a first-order method with non-asymptotic convergence guarantees for stochastic bilevel problems, avoiding expensive Hessian computations.

Findings

01

F2SA converges in $\tilde{O}(ϵ^{-7/2})$ iterations with stochastic noise in both levels.

02

Momentum-enhanced estimators improve convergence to $\tilde{O}(ϵ^{-5/2})$ iterations.

03

F2SA outperforms second-order methods on MNIST hypercleaning tasks.

Abstract

We consider stochastic unconstrained bilevel optimization problems when only the first-order gradient oracles are available. While numerous optimization methods have been proposed for tackling bilevel problems, existing methods either tend to require possibly expensive calculations regarding Hessians of lower-level objectives, or lack rigorous finite-time performance guarantees. In this work, we propose a Fully First-order Stochastic Approximation (F2SA) method, and study its non-asymptotic convergence properties. Specifically, we show that F2SA converges to an $ϵ$-stationary solution of the bilevel problem after $ϵ^{-7/2}$, $ϵ^{-5/2}$, and $ϵ^{-3/2}$ iterations (each iteration using $O(1)$ samples) when stochastic noises are in both level objectives, only in the upper-level objective, and not present (deterministic settings), respectively. We further show that…
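
To make the idea of a "fully first-order" bilevel update concrete, here is a minimal Python sketch. It assumes a penalty-style reformulation in which the lower-level optimality gap $g(x, y) - \min_z g(x, z)$ is weighted by a growing multiplier `lam`, so that only gradients of $f$ and $g$ are ever evaluated and no Hessians are needed. The toy objectives, the function name `first_order_bilevel`, the step-size rule, and all parameter values are illustrative assumptions, not details taken from this page, and the sketch is not claimed to reproduce the paper's exact F2SA schedule or guarantees.

```python
import numpy as np

# Toy problem (an assumption for illustration, not from the paper):
#   upper level  f(x, y) = 0.5*x**2 + 0.5*(y - 1)**2
#   lower level  g(x, y) = 0.5*(y - x)**2, so y*(x) = x
# The bilevel objective F(x) = f(x, y*(x)) is minimized at x = 0.5.

def grad_f(x, y):
    """Return (df/dx, df/dy) for the toy upper-level objective."""
    return x, y - 1.0

def grad_g(x, y):
    """Return (dg/dx, dg/dy) for the toy lower-level objective."""
    return x - y, y - x

def first_order_bilevel(steps=3000, lam0=1.0, lam_step=0.1,
                        c=0.5, xi=0.5, noise_std=0.0, seed=0):
    """Penalty-based bilevel loop that uses gradients only (hypothetical sketch)."""
    rng = np.random.default_rng(seed)

    def noisy(v):
        # Simulated stochastic gradient oracle: add Gaussian noise.
        return v + noise_std * rng.standard_normal()

    x = y = z = 0.0
    lam = lam0
    for _ in range(steps):
        # Shrink the y and x step sizes as the penalty multiplier grows.
        alpha = c / (1.0 + lam)
        # z tracks argmin_z g(x, z).
        z -= c * noisy(grad_g(x, z)[1])
        # y tracks argmin_y f(x, y) + lam * g(x, y).
        y -= alpha * (noisy(grad_f(x, y)[1]) + lam * noisy(grad_g(x, y)[1]))
        # x descends f(x, y) + lam * (g(x, y) - g(x, z)), first-order terms only.
        gx = noisy(grad_f(x, y)[0]) + lam * (
            noisy(grad_g(x, y)[0]) - noisy(grad_g(x, z)[0]))
        x -= xi * alpha * gx
        # Tighten the penalty over time.
        lam += lam_step
    return x, y, z, lam
```

With the default noise-free settings, `x, y, z, lam = first_order_bilevel()` is expected to return `x` close to the toy problem's minimizer 0.5, with `y` and `z` nearly equal. Setting `noise_std` to a small positive value mimics stochastic oracles at both levels; because the penalty term multiplies gradient noise by `lam`, the step sizes here are scaled by `1 / (1 + lam)`, which is one simple choice among many.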

Videos

A Fully First-Order Method for Stochastic Bilevel Optimization · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Statistical Methods and Inference · Risk and Portfolio Optimization