A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic
Mingyi Hong, Hoi-To Wai, Zhaoran Wang, and Zhuoran Yang

TL;DR
This paper introduces a two-timescale stochastic algorithm framework for bilevel optimization, providing convergence analysis and demonstrating its application to actor-critic methods in reinforcement learning.
Contribution
It proposes a two-timescale stochastic approximation (TTSA) algorithm for bilevel problems, establishes its convergence rates, and applies the analysis to a natural actor-critic algorithm.
Findings
TTSA achieves $\mathcal{O}(K^{-2/3})$ convergence for strongly convex outer problems.
TTSA achieves $\mathcal{O}(K^{-2/5})$ convergence for weakly convex outer problems.
Natural actor-critic converges at a rate of $\mathcal{O}(K^{-1/4})$ in expected reward gap.
Abstract
This paper analyzes a two-timescale stochastic algorithm framework for bilevel optimization. Bilevel optimization is a class of problems which exhibit a two-level structure, and its goal is to minimize an outer objective function with variables which are constrained to be the optimal solution to an (inner) optimization problem. We consider the case when the inner problem is unconstrained and strongly convex, while the outer problem is constrained and has a smooth objective function. We propose a two-timescale stochastic approximation (TTSA) algorithm for tackling such a bilevel problem. In the algorithm, a stochastic gradient update with a larger step size is used for the inner problem, while a projected stochastic gradient update with a smaller step size is used for the outer problem. We analyze the convergence rates for the TTSA algorithm under various settings: when the outer problem…
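The two-timescale update described in the abstract can be illustrated on a toy scalar problem. The following is a minimal sketch under stated assumptions, not the authors' implementation: the inner objective $g(x, y) = \tfrac{1}{2}(y - ax)^2$, the outer objective $f(x, y) = \tfrac{1}{2}(y - b)^2$, the constraint set $[-1, 1]$, the Gaussian noise model, and the step-size schedules are all illustrative choices, and the function name `ttsa_toy` is hypothetical. For this toy problem the inner solution is $y^\star(x) = ax$, so the gradient of the outer objective along $y^\star(x)$ is $a(ax - b)$, which the sketch approximates by substituting the current inner iterate $y$ for $y^\star(x)$.

```python
import random


def ttsa_toy(num_iters=20000, a=2.0, b=1.0, noise_std=0.1, seed=0):
    """Two-timescale stochastic approximation on a toy scalar bilevel problem.

    Inner problem (unconstrained, strongly convex in y):
        min_y 0.5 * (y - a * x) ** 2
    Outer problem (constrained to x in [-1, 1]):
        min_x 0.5 * (y*(x) - b) ** 2,  where y*(x) = a * x
    All problem data and step-size schedules here are illustrative assumptions.
    """
    rng = random.Random(seed)
    x, y = -1.0, 0.0
    for k in range(num_iters):
        # Assumed schedules: the inner step size decays more slowly,
        # so it is the larger of the two step sizes.
        beta = 1.0 / (k + 10) ** (2.0 / 3.0)  # inner (larger) step size
        alpha = 1.0 / (k + 10)                # outer (smaller) step size

        # Inner update: plain stochastic gradient step on g(x, y) in y.
        grad_y = (y - a * x) + rng.gauss(0.0, noise_std)
        y = y - beta * grad_y

        # Outer update: noisy surrogate of the outer gradient, with the
        # current inner iterate y standing in for y*(x).
        grad_x = a * (y - b) + rng.gauss(0.0, noise_std)
        x = x - alpha * grad_x

        # Projection onto the constraint set [-1, 1].
        x = min(1.0, max(-1.0, x))
    return x, y


if __name__ == "__main__":
    x_final, y_final = ttsa_toy()
    print(x_final, y_final)
```

With $a = 2$ and $b = 1$, the outer objective along $y^\star(x)$ is $\tfrac{1}{2}(2x - 1)^2$, whose minimizer over $[-1, 1]$ is $x = 0.5$. The iterates should therefore settle near $x \approx 0.5$ and $y \approx 1$. The inner variable tracks $y^\star(x)$ on the faster timescale while the outer variable moves slowly enough for that tracking to remain accurate.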
Taxonomy
Topics: Adaptive Dynamic Programming Control · Risk and Portfolio Optimization · Advanced Bandit Algorithms Research
