Sample Complexity Analysis for Constrained Bilevel Reinforcement Learning

Naman Saxena; Vaneet Aggarwal

arXiv:2602.00282·cs.LG·February 3, 2026

TL;DR

This paper provides the first theoretical analysis of sample complexity for constrained bilevel reinforcement learning. It proposes an algorithm with iteration complexity $O(\epsilon^{-2})$ and sample complexity $\tilde{O}(\epsilon^{-4})$, and handles non-smooth optimization via the Moreau envelope.
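For readers unfamiliar with the Moreau envelope, the sketch below illustrates the standard definition $M_\lambda f(x) = \min_y \{ f(y) + \frac{1}{2\lambda}(x - y)^2 \}$ on the one-dimensional non-smooth function $f(x) = |x|$, whose envelope is known in closed form (the Huber function). It is an illustration of the general concept only, not of how the paper applies it; the function names, the grid bounds, and the brute-force grid search are assumptions made for this example.

```python
def moreau_envelope(f, x, lam, lo=-5.0, hi=5.0, n=100001):
    """Approximate M_lam f(x) = min_y f(y) + (x - y)^2 / (2 * lam).

    Assumption: a brute-force search over n evenly spaced points in [lo, hi]
    is accurate enough for this 1-D illustration.
    """
    step = (hi - lo) / (n - 1)
    best = float("inf")
    for i in range(n):
        y = lo + i * step
        best = min(best, f(y) + (x - y) ** 2 / (2.0 * lam))
    return best


def huber(x, lam):
    """Closed-form Moreau envelope of f(x) = |x| (the Huber function)."""
    if abs(x) <= lam:
        return x * x / (2.0 * lam)
    return abs(x) - lam / 2.0


if __name__ == "__main__":
    # Compare the numerical envelope with the closed form at a few points.
    for x in (-2.0, 0.3, 2.0):
        print(x, moreau_envelope(abs, x, 1.0), huber(x, 1.0))
```

The envelope is differentiable everywhere, including at the kink of $|x|$ at zero, which is why it is a common tool for defining stationarity in non-smooth problems.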

Contribution

It introduces the Constrained Bilevel Subgradient Optimization (CBSO) algorithm with theoretical guarantees, addressing non-smoothness and constraints in bilevel RL.

Findings

01. Iteration complexity of $O(\epsilon^{-2})$

02. Sample complexity of $\tilde{O}(\epsilon^{-4})$

03. First analysis of policy gradient RL with non-smooth objectives

Abstract

Several important problem settings within the literature of reinforcement learning (RL), such as meta-learning, hierarchical learning, and RL from human feedback (RL-HF), can be modelled as bilevel RL problems. A lot has been achieved in these domains empirically; however, the theoretical analysis of bilevel RL algorithms hasn't received a lot of attention. In this work, we analyse the sample complexity of a constrained bilevel RL algorithm, building on the progress in the unconstrained setting. We obtain an iteration complexity of $O(\epsilon^{-2})$ and sample complexity of $\tilde{O}(\epsilon^{-4})$ for our proposed algorithm, Constrained Bilevel Subgradient Optimization (CBSO). We use a penalty-based objective function to avoid the issue of primal-dual gap and hyper-gradient in the context of a constrained bilevel problem setting. The penalty-based formulation to handle constraints…
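To make the idea of a penalty-based, non-smooth objective concrete, here is a minimal single-level sketch. It is not the paper's CBSO algorithm and omits the bilevel and RL structure entirely. The toy problem (minimize $(x-2)^2$ subject to $x \le 1$), the penalty weight `rho`, the step-size schedule, and all function names are assumptions chosen for illustration. The constraint is folded into the objective as $\rho \max(0, x - 1)$, which is non-differentiable at $x = 1$, so the update uses a subgradient.

```python
def penalized_subgradient(x, rho):
    """One subgradient of (x - 2)^2 + rho * max(0, x - 1).

    Assumption: at the kink x == 1 we pick 0 as the subgradient of the penalty
    term; any value in [0, rho] would be valid there.
    """
    grad_objective = 2.0 * (x - 2.0)
    sub_penalty = rho if x > 1.0 else 0.0
    return grad_objective + sub_penalty


def subgradient_descent(x0=3.0, rho=5.0, iters=2000):
    """Subgradient descent with a diminishing step size 0.5 / (k + 1)."""
    x = x0
    for k in range(iters):
        x -= 0.5 / (k + 1) * penalized_subgradient(x, rho)
    return x


if __name__ == "__main__":
    # The constrained minimizer of the toy problem is x = 1.
    print(subgradient_descent())
```

In this toy problem the penalty is exact once `rho` exceeds 2, the magnitude of the objective's gradient at the constraint boundary, so the penalized minimizer coincides with the constrained one.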

Taxonomy

Topics: Optimization and Variational Analysis · Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques