Convergence and Sample Complexity of First-Order Methods for Agnostic Reinforcement Learning

Uri Sherman; Tomer Koren; Yishay Mansour

arXiv:2507.04406·cs.LG·July 8, 2025

TL;DR

This paper introduces a unified framework for agnostic reinforcement learning based on first-order optimization. It derives new algorithms, analyzes their convergence under a variational gradient dominance (VGD) condition that is weaker than the standard completeness and coverability conditions, and validates that condition empirically.

Contribution

The paper proposes a general policy learning framework that reduces agnostic RL to first-order optimization in a non-Euclidean space, derives new algorithms from it, and analyzes their convergence under the variational gradient dominance (VGD) condition.

Findings

01. Sample complexity bounds for three policy algorithms.

02. Reinterpretation of Conservative Policy Iteration via Frank-Wolfe.

03. Empirical validation of the VGD condition in standard environments.
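
To make the Frank-Wolfe connection in the second finding more concrete, here is a minimal sketch of the generic Frank-Wolfe (conditional gradient) method on a toy problem. It is not the paper's algorithm: the objective, the simplex domain, the step-size schedule, and the names `frank_wolfe_simplex`, `grad`, and `target` are all assumptions chosen for illustration. What it shows is the update form x ← (1 − η)x + ηs, where s minimizes the linearized objective over the feasible set; Conservative Policy Iteration uses a mixture update of the same shape, blending the current policy with a new one.

```python
def frank_wolfe_simplex(grad, x0, num_steps):
    """Generic Frank-Wolfe over the probability simplex (illustrative only).

    grad: function mapping a point (list of floats) to its gradient.
    x0: starting point on the simplex.
    """
    x = list(x0)
    for k in range(num_steps):
        g = grad(x)
        # Linear minimization oracle: over the simplex, the minimizer of
        # <g, s> is the vertex at the coordinate with the smallest gradient.
        i_star = min(range(len(x)), key=lambda i: g[i])
        # Standard step size eta_k = 2 / (k + 2) (an assumption here,
        # not taken from the paper).
        eta = 2.0 / (k + 2)
        # Mixture update: x <- (1 - eta) * x + eta * e_{i_star}.
        x = [(1.0 - eta) * xi for xi in x]
        x[i_star] += eta
    return x


# Toy objective f(x) = 0.5 * ||x - target||^2, with target on the simplex.
target = [0.2, 0.5, 0.3]


def grad(x):
    return [xi - ti for xi, ti in zip(x, target)]


x = frank_wolfe_simplex(grad, [1.0, 0.0, 0.0], 2000)
```

Every iterate is a convex combination of simplex vertices, so it stays feasible without any projection step. That projection-free property is what makes the mixture form natural when the feasible set is a convex policy class.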

Abstract

We study reinforcement learning (RL) in the agnostic policy learning setting, where the goal is to find a policy whose performance is competitive with the best policy in a given class of interest $Π$ -- crucially, without assuming that $Π$ contains the optimal policy. We propose a general policy learning framework that reduces this problem to first-order optimization in a non-Euclidean space, leading to new algorithms as well as shedding light on the convergence properties of existing ones. Specifically, under the assumption that $Π$ is convex and satisfies a variational gradient dominance (VGD) condition -- an assumption known to be strictly weaker than more standard completeness and coverability conditions -- we obtain sample complexity upper bounds for three policy learning algorithms: (i) Steepest Descent Policy Optimization, derived from a constrained steepest descent…
