Non-Asymptotic Global Convergence of PPO-Clip

Yin Liu; Qiming Dai; Junyu Zhang; Zaiwen Wen

arXiv:2512.16565·math.OC·December 19, 2025

Open Access

TL;DR

This paper provides a rigorous theoretical analysis of the PPO-Clip algorithm, establishing non-asymptotic convergence rates and conditions for global optimality in reinforcement learning with policy regularization.

Contribution

The paper gives a non-asymptotic convergence analysis of PPO-Clip in the general RL setting with \(f\)-divergence regularization, supported by a new non-uniform Lipschitz smoothness condition and a new inequality condition.

Findings

01

Proves linear convergence to the global optimum for the forward KL regularizer.

02

Establishes convergence to a stationary point and local linear convergence for the reverse KL regularizer.

03

Provides theoretical foundations for PPO-Clip's empirical success.

Abstract

Reinforcement learning (RL) has gained attention for aligning large language models (LLMs) via reinforcement learning from human feedback (RLHF). The actor-only variants of Proximal Policy Optimization (PPO) are widely applied for their efficiency. These algorithms incorporate a clipping mechanism to improve stability. In addition, a regularization term, such as the reverse KL-divergence or a more general \(f\)-divergence, is introduced to prevent policy drift. Despite the empirical success of these methods, rigorous theoretical understanding of the problem and of the algorithm's properties remains limited. This paper advances the theoretical foundations of the PPO-Clip algorithm by analyzing a deterministic actor-only PPO algorithm within the general RL setting with \(f\)-divergence regularization under the softmax policy parameterization. We derive a non-uniform Lipschitz smoothness condition and a…
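
To make the objects named in the abstract concrete, the following is a minimal sketch of a clipped surrogate with a KL penalty for a single state under a softmax policy. It is not the paper's algorithm or notation: the function names, the clipping range `eps`, the penalty weight `beta`, and the choice of KL(pi_theta || pi_ref) as the "reverse" direction (the common RLHF convention) are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Softmax policy over actions for one state (numerically stabilized)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def clipped_surrogate(theta, theta_old, advantages, eps=0.2):
    """Standard PPO-Clip surrogate, taken as an exact expectation over
    actions drawn from the old policy (a deterministic, sample-free form).
    eps is a hypothetical clipping range, not a value from the paper."""
    pi = softmax(theta)
    pi_old = softmax(theta_old)
    total = 0.0
    for p, p_old, adv in zip(pi, pi_old, advantages):
        ratio = p / p_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Pessimistic choice between the unclipped and clipped terms.
        total += p_old * min(ratio * adv, clipped * adv)
    return total


def reverse_kl(theta, theta_ref):
    """KL(pi_theta || pi_ref); assumed here to be the 'reverse' direction."""
    pi = softmax(theta)
    pi_ref = softmax(theta_ref)
    return sum(p * math.log(p / q) for p, q in zip(pi, pi_ref))


def regularized_objective(theta, theta_old, theta_ref, advantages,
                          eps=0.2, beta=0.1):
    """Clipped surrogate minus a KL penalty that discourages policy drift.
    beta is a hypothetical regularization weight."""
    return (clipped_surrogate(theta, theta_old, advantages, eps)
            - beta * reverse_kl(theta, theta_ref))
```

With `theta` equal to `theta_old`, every probability ratio is 1 and the surrogate reduces to the advantage averaged under the old policy. Once a ratio leaves the interval [1 - eps, 1 + eps] in the direction favored by the advantage, the clipped term caps that action's contribution, which is the stabilizing effect the abstract refers to.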

Taxonomy

Topics: Reinforcement Learning in Robotics · Speech and dialogue systems · Machine Learning and Algorithms