Bounded Ratio Reinforcement Learning

Yunke Ao; Le Chen; Bruce D. Lee; Assefa S. Wahd; Aline Czarnobai; Philipp Fürnstahl; Bernhard Schölkopf; Andreas Krause

arXiv:2604.18578 · cs.LG · May 1, 2026

TL;DR

This paper introduces Bounded Ratio Reinforcement Learning (BRRL), a new framework that unifies trust region methods and PPO. It provides theoretical guarantees (monotonic improvement) and improved empirical performance across continuous control (MuJoCo, IsaacLab), Atari, and LLM fine-tuning.

Contribution

The paper develops the BRRL framework and derives its analytical optimal solution, introduces Bounded Policy Optimization (BPO), and extends the approach to LLM fine-tuning (GBPO). Along the way, it connects trust region methods with the Cross-Entropy Method.
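The Cross-Entropy Method named above is a sampling-based optimizer. It draws candidates from a parametric distribution, keeps the highest-scoring fraction (the "elites"), and refits the distribution to them. The following is a minimal NumPy sketch of generic CEM with a diagonal Gaussian. The function `cross_entropy_method` and all of its defaults are illustrative assumptions. The sketch does not reproduce the specific connection to trust region methods that the paper draws.

```python
import numpy as np

def cross_entropy_method(objective, dim, n_iters=50, pop_size=64,
                         elite_frac=0.2, seed=0):
    """Maximize `objective` over R^dim with a diagonal-Gaussian CEM (illustrative sketch).

    Each iteration samples `pop_size` candidates, keeps the top `elite_frac`
    of them by score, and refits the sampling mean and std to those elites.
    """
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(n_iters):
        samples = mean + std * rng.standard_normal((pop_size, dim))
        scores = np.array([objective(x) for x in samples])
        elites = samples[np.argsort(scores)[-n_elite:]]  # highest-scoring candidates
        mean = elites.mean(axis=0)                       # refit the mean to the elites
        std = elites.std(axis=0) + 1e-6                  # small floor keeps sampling non-degenerate
    return mean

# Example: maximize a concave quadratic whose optimum is at [0.5, -0.5].
target = np.array([0.5, -0.5])
best = cross_entropy_method(lambda x: -np.sum((x - target) ** 2), dim=2)
```

The elite-refit step is what gives CEM its characteristic behavior. The sampling distribution only moves toward regions that scored well under the current distribution.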

Findings

1. BPO outperforms PPO in stability and final performance across MuJoCo, Atari, and IsaacLab environments.
2. BRRL provides a theoretical foundation that explains PPO's success and guarantees monotonic improvement.
3. GBPO effectively fine-tunes LLMs, matching or surpassing existing methods.
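The monotonic-improvement claim can be read against the classic guarantee behind trust region methods. That guarantee is the policy-improvement bound of Schulman et al. (2015, TRPO), restated here in their notation:

- η is the expected discounted return.
- A_π is the advantage function.
- ρ_π is the (unnormalized) discounted state-visitation frequency.
- D_KL^max(π, π̃) = max_s D_KL(π(·|s) ‖ π̃(·|s)).

This is background for the trust-region side of the comparison, not BRRL's own bound.

```latex
\begin{aligned}
\eta(\tilde{\pi}) &\ge L_{\pi}(\tilde{\pi}) - C\, D_{\mathrm{KL}}^{\max}(\pi, \tilde{\pi}),
  \qquad C = \frac{4\epsilon\gamma}{(1-\gamma)^2}, \quad \epsilon = \max_{s,a} \lvert A_{\pi}(s,a) \rvert, \\
L_{\pi}(\tilde{\pi}) &= \eta(\pi) + \sum_{s} \rho_{\pi}(s) \sum_{a} \tilde{\pi}(a \mid s)\, A_{\pi}(s,a).
\end{aligned}
```

The right-hand side equals η(π) when π̃ = π, so maximizing it at each iteration yields η(π_{i+1}) ≥ η(π_i). TRPO turns the KL penalty into a KL constraint, and PPO replaces that constraint with ratio clipping. This step is the disconnect between theory and practice that BRRL aims to bridge.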

Abstract

Proximal Policy Optimization (PPO) has become the predominant algorithm for on-policy reinforcement learning due to its scalability and empirical robustness across domains. However, there is a significant disconnect between the underlying foundations of trust region methods and the heuristic clipped objective used in PPO. In this paper, we bridge this gap by introducing the Bounded Ratio Reinforcement Learning (BRRL) framework. We formulate a novel regularized and constrained policy optimization problem and derive its analytical optimal solution. We prove that this solution ensures monotonic performance improvement. To handle parameterized policy classes, we develop a policy optimization algorithm called Bounded Policy Optimization (BPO) that minimizes an advantage-weighted divergence between the policy and the analytic optimal solution from BRRL. We further establish a lower bound on…
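For reference, the clipped objective that the abstract contrasts with trust region theory is PPO's surrogate (Schulman et al., 2017):

L^CLIP = E_t[min(r_t A_t, clip(r_t, 1 − ε, 1 + ε) A_t)], where r_t = π_θ(a_t|s_t) / π_old(a_t|s_t).

The NumPy sketch below implements that standard PPO objective, not BPO. The function name `ppo_clip_objective`, its argument layout (per-sample log-probabilities and precomputed advantages), and the default `eps=0.2` are illustrative assumptions rather than anything taken from the BRRL repository.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Batch-mean PPO clipped surrogate L^CLIP (to be maximized). Illustrative sketch.

    logp_new / logp_old: log-probabilities of the sampled actions under the
    current policy and the behavior (old) policy; advantages: estimates A_t.
    """
    ratio = np.exp(logp_new - logp_old)                        # r_t = pi(a|s) / pi_old(a|s)
    unclipped = ratio * advantages                             # r_t * A_t
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Elementwise minimum: the pessimistic choice between the two terms.
    return float(np.mean(np.minimum(unclipped, clipped)))

# Example: ratios 1.6 and 0.6 with advantages +1 and -1, eps = 0.2.
value = ppo_clip_objective(np.log([0.8, 0.3]), np.log([0.5, 0.5]), np.array([1.0, -1.0]))
print(round(value, 6))  # 0.2
```

In the example, the first sample's ratio 1.6 is capped at 1.2 because its advantage is positive. For the second sample (ratio 0.6, advantage −1), the minimum selects the clipped term 0.8 × (−1) = −0.8. The batch mean is therefore about 0.2. Because the clipped term is constant outside [1 − ε, 1 + ε], it contributes no gradient there. This is the heuristic proximity mechanism that BRRL sets out to ground theoretically.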

Code & Models

Repositories

bounded-ratio-rl/bounded_ratio_rl (GitHub)