Maximum Likelihood Reinforcement Learning

Fahim Tajwar; Guanning Zeng; Yueer Zhou; Yuda Song; Daman Arora; Yiding Jiang; Jeff Schneider; Ruslan Salakhutdinov; Haiwen Feng; Andrea Zanette

arXiv:2602.02710 · cs.LG · February 4, 2026

TL;DR

MaxRL is a reinforcement learning framework that approximates maximum likelihood training for sampling-based tasks with binary feedback, yielding efficiency gains and better scaling with data and compute.

Contribution

Introduces MaxRL, a sampling-based method that interpolates between standard RL and maximum likelihood, with a simple unbiased policy-gradient estimator and convergence guarantees.
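
The contribution statement mentions an unbiased policy-gradient estimator, but this page does not spell it out. The sketch below illustrates one estimator of that flavor under our own assumptions. With N rollouts and binary rewards r_i, it averages the score grad log pi(y_i) over the K successful rollouts only (dividing by K instead of N, and returning zero when K = 0). On a toy one-parameter policy with success probability p = sigmoid(theta), a Monte Carlo check compares its mean with the gradient of the truncated objective -sum_{k=1}^{N} (1-p)^k / k, which here equals (1-p)(1-(1-p)^N). The toy policy, the function names, and the choice of estimator are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def reinforce_grad(rewards, scores):
    """Standard REINFORCE estimate of grad p (gradient of expected binary reward).

    rewards: shape (..., N), entries in {0, 1}
    scores:  shape (..., N, d), score vectors grad log pi(y_i) per rollout
    Returns shape (..., d): mean of r_i * score_i over all N rollouts.
    """
    rewards = np.asarray(rewards, dtype=float)
    scores = np.asarray(scores, dtype=float)
    return (rewards[..., None] * scores).mean(axis=-2)


def success_normalized_grad(rewards, scores):
    """Hypothetical MaxRL-style estimate: average score over successful rollouts only.

    Divides by the number of successes K instead of N, and returns zeros when K == 0.
    """
    rewards = np.asarray(rewards, dtype=float)
    scores = np.asarray(scores, dtype=float)
    k = rewards.sum(axis=-1, keepdims=True)             # successes per group, shape (..., 1)
    total = (rewards[..., None] * scores).sum(axis=-2)  # summed success scores, shape (..., d)
    return np.where(k > 0, total / np.maximum(k, 1.0), 0.0)


def toy_check(theta=-1.0, n=8, trials=200_000, seed=0):
    """Monte Carlo check on a toy one-parameter policy with success probability p = sigmoid(theta)."""
    rng = np.random.default_rng(seed)
    p = 1.0 / (1.0 + np.exp(-theta))
    r = (rng.random((trials, n)) < p).astype(float)  # binary rewards, shape (trials, n)
    # Score d/dtheta log pi: (1 - p) for a success, -p for a failure.
    s = np.where(r == 1.0, 1.0 - p, -p)[..., None]   # shape (trials, n, 1)
    return {
        "rl_mean": reinforce_grad(r, s).mean(),
        "rl_target": p * (1.0 - p),                        # dp/dtheta
        "ml_mean": success_normalized_grad(r, s).mean(),
        "ml_target": (1.0 - p) * (1.0 - (1.0 - p) ** n),   # d/dtheta of the order-n truncation
    }


if __name__ == "__main__":
    print(toy_check())
```

Running toy_check() should show each estimator's mean close to its target. With n = 1 the two estimators coincide, consistent with the lowest-compute member of such a family being ordinary RL. Dividing by K means a prompt solved once in N tries contributes a full-weight update, whereas REINFORCE scales that update down by 1/N.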

Findings

1. MaxRL outperforms existing methods across the tested models and tasks.
2. Achieves up to 20x test-time efficiency gains.
3. Scales better with additional data and compute.

Abstract

Reinforcement learning is the method of choice to train models in sampling-based setups with binary outcome feedback, such as navigation, code generation, and mathematical problem solving. In such settings, models implicitly induce a likelihood over correct rollouts. However, we observe that reinforcement learning does not maximize this likelihood, and instead optimizes only a lower-order approximation. Inspired by this observation, we introduce Maximum Likelihood Reinforcement Learning (MaxRL), a sampling-based framework to approximate maximum likelihood using reinforcement learning techniques. MaxRL addresses the challenges of non-differentiable sampling by defining a compute-indexed family of sample-based objectives that interpolate between standard reinforcement learning and exact maximum likelihood as additional sampling compute is allocated. The resulting objectives admit a…
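
The claim that RL optimizes only a lower-order approximation of the likelihood can be illustrated with a standard identity. Writing p for the probability that a sampled rollout is correct, log p = log(1 - (1 - p)) = -sum_{k>=1} (1 - p)^k / k for 0 < p <= 1. Keeping only the k = 1 term gives p - 1, which is the usual RL objective (expected binary reward) up to a constant; keeping more terms moves toward log p. The snippet below evaluates this truncated family numerically. Treating the truncation order as the "compute index" is our own illustrative reading of the abstract, and the function name is hypothetical.

```python
import math


def truncated_log_likelihood(p, order):
    """Hypothetical truncated objective: -sum_{k=1}^{order} (1 - p)^k / k.

    order = 1 gives p - 1 (expected reward up to a constant);
    order -> infinity recovers log p for 0 < p <= 1.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    q = 1.0 - p  # failure probability
    return -sum(q ** k / k for k in range(1, order + 1))


if __name__ == "__main__":
    # Columns: truncation orders 1, 4, 16, 256, then the exact log p.
    for p in (0.9, 0.5, 0.05):
        row = [truncated_log_likelihood(p, t) for t in (1, 4, 16, 256)]
        print(f"p={p:<5} " + "  ".join(f"{v:8.4f}" for v in row) + f"   log p = {math.log(p):8.4f}")
```

For easy prompts (p near 1) all truncations nearly agree with log p, while for hard prompts (small p) the first-order term is far from it. Differentiating, the order-T truncation has derivative (1 - (1 - p)^T) / p with respect to p, versus 1 for plain expected reward, so higher orders upweight rarely solved prompts by up to a factor of T.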

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications