RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning

Kaiwen Zha; Zhengqi Gao; Maohao Shen; Zhang-Wei Hong; Duane S. Boning; Dina Katabi

arXiv:2505.15034 · cs.LG · October 24, 2025



TL;DR

Tango is a reinforcement learning framework that trains an LLM generator and a generative verifier together, in an interleaved manner, to improve the reasoning and robustness of large language models.

Contribution

It proposes a co-evolutionary RL approach that trains the generator and the verifier together, improving the generalization and reasoning capabilities of LLMs.
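The abstract describes training the generator and verifier "concurrently ... in an interleaved manner." A minimal sketch of such an alternating schedule is shown below. It assumes each model is updated in fixed-length phases; the phase lengths (`gen_steps`, `ver_steps`) and the `update_generator` / `update_verifier` callbacks are hypothetical placeholders, not the paper's actual hyperparameters or training code.

```python
from typing import Callable, List


def interleaved_schedule(total_steps: int, gen_steps: int, ver_steps: int) -> List[str]:
    """Return which model ("generator" or "verifier") is updated at each step.

    Assumption: fixed-length alternating phases; gen_steps and ver_steps are
    hypothetical parameters, not values reported for Tango.
    """
    if gen_steps <= 0 or ver_steps <= 0:
        raise ValueError("phase lengths must be positive")
    schedule: List[str] = []
    while len(schedule) < total_steps:
        schedule.extend(["generator"] * gen_steps)
        schedule.extend(["verifier"] * ver_steps)
    return schedule[:total_steps]


def co_train(
    total_steps: int,
    gen_steps: int,
    ver_steps: int,
    update_generator: Callable[[int], None],
    update_verifier: Callable[[int], None],
) -> None:
    """Alternate RL updates between the two models.

    Each update_* callback is a hypothetical stand-in for one RL step
    (for example, a policy-gradient update) on that model.
    """
    for step, role in enumerate(interleaved_schedule(total_steps, gen_steps, ver_steps)):
        if role == "generator":
            update_generator(step)
        else:
            update_verifier(step)


if __name__ == "__main__":
    log = []
    co_train(6, 2, 1, lambda s: log.append(("G", s)), lambda s: log.append(("V", s)))
    print(log)  # [('G', 0), ('G', 1), ('V', 2), ('G', 3), ('G', 4), ('V', 5)]
```

Alternating phases let each model train against a temporarily fixed partner, which is one simple way to realize co-evolution; the real system may interleave at a different granularity.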

Findings

1. State-of-the-art performance on math benchmarks
2. Superior generalization to out-of-domain tasks
3. Significant improvements on complex reasoning problems

Abstract

Reinforcement learning (RL) has recently emerged as a compelling approach for enhancing the reasoning capabilities of large language models (LLMs), where an LLM generator serves as a policy guided by a verifier (reward model). However, current RL post-training methods for LLMs typically use verifiers that are fixed (rule-based or frozen pretrained) or trained discriminatively via supervised fine-tuning (SFT). Such designs are susceptible to reward hacking and generalize poorly beyond their training distributions. To overcome these limitations, we propose Tango, a novel framework that uses RL to concurrently train both an LLM generator and a verifier in an interleaved manner. A central innovation of Tango is its generative, process-level LLM verifier, which is trained via RL and co-evolves with the generator. Importantly, the verifier is trained solely based on outcome-level verification…
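The abstract states that the verifier is process-level (it judges individual reasoning steps) yet is trained solely from outcome-level verification. A minimal sketch of such a reward follows, assuming a hypothetical verifier output format ("Step k: correct/incorrect" lines plus a final "Verdict:" line) and exact-match answer checking. Both are illustrative assumptions; the paper's actual output format and reward definition may differ.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical output format, assumed for illustration only.
STEP_RE = re.compile(r"^Step\s+(\d+):\s*(correct|incorrect)\s*$", re.IGNORECASE)
VERDICT_RE = re.compile(r"^Verdict:\s*(correct|incorrect)\s*$", re.IGNORECASE)


def parse_verification(text: str) -> Tuple[List[bool], Optional[bool]]:
    """Parse step-level judgments and the final verdict from verifier output.

    Returns (step_labels, verdict); verdict is None if no verdict line exists.
    """
    steps: List[bool] = []
    verdict: Optional[bool] = None
    for raw in text.strip().splitlines():
        line = raw.strip()
        m = STEP_RE.match(line)
        if m:
            steps.append(m.group(2).lower() == "correct")
            continue
        m = VERDICT_RE.match(line)
        if m:
            verdict = m.group(1).lower() == "correct"
    return steps, verdict


def verifier_outcome_reward(verifier_text: str, generator_answer: str, reference_answer: str) -> float:
    """Score the verifier only on whether its final verdict matches the
    outcome-level correctness of the generator's answer.

    Step labels are parsed but not scored: no step-level ground truth is used.
    Exact string match is a simplifying assumption for answer checking.
    """
    _, verdict = parse_verification(verifier_text)
    if verdict is None:
        return 0.0  # malformed output earns no reward (an assumption)
    is_correct = generator_answer.strip() == reference_answer.strip()
    return 1.0 if verdict == is_correct else 0.0


if __name__ == "__main__":
    out = "Step 1: correct\nStep 2: incorrect\nVerdict: incorrect"
    print(verifier_outcome_reward(out, "41", "42"))  # 1.0
```

Because only the final verdict is rewarded, the step-level judgments are never supervised directly; any process-level signal they carry has to emerge from optimizing this outcome-level reward.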


Code & Models

Repositories

kaiwenzha/rl-tango (PyTorch, official)


Taxonomy

Topics: Natural Language Processing Techniques