Escaping the Verifier: Learning to Reason via Demonstrations

Locke Cai; Ivan Provilkov

arXiv:2511.21667·cs.LG·December 10, 2025

Open Access

TL;DR

This paper introduces RARO, a method that leverages expert demonstrations and adversarial training to enhance reasoning capabilities in large language models without relying on task-specific verifiers.

Contribution

RARO is a novel adversarial learning approach that trains reasoning models solely from expert demonstrations using inverse reinforcement learning, eliminating the need for verifiers.

Findings

01. RARO outperforms verifier-free baselines on reasoning tasks.

02. The method scales robustly with larger models and data.

03. It effectively learns reasoning skills from demonstrations alone.

Abstract

Training Large Language Models (LLMs) to reason often relies on Reinforcement Learning (RL) with task-specific verifiers. However, many real-world reasoning-intensive tasks lack verifiers, despite offering abundant expert demonstrations that remain under-utilized for reasoning-focused training. We introduce RARO (Relativistic Adversarial Reasoning Optimization), a method that learns strong reasoning capabilities from expert demonstrations alone via Inverse Reinforcement Learning. Our method sets up an adversarial game between a policy and a relativistic critic: the policy learns to mimic expert answers, while the critic aims to identify the experts among (expert, policy) answer pairs. Both the policy and the critic are trained jointly and continuously via RL, and we identify the key stabilization techniques required for robust learning. Empirically, RARO significantly outperforms strong…
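The adversarial game described in the abstract can be illustrated with a toy reward computation for a single (expert, policy) answer pair. This is a minimal sketch under stated assumptions, not the paper's implementation: the binary zero-sum rewards, the random ordering of the pair, and the names `pairwise_rewards` and `longer_is_expert` are all hypothetical choices made for illustration. The stabilization techniques the abstract mentions are not modeled here.

```python
import random
from typing import Callable, Tuple

# Hypothetical interface: a critic sees (question, answer_a, answer_b) and
# returns the index (0 or 1) of the answer it believes came from the expert.
Critic = Callable[[str, str, str], int]


def pairwise_rewards(
    question: str,
    expert_answer: str,
    policy_answer: str,
    critic: Critic,
    rng: random.Random,
) -> Tuple[float, float]:
    """Return (policy_reward, critic_reward) for one (expert, policy) pair.

    Assumption: rewards are binary and zero-sum. The critic scores 1.0 when
    it picks the expert answer; the policy scores 1.0 when the critic is fooled.
    """
    # Randomize presentation order so the critic cannot rely on position.
    expert_first = rng.random() < 0.5
    if expert_first:
        first, second = expert_answer, policy_answer
    else:
        first, second = policy_answer, expert_answer
    expert_index = 0 if expert_first else 1

    guess = critic(question, first, second)
    critic_reward = 1.0 if guess == expert_index else 0.0
    policy_reward = 1.0 - critic_reward
    return policy_reward, critic_reward


def longer_is_expert(question: str, answer_a: str, answer_b: str) -> int:
    """Toy stand-in critic: guesses that the longer answer is the expert's."""
    return 0 if len(answer_a) >= len(answer_b) else 1


if __name__ == "__main__":
    rng = random.Random(0)
    print(pairwise_rewards("q", "a long expert answer", "short", longer_is_expert, rng))
```

In an actual training loop, both rewards would feed RL updates for the policy and the critic respectively; here the critic is a fixed heuristic purely to show how the two rewards oppose each other.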

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)