Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning

Minwu Kim; Safal Shrestha; Anubhav Shrestha; Keith Ross

arXiv:2601.20829 · cs.LG · May 12, 2026

TL;DR

This paper introduces failure-prefix conditioning, a method that recovers learning signal from saturated problems by conditioning rollouts on prefixes of rare incorrect trajectories, improving performance without collecting harder problems.

Contribution

The paper proposes failure-prefix conditioning, a technique for RLVR that shifts exploration toward failure-prone reasoning states so that saturated reasoning problems again yield a useful reward signal.

Findings

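01. Failure-prefix conditioning improves model performance on saturated problems, where standard RLVR stalls.

02. The method improves the model's ability to recover from misleading early reasoning, reducing the performance degradation that misleading failure prefixes cause.

03. Iteratively refreshing the failure prefixes yields further gains after performance initially plateaus.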

Abstract

As Reinforcement Learning with Verifiable Rewards (RLVR) substantially improves the reasoning abilities of large language models (LLMs), a new bottleneck emerges: more training problems become saturated, that is, the LLM answers the questions correctly for nearly every rollout. On such problems, rewards provide little useful learning signal. While collecting harder problems is a natural response, it is costly and increasingly difficult. We propose failure-prefix conditioning, a simple method that unlocks the remaining signal in saturated problems by shifting exploration toward failure-prone reasoning states. By conditioning on prefixes of rare incorrect trajectories, the method improves the model's ability to recover from misleading early reasoning. We observe that failure-prefix conditioning consistently improves performance where standard RLVR stalls, and achieves gains comparable to…
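The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `prefix_fraction` parameter, the whitespace tokenization, and the mean-centered advantage (a common choice in group-based policy-gradient methods) are all assumptions made for illustration, since this summary does not state the paper's prefix-selection rule or RL algorithm. The first function shows why a saturated problem gives no signal under that assumption; the second builds new training prompts from the rare incorrect rollouts.

```python
from typing import Callable, List


def group_advantages(rewards: List[float]) -> List[float]:
    """Mean-centered advantages within one rollout group (assumed, not from the paper)."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


def build_failure_prefix_prompts(
    question: str,
    rollouts: List[str],
    is_correct: Callable[[str], bool],
    prefix_fraction: float = 0.5,  # hypothetical knob: how much of the failure to keep
) -> List[str]:
    """Return one prompt per incorrect rollout: the question plus the rollout's leading part."""
    prompts = []
    for text in rollouts:
        if is_correct(text):
            continue  # only rare failures are used as conditioning prefixes
        tokens = text.split()  # whitespace tokens stand in for model tokens
        cut = max(1, int(len(tokens) * prefix_fraction))
        prompts.append(question + "\n" + " ".join(tokens[:cut]))
    return prompts


# A saturated group: every rollout is correct, so every advantage is zero.
saturated = group_advantages([1.0, 1.0, 1.0, 1.0])

# Toy rollouts for one problem; the verifier here is a hypothetical string check.
question = "What is 17 * 3?"
rollouts = [
    "17 * 3 = 51 so the answer is 51",
    "17 * 3 = 51 so the answer is 51",
    "17 * 3 = 41 so the answer is 41",
]
is_correct = lambda text: text.endswith("51")

prompts = build_failure_prefix_prompts(question, rollouts, is_correct)
```

In this toy setup, `prompts` holds a single entry: the question followed by the first half of the failed rollout. Rollouts sampled from such a prompt would start in a failure-prone state, so their rewards are more likely to be mixed and the advantages nonzero.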

Code & Models

Repositories

minwukim/training-on-saturated-problems (GitHub)
