Replay Failures as Successes: Sample-Efficient Reinforcement Learning for Instruction Following

Kongcheng Zhang; Qi Yao; Shunyu Liu; Wenjian Zhang; Min Cen; Yang Zhou; Wenkai Fang; Yiru Zhao; Baisheng Lai; Mingli Song

arXiv:2512.23457 · cs.AI · December 30, 2025

TL;DR

This paper introduces HiR (Hindsight instruction Replay), a sample-efficient reinforcement learning framework that replays failed responses as successes with respect to the constraints they did satisfy, improving instruction-following models at lower computational cost.

Contribution

The paper presents HiR, a replay strategy that uses hindsight to select failed attempts and rewrite them as successes, making RL more sample-efficient for complex instruction-following tasks.

Findings

01 HiR improves instruction-following performance across tasks.

02 The method reduces computational requirements for RL.

03 Hindsight replay effectively utilizes binary reward signals.

Abstract

Reinforcement Learning (RL) has shown promise for aligning Large Language Models (LLMs) to follow instructions with various constraints. Despite the encouraging results, RL improvement inevitably relies on sampling successful, high-quality responses; however, the initial model often struggles to generate responses that satisfy all constraints due to its limited capabilities, yielding sparse or indistinguishable rewards that impede learning. In this work, we propose Hindsight instruction Replay (HiR), a novel sample-efficient RL framework for complex instruction following tasks, which employs a select-then-rewrite strategy to replay failed attempts as successes based on the constraints that have been satisfied in hindsight. We perform RL on these replayed samples as well as the original ones, theoretically framing the objective as dual-preference learning at both the instruction- and…
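The select-then-rewrite idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Constraint` and `Sample` types, the rule-based checkers, the `min_satisfied` selection threshold, and the choice to rewrite an instruction by keeping only the satisfied constraints are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Constraint:
    text: str                     # constraint as it would appear in the instruction
    check: Callable[[str], bool]  # hypothetical rule-based verifier for a response


@dataclass
class Sample:
    task: str
    constraints: List[Constraint]
    response: str
    reward: float


def binary_reward(response: str, constraints: List[Constraint]) -> float:
    """Reward is 1.0 only when every constraint is satisfied, else 0.0."""
    return 1.0 if all(c.check(response) for c in constraints) else 0.0


def hindsight_replay(task: str, constraints: List[Constraint],
                     responses: List[str], min_satisfied: int = 1) -> List[Sample]:
    """Select failed responses and rewrite their instruction in hindsight.

    Assumed selection rule: keep failures that satisfy at least `min_satisfied`
    constraints. Assumed rewrite rule: the replayed instruction keeps only the
    constraints the response satisfied, so the response counts as a success.
    """
    replayed = []
    for response in responses:
        satisfied = [c for c in constraints if c.check(response)]
        is_success = len(satisfied) == len(constraints)
        if is_success or len(satisfied) < min_satisfied:
            continue  # successes need no replay; weak failures are skipped
        replayed.append(Sample(task, satisfied, response, reward=1.0))
    return replayed


# Toy instruction with three constraints (all hypothetical).
constraints = [
    Constraint("Use fewer than 10 words.", lambda r: len(r.split()) < 10),
    Constraint("End with an exclamation mark.", lambda r: r.strip().endswith("!")),
    Constraint("Mention the word 'tea'.", lambda r: "tea" in r.lower()),
]
responses = [
    "I love tea in the morning!",
    "Coffee is fine.",
    "Tea is great, and I could talk about it for a very long time!",
]

original_rewards = [binary_reward(r, constraints) for r in responses]
replayed = hindsight_replay("Write a sentence about drinks.", constraints, responses)
```

In this toy run only the first response earns the original binary reward, while the two failures are replayed against the reduced instructions they actually satisfy. Training would then use both the original samples and the replayed ones, as the abstract describes.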

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning