Re-ReST: Reflection-Reinforced Self-Training for Language Agents

Zi-Yi Dou; Cheng-Fu Yang; Xueqing Wu; Kai-Wei Chang; Nanyun Peng

arXiv:2406.01495 · cs.CL · May 8, 2025 · 3 citations

TL;DR

This paper introduces Re-ReST, a reflection-reinforced self-training method that improves language agents by using feedback from an external environment to refine low-quality self-generated samples before they are used for training.

Contribution

The paper proposes Re-ReST, a novel reflection-based technique to refine self-generated samples, boosting language agent performance without relying on human annotations or stronger models.

Findings

1. Self-training improves performance on multiple tasks.

2. Re-ReST further enhances results by refining samples.

3. Reflection during inference is feasible without ground-truth feedback.

Abstract

Finetuning language agents with reasoning-action trajectories is effective, but obtaining these trajectories from human annotations or stronger models is costly and sometimes impractical. In this paper, we investigate the use of self-training in language agents, which can generate supervision from the agent itself, offering a promising alternative without relying on human or stronger model demonstrations. Self-training, however, requires high-quality model-generated samples, which are hard to obtain for challenging language agent tasks. To address this, we present Reflection-Reinforced Self-Training (Re-ReST), which uses a reflector to refine low-quality generated samples during self-training. The reflector takes the agent's output and feedback from an external environment (e.g., unit test results in code generation) to produce improved samples. This technique enhances the…
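
The data-collection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: every name here (collect_self_training_data, toy_agent, toy_env, toy_reflector) is hypothetical, the agent and reflector are hard-coded stand-ins for language models, and the choice to attempt a single reflection and discard samples that still fail is an assumption of this sketch.

```python
from typing import Callable, List, Tuple

# Hypothetical type aliases; none of these names come from the Re-ReST repository.
Agent = Callable[[str], str]                          # task -> candidate output
Reflector = Callable[[str, str, str], str]            # (task, output, feedback) -> revised output
Environment = Callable[[str, str], Tuple[bool, str]]  # (task, output) -> (passed, feedback)


def collect_self_training_data(
    tasks: List[str], agent: Agent, reflector: Reflector, env: Environment
) -> List[Tuple[str, str]]:
    """Gather (task, output) pairs that pass the environment check."""
    dataset = []
    for task in tasks:
        output = agent(task)
        passed, feedback = env(task, output)
        if not passed:
            # Low-quality sample: let the reflector revise it using the feedback.
            output = reflector(task, output, feedback)
            passed, _ = env(task, output)
        if passed:
            # Assumption: only samples that pass are kept for finetuning.
            dataset.append((task, output))
    return dataset


def toy_agent(task: str) -> str:
    # Stand-in for a language agent; deliberately wrong on "add".
    if task == "add":
        return "def add(a, b): return a - b"
    return "def mul(a, b): return a * b"


def toy_env(task: str, output: str) -> Tuple[bool, str]:
    # Stand-in for unit-test feedback in code generation.
    namespace = {}
    exec(output, namespace)
    expected = {"add": 5, "mul": 6}[task]
    got = namespace[task](2, 3)
    if got == expected:
        return True, "all tests passed"
    return False, f"{task}(2, 3) returned {got}, expected {expected}"


def toy_reflector(task: str, output: str, feedback: str) -> str:
    # A real reflector would be a model conditioned on the feedback text.
    return output.replace("a - b", "a + b")


data = collect_self_training_data(["add", "mul"], toy_agent, toy_reflector, toy_env)
```

In this toy run the "add" sample fails its test, is revised by the reflector, and is then accepted, while the "mul" sample is accepted directly. The resulting pairs would serve as finetuning data for the agent.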

Code & Models

Repositories

PlusLabNLP/Re-ReST (PyTorch, official)

Videos

Re-ReST: Reflection-Reinforced Self-Training for Language Agents · Underline

Taxonomy

Topics

Speech and dialogue systems · Topic Modeling