Learning with Rare Success but Rich Feedback via Reflection-Enhanced Self-Distillation

Yuwei Zhang; Sha Li; Changlong Yu; Qin Lu; Shuowei Jin; Chengyu Dong; Haoran Liu; Ilgee Hong; Xintong Li; Zhenyu Shi; Bing Yin; Jingbo Shang

arXiv:2605.12741·cs.LG·May 14, 2026


TL;DR

This paper introduces RESD, a self-distillation framework that uses reflection to turn failure feedback into corrective supervision for large language models, targeting settings where successful attempts are rare.

Contribution

The paper proposes a reflection-based method that transforms failure feedback into corrective supervision, improving learning efficiency and performance in low-success regimes.

Findings

01. RESD outperforms standard self-distillation baselines in multiple tasks.

02. It achieves faster early-stage improvement with fewer samples.

03. RESD enables effective learning even with rare successes.

Abstract

Enabling Large Language Models (LLMs) to continuously improve from environmental interactions is a central challenge in post-training. While on-policy self-distillation offers a promising paradigm, existing methods predominantly treat environmental feedback as a passive conditioning signal. Consequently, they heavily rely on successful demonstrations and struggle to learn in rare-success regimes. To bridge this gap, we introduce Reflection-Enhanced Self-Distillation (RESD), a framework that transforms raw failure feedback into an active source of corrective supervision. Instead of passively appending feedback, RESD interprets failed trajectories by generating retrospective reflections to diagnose local errors, and curates a persistent global playbook to preserve reusable lessons across training steps. The enriched context enables the self-teacher to provide actionable token-level…
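
The context-enrichment idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Playbook` class, the `build_teacher_context` function, the `max_lessons` cap, and the prompt layout are all hypothetical names and choices made for illustration. In the actual method the reflection is generated by the model; here it is passed in as a plain string so the example runs without any model.

```python
from dataclasses import dataclass, field


@dataclass
class Playbook:
    """Hypothetical store of reusable lessons kept across training steps."""
    max_lessons: int = 8  # assumed cap so the teacher context stays bounded
    lessons: list[str] = field(default_factory=list)

    def add(self, lesson: str) -> None:
        lesson = lesson.strip()
        if not lesson or lesson in self.lessons:
            return  # skip empty or duplicate lessons
        self.lessons.append(lesson)
        if len(self.lessons) > self.max_lessons:
            # drop the oldest lessons, keep the most recent ones
            self.lessons = self.lessons[-self.max_lessons:]


def build_teacher_context(prompt: str, feedback: str, reflection: str,
                          playbook: Playbook) -> str:
    """Assemble an enriched context: task, raw feedback, reflection, playbook."""
    parts = [
        f"Task:\n{prompt}",
        f"Environment feedback on the failed attempt:\n{feedback}",
        f"Reflection (local error diagnosis):\n{reflection}",
    ]
    if playbook.lessons:
        bullets = "\n".join(f"- {lesson}" for lesson in playbook.lessons)
        parts.append(f"Playbook (lessons from earlier steps):\n{bullets}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    playbook = Playbook(max_lessons=2)
    playbook.add("Check the return type before formatting output.")
    context = build_teacher_context(
        prompt="Write a function that sums a list of integers.",
        feedback="TypeError: unsupported operand type(s) for +: 'int' and 'str'",
        reflection="The attempt concatenated strings instead of adding integers.",
        playbook=playbook,
    )
    print(context)
```

In a self-distillation loop, a context like this would condition the self-teacher, whose token-level outputs then supervise the student that sees only the original task. That training step is omitted here because the truncated abstract does not specify it.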
