ReflectEvo: Improving Meta Introspection of Small LLMs by Learning Self-Reflection

Jiaqi Li; Xinyi Dong; Yang Liu; Zhizhuo Yang; Quansen Wang; Xiaobo Wang; SongChun Zhu; Zixia Jia; Zilong Zheng

arXiv:2505.16475 · cs.AI · May 23, 2025

TL;DR

ReflectEvo introduces a self-reflection learning pipeline for small language models. Iterative self-generated reflection, together with a large-scale reflection dataset, significantly improves their reasoning, to the point of rivaling or surpassing some open-source models on BIG-bench.

Contribution

The paper presents a novel reflection learning pipeline and a large-scale reflection dataset, demonstrating substantial reasoning improvements in small language models without human annotations.

Findings

01

Llama-3's reasoning accuracy improved from 52.4% to 71.2%.

02

Mistral's reasoning accuracy improved from 44.4% to 71.1%.

03

ReflectEvo can rival or even surpass three prominent open-source models on BIG-bench.
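The two accuracy results can also be read as absolute and relative gains. The short sketch below only redoes that arithmetic on the reported numbers. The helper name `gains` is hypothetical, and the percentages are the ones reported above, not new measurements.

```python
# Reported accuracies (%) from the findings: (baseline, after reflection learning).
REPORTED = {
    "Llama-3": (52.4, 71.2),
    "Mistral": (44.4, 71.1),
}


def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in %), rounded to 0.1."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return round(absolute, 1), round(relative, 1)


for model, (before, after) in REPORTED.items():
    abs_pp, rel_pct = gains(before, after)
    print(f"{model}: +{abs_pp} pp ({rel_pct}% relative)")
# Llama-3: +18.8 pp (35.9% relative)
# Mistral: +26.7 pp (60.1% relative)
```

Mistral starts from the lower baseline and ends at nearly the same accuracy as Llama-3, so its relative gain is much larger.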

Abstract

We present a novel pipeline, ReflectEvo, to demonstrate that small language models (SLMs) can enhance meta introspection through reflection learning. This process iteratively generates self-reflection for self-training, fostering a continuous and self-evolving process. Leveraging this pipeline, we construct ReflectEvo-460k, a large-scale, comprehensive, self-generated reflection dataset with broadened instructions and diverse multi-domain tasks. Building upon this dataset, we demonstrate the effectiveness of reflection learning to improve SLMs' reasoning abilities using SFT and DPO with remarkable performance, substantially boosting Llama-3 from 52.4% to 71.2% and Mistral from 44.4% to 71.1%. It validates that ReflectEvo can rival or even surpass the reasoning capability of the three prominent open-sourced models on BIG-bench without distillation from superior models or fine-grained…
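The abstract describes the pipeline only at a high level: the model generates its own reflections, and these feed SFT and DPO. The sketch below is one hypothetical reading of a single data-collection round. It is not the authors' implementation. The model is any prompt-to-text callable, and correctness is an exact string match against a gold answer. The prompt templates, `collect_reflection_data`, and the pairing of helpful and unhelpful reflections into DPO preferences are all assumptions. Presumably, repeating such rounds after retraining gives the iterative, self-evolving loop the abstract mentions.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical model interface: any callable mapping a prompt string to generated text.
Generate = Callable[[str], str]


@dataclass
class ReflectionData:
    sft: list[dict] = field(default_factory=list)  # reflections whose retry was correct
    dpo: list[dict] = field(default_factory=list)  # (chosen, rejected) reflection pairs


def collect_reflection_data(
    tasks: list[tuple[str, str]],  # (question, gold answer) pairs
    generate: Generate,
    num_reflections: int = 2,
) -> ReflectionData:
    """One hypothetical data-collection round: answer, reflect on failures, retry."""
    data = ReflectionData()
    for question, gold in tasks:
        first = generate(f"Q: {question}\nA:")
        if first.strip() == gold:
            continue  # first attempt already correct: nothing to reflect on
        good, bad = [], []
        for i in range(num_reflections):
            reflection = generate(
                f"Q: {question}\nWrong answer: {first}\nReflect on the error (#{i}):"
            )
            retry = generate(f"Q: {question}\nReflection: {reflection}\nA:")
            # Exact-match check against the gold answer (an assumed verifier).
            (good if retry.strip() == gold else bad).append(reflection)
        for reflection in good:
            data.sft.append({"question": question, "wrong": first, "reflection": reflection})
        # Pair each helpful reflection with an unhelpful one as a DPO preference.
        for chosen, rejected in zip(good, bad):
            data.dpo.append({"question": question, "chosen": chosen, "rejected": rejected})
    return data


# Toy stand-in for a small LLM, only to exercise the control flow.
def toy_model(prompt: str) -> str:
    if "Reflect on the error" in prompt:
        return "I forgot to carry the 1." if "#0" in prompt else "Looks fine to me."
    return "12" if "carry the 1" in prompt else "11"


data = collect_reflection_data([("7 + 5 = ?", "12")], toy_model)
print(len(data.sft), len(data.dpo))  # 1 1
```

The toy model answers wrongly at first. The first reflection fixes the retry and the second does not, so the round yields one SFT example and one preference pair. No human labels are involved beyond the gold answers used for checking.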

Code & Models

Datasets

bigai-nlco/ReflectionEvo
dataset · 129 downloads

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks

Methods: Shrink and Fine-Tune · Direct Preference Optimization
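Direct Preference Optimization appears among the listed methods, and the abstract says DPO is used alongside SFT. The sketch below computes the standard per-pair DPO loss as a generic reference point. It is not the paper's training code. The function name, the argument layout, and the default beta = 0.1 are assumptions. In ReflectEvo's setting, the chosen and rejected responses would presumably be a more useful and a less useful self-generated reflection for the same task.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one preference pair, given summed sequence log-probs."""
    # Implicit reward margins: how much more the policy favors each response than the reference.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) equals softplus(-z); this form avoids overflow for large |z|.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Policy identical to the reference: z = 0, so the loss is log 2.
print(round(dpo_loss(-5.0, -5.0, -5.0, -5.0), 4))  # 0.6931
# Policy shifts probability toward the chosen response: loss drops below log 2.
print(dpo_loss(-4.0, -6.0, -5.0, -5.0) < math.log(2))  # True
```

DPO needs no separate reward model. The frozen reference model anchors the policy, and beta controls how far the policy may drift while it learns to prefer the chosen response.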