Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning
Zihan Liang, Yufei Ma, Ben Chen, Zhipeng Qian, Xuxin Zhang, Huangyu Dai, Lingtao Mao

TL;DR
Search-E1 introduces a simple self-evolution approach for search-augmented reasoning agents that improves performance through offline self-distillation without complex machinery.
Contribution
It proposes a minimal self-distillation method that lets search-augmented agents improve themselves without external supervision or elaborate pipeline modifications.
Findings
Outperforms all open-source baselines at both model scales on seven QA benchmarks.
Achieves an average exact-match (EM) score of 0.440 with Qwen2.5-3B.
Demonstrates that simple self-distillation can replace complex augmentation techniques.
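For readers unfamiliar with the EM metric reported above, here is a minimal sketch of how an average exact-match score is commonly computed on QA benchmarks. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace). This summary does not state which normalization the paper uses, and the function names `normalize_answer`, `exact_match`, and `average_em` are illustrative, not taken from the paper.

```python
import re
import string

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the paper's exact rules may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> float:
    """Return 1.0 if the normalized prediction equals any normalized gold answer."""
    norm_pred = normalize_answer(prediction)
    return float(any(norm_pred == normalize_answer(g) for g in gold_answers))


def average_em(predictions: list[str], references: list[list[str]]) -> float:
    """Mean exact-match score over a dataset (0.0 for an empty dataset)."""
    scores = [exact_match(p, refs) for p, refs in zip(predictions, references)]
    return sum(scores) / len(scores) if scores else 0.0
```

Under this definition, an average EM of 0.440 would mean that 44% of predicted answers match a gold answer after normalization, averaged as the benchmark protocol specifies.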
Abstract
Post-training has become the dominant recipe for turning a language model into a competent search-augmented reasoning agent. A line of recent work pushes its performance further by adding elaborate machinery on top of this standard pipeline. These augmentations import external supervision from stronger external systems, attach auxiliary modules such as process reward models or retrospective critics, restructure the rollout itself with tree search or multi-stage curricula, or shape the reward with hand-crafted bonuses and penalties. Each addition delivers a measurable gain, but each also inflates the training pipeline and ties the recipe to resources or designs that may not always be available. We take a step back and ask whether any of this machinery is actually necessary, and propose Search-E1, a self-evolution method that lets a search-augmented agent improve through only vanilla GRPO…
