SE-Search: Self-Evolving Search Agent via Memory and Dense Reward
Jian Li, Yizhang Jin, Dongqi Liu, Hang Ding, Jiafu Wu, Dongsheng Chen, Yunhang Shen, Yulei Qin, Ying Tai, Chengjie Wang, Xiaotong Yuan, Yabiao Wang

TL;DR
SE-Search introduces a self-evolving search agent that improves online search and question answering through three components: memory purification, atomic query training, and dense rewards. It outperforms strong baselines, including a 10.8-point absolute gain over Search-R1.
Contribution
It presents a self-evolving search framework whose three components (memory purification, atomic query training, and dense rewards) improve search relevance and efficiency over existing methods.
Findings
Outperforms strong baselines on question answering benchmarks.
Achieves a 10.8-point absolute improvement over Search-R1.
Shows that dense rewards speed up training.
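Why dense rewards can speed up training: a sparse outcome reward gives a wrong final answer zero signal, even when the agent retrieved useful evidence along the way. The sketch below contrasts the two. It is a generic illustration of reward densification and not SE-Search's actual reward. The per-step "evidence hit" rule, the `step_bonus` value, and all function names are hypothetical.

```python
import string
from typing import List


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    return " ".join(text.split())


def sparse_reward(answer: str, gold: str) -> float:
    """Outcome-only reward: 1.0 for an exact (normalized) match, else 0.0."""
    return 1.0 if normalize(answer) == normalize(gold) else 0.0


def dense_rewards(step_docs: List[List[str]], answer: str, gold: str,
                  step_bonus: float = 0.1) -> List[float]:
    """Hypothetical dense reward: a small bonus for each search step whose
    retrieved documents contain the gold answer, followed by the outcome reward."""
    gold_n = normalize(gold)
    rewards = [step_bonus if any(gold_n in normalize(d) for d in docs) else 0.0
               for docs in step_docs]
    rewards.append(sparse_reward(answer, gold))
    return rewards


# Toy rollout with fictional content: the second search found the answer,
# but the agent still answered wrongly.
steps = [["Weather report for Sylvania."], ["The capital of Freedonia is Fredville."]]
print(sparse_reward("Marxville", "Fredville"))         # 0.0: no learning signal
print(dense_rewards(steps, "Marxville", "Fredville"))  # [0.0, 0.1, 0.0]
```

Even this failed rollout receives a nonzero reward for the useful search step. That kind of fine-grained feedback is what the finding credits with faster training.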
Abstract
Retrieval-augmented generation (RAG) reduces hallucinations and factual errors in large language models (LLMs) by conditioning generation on retrieved external knowledge. Recent search agents go further and cast RAG as an autonomous, multi-turn information-seeking process. However, existing methods often accumulate irrelevant or noisy documents and rely on sparse reinforcement learning signals. We propose Self-Evolving Search (SE-Search), a search agent that improves online search behavior through three components: memory purification, atomic query training, and dense rewards. SE-Search follows a Think-Search-Memorize strategy that retains salient evidence while filtering out irrelevant content. Atomic query training promotes shorter and more diverse queries, which improves evidence acquisition. Dense rewards provide fine-grained feedback that speeds up training.…
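The Think-Search-Memorize loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. In SE-Search the steps are carried out by a trained LLM policy with a live search tool. Here they are plain Python callables, and the keyword-overlap relevance test, the fixed planner, and the toy corpus are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical stand-ins for the LLM policy and the search tool.
Planner = Callable[[str, List[str]], Optional[str]]
Searcher = Callable[[str], List[str]]
RelevanceFn = Callable[[str, str], bool]


@dataclass
class Memory:
    """Evidence the agent keeps across turns."""
    notes: List[str] = field(default_factory=list)


def purify(docs: List[str], query: str, is_relevant: RelevanceFn) -> List[str]:
    """Memory purification (sketch): drop retrieved documents judged irrelevant."""
    return [d for d in docs if is_relevant(d, query)]


def think_search_memorize(question: str, plan: Planner, search: Searcher,
                          is_relevant: RelevanceFn, max_turns: int = 4) -> List[str]:
    memory = Memory()
    for _ in range(max_turns):
        query = plan(question, memory.notes)   # Think: choose the next query
        if query is None:                      # planner signals it has enough evidence
            break
        docs = search(query)                   # Search: retrieve candidate documents
        memory.notes.extend(purify(docs, query, is_relevant))  # Memorize: keep only salient evidence
    return memory.notes


def keyword_overlap(doc: str, query: str) -> bool:
    """Toy relevance test: any query word longer than 3 characters appears in the doc."""
    terms = {t for t in query.lower().split() if len(t) > 3}
    return any(t in doc.lower() for t in terms)


def make_planner(sub_queries: List[str]) -> Planner:
    """Toy planner that issues short (atomic) sub-queries in a fixed order."""
    pending = list(sub_queries)

    def plan(question: str, notes: List[str]) -> Optional[str]:
        return pending.pop(0) if pending else None

    return plan


# Toy corpus with fictional content.
corpus = {
    "capital of Freedonia": ["The capital of Freedonia is Fredville.", "Weather report for Sylvania."],
    "mayor of Fredville": ["Fredville elected mayor R. Firefly.", "Sylvania trade news."],
}
notes = think_search_memorize(
    "Who is the mayor of the capital of Freedonia?",
    make_planner(["capital of Freedonia", "mayor of Fredville"]),
    lambda q: corpus.get(q, []),
    keyword_overlap,
)
print(notes)  # ['The capital of Freedonia is Fredville.', 'Fredville elected mayor R. Firefly.']
```

In this sketch, filtering happens before anything is written to memory, so the off-topic "Sylvania" documents never reach the agent's context. The two short sub-queries also mirror the abstract's point that shorter, atomic queries retrieve more targeted evidence than one long multi-hop query.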
Code & Models
- 🤗 swordli/SE-Search-3B · model · 21 downloads · ♡ 1
- 🤗 swordli/Qwen2.5-3B-Base-SAPO · model · 71 downloads
- 🤗 swordli/Qwen2.5-3B-Instruct-SAPO · model · 33 downloads
- 🤗 swordli/Qwen2.5-1.5B-Instruct-SAPO · model · 30 downloads
- 🤗 swordli/Qwen2.5-7B-Instruct-SAPO · model · 26 downloads
- 🤗 swordli/Qwen2.5-14B-Instruct-SAPO · model · 31 downloads
- 🤗 swordli/Llama-3.2-3B-Instruct-SAPO · model · 33 downloads
- 🤗 swordli/Llama-3.2-3B-Base-SAPO · model · 32 downloads
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Multimodal Machine Learning Applications
