O-Researcher: An Open Ended Deep Research Model via Multi-Agent Distillation and Agentic RL

Yi Yao; He Zhu; Piaohong Wang; Jincheng Ren; Xinlong Yang; Qianben Chen; Xiaowan Li; Dingfeng Shi; Jiaxian Li; Qiexiang Wang; Sinuo Wang; Xinpeng Liu; Jiaqi Wu; Minghao Liu; Wangchunshu Zhou

arXiv:2601.03743 · cs.CL · January 8, 2026

Open Access · 2 Models · 2 Datasets

TL;DR

This paper presents O-Researcher, a framework that synthesizes research-grade instructional data with a multi-agent workflow and then trains open-source large language models on that data in two stages: supervised fine-tuning followed by reinforcement learning.

Contribution

It introduces a novel multi-agent data synthesis and training approach that enhances open-source LLMs without proprietary data.

Findings

01. Achieved state-of-the-art results on major research benchmarks.
02. Enabled open-source models to match or surpass closed-source counterparts.
03. Demonstrated scalable data generation for model improvement.

Abstract

The performance gap between closed-source and open-source large language models (LLMs) is largely attributed to disparities in access to high-quality training data. To bridge this gap, we introduce a novel framework for the automated synthesis of sophisticated, research-grade instructional data. Our approach centers on a multi-agent workflow where collaborative AI agents simulate complex tool-integrated reasoning to generate diverse and high-fidelity data end-to-end. Leveraging this synthesized data, we develop a two-stage training strategy that integrates supervised fine-tuning with a novel reinforcement learning method, designed to maximize model alignment and capability. Extensive experiments demonstrate that our framework empowers open-source models across multiple scales, enabling them to achieve new state-of-the-art performance on the major deep research benchmark. This work…

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques