OffSeeker: Online Reinforcement Learning Is Not All You Need for Deep Research Agents

Yuhang Zhou; Kai Zheng; Qiguang Chen; Mengkang Hu; Qingfeng Sun; Can Xu; Jingjing Chen

arXiv:2601.18467 · cs.AI · February 24, 2026

Open Access · 1 model · 1 dataset

TL;DR

This paper demonstrates that offline training with curated datasets and task synthesis can produce research agents competitive with online RL methods, reducing costs and expanding accessibility.

Contribution

The paper introduces DeepForge, a task synthesis framework, and a curated dataset of 66k QA pairs, 33k SFT trajectories, and 21k DPO pairs, enabling fully offline training of research agents that are competitive with online RL methods.

Findings

1. OffSeeker, trained entirely offline, outperforms similar-sized agents.
2. OffSeeker (8B) remains competitive with 30B-parameter models trained with online RL.
3. The approach reduces reliance on expensive online reinforcement learning.

Abstract

Deep research agents have shown remarkable potential in handling long-horizon tasks. However, state-of-the-art performance typically relies on online reinforcement learning (RL), which is financially expensive due to extensive API calls. While offline training offers a more efficient alternative, its progress is hindered by the scarcity of high-quality research trajectories. In this paper, we demonstrate that expensive online reinforcement learning is not all you need to build powerful research agents. To bridge this gap, we introduce a fully open-source suite designed for effective offline training. Our core contributions include DeepForge, a ready-to-use task synthesis framework that generates large-scale research queries without heavy preprocessing; and a curated collection of 66k QA pairs, 33k SFT trajectories, and 21k DPO pairs. Leveraging these resources, we train OffSeeker (8B),…

Code & Models

Models

🤗 OffSeeker/OffSeeker-8B-DPO · model · 26 downloads

Datasets

OffSeeker/DeepForge · dataset · 79 downloads

Taxonomy

Topics: Machine Learning in Materials Science · Machine Learning and Data Classification · Multimodal Machine Learning Applications