SynPlanResearch-R1: Encouraging Tool Exploration for Deep Research with Synthetic Plans

Hansi Zeng; Zoey Li; Yifan Gao; Chenwei Zhang; Xiaoman Pan; Tao Yang; Fengran Mo; Jiacheng Lin; Xian Li; Jingbo Shang

arXiv:2603.07853·cs.AI·March 10, 2026

Open Access

TL;DR

This paper introduces SynPlanResearch-R1, a framework that synthesizes tool-use trajectories to enhance exploration in research agents, leading to improved performance on multi-hop and open-web benchmarks.

Contribution

The paper presents a novel method for synthesizing tool-use trajectories to improve exploration and performance in research agents during supervised fine-tuning.

Findings

01 Up to 6.0% performance improvement on Qwen3-8B

02 Up to 5.8% performance improvement on Qwen3-4B

03 Enhanced exploration behaviors compared to baselines

Abstract

Research agents enable models to gather information from the web using tools to answer user queries, requiring them to dynamically interleave internal reasoning with tool use. While such capabilities can in principle be learned via reinforcement learning with verifiable rewards (RLVR), we observe that agents often exhibit poor exploration behaviors, including premature termination and biased tool usage. As a result, RLVR alone yields limited improvements. We propose SynPlanResearch-R1, a framework that synthesizes tool-use trajectories that encourage deeper exploration, shaping exploration during cold-start supervised fine-tuning and providing a strong initialization for subsequent RL. Across seven multi-hop and open-web benchmarks, SynPlanResearch-R1 improves performance by up to 6.0% on Qwen3-8B and 5.8% on Qwen3-4B backbones, respectively, compared to SOTA baselines. Further analyses of tool-use…
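
The two failure modes named in the abstract, premature termination and biased tool usage, can be made concrete with a small diagnostic. The sketch below is illustrative only and is not taken from the paper: the function name `exploration_stats`, the `min_calls` threshold, and the tool names `search` and `browse` are all assumptions. It counts tool calls in one trajectory and computes the Shannon entropy of the tool-usage distribution, where low entropy suggests reliance on a single tool.

```python
import math
from collections import Counter


def exploration_stats(tool_calls, min_calls=2):
    """Summarize tool use in one trajectory (hypothetical diagnostic).

    tool_calls: list of tool names in call order, e.g. ["search", "browse"].
    min_calls:  assumed threshold; fewer calls is flagged as premature termination.
    """
    counts = Counter(tool_calls)
    total = len(tool_calls)
    # Shannon entropy in bits; 0.0 means every call used the same tool.
    entropy = sum(
        (c / total) * math.log2(total / c) for c in counts.values()
    )
    return {
        "num_calls": total,
        "tool_entropy": entropy,
        "premature": total < min_calls,
    }


# A trajectory that alternates between two (hypothetical) tools.
balanced = exploration_stats(["search", "browse", "search", "browse"])
# A trajectory that stops after one call.
short = exploration_stats(["search"])
```

Under these assumptions, `balanced` reports an entropy of 1 bit and is not flagged, while `short` has zero entropy and is flagged as premature. Statistics like these could be compared across agents, but the paper's own analyses may use different measures.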

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Topic Modeling