TL;DR
Marco DeepResearch introduces a verification-centric framework for deep research agents, enhancing their ability to conduct reliable, long-horizon investigations by integrating verification at data synthesis, trajectory construction, and inference stages.
Contribution
The paper presents a verification-centric design for deep research agents that adds explicit verification to QA data synthesis, trajectory construction, and test-time scaling, so that errors introduced at one stage do not propagate to the next.
Findings
Outperforms 8B-scale agents on challenging benchmarks such as BrowseComp.
Uses verification mechanisms to control question difficulty and ensure answer correctness.
Approaches or exceeds the performance of 30B-scale agents while using fewer resources.
Abstract
Deep research agents autonomously conduct open-ended investigations, integrating complex information retrieval with multi-step reasoning across diverse sources to solve real-world problems. To sustain this capability on long-horizon tasks, reliable verification is critical during both training and inference. A major bottleneck in existing paradigms stems from the lack of explicit verification mechanisms in QA data synthesis, trajectory construction, and test-time scaling. Errors introduced at each stage propagate downstream and degrade the overall agent performance. To address this, we present Marco DeepResearch, a deep research agent optimized with a verification-centric framework design at three levels: (1) QA Data Synthesis: We introduce verification mechanisms to graph-based and agent-based QA synthesis to control question difficulty while ensuring answers are unique and…
