SAGE: Steerable Agentic Data Generation for Deep Search with Execution Feedback

Fangyuan Xu; Rujun Han; Yanfei Chen; Zifeng Wang; I-Hung Hsu; Jun Yan; Vishy Tirumalashetty; Eunsol Choi; Tomas Pfister; Chen-Yu Lee

arXiv:2601.18202 · cs.AI · January 27, 2026


TL;DR

SAGE is a novel pipeline that automatically generates high-quality, difficulty-controlled deep search question-answer pairs, improving deep search agent performance and adaptability without extensive human annotation.

Contribution

We introduce SAGE, an agentic pipeline that iteratively refines synthetic QA pairs for deep search, enabling scalable data generation with difficulty control and improved agent training.

Findings

01. SAGE generates diverse, challenging questions requiring various reasoning strategies.

02. Training on SAGE data yields up to 23% performance improvement on deep search benchmarks.

03. Agents trained on SAGE data can adapt to Google Search without additional training.

Abstract

Deep search agents, which aim to answer complex questions requiring reasoning across multiple documents, can significantly speed up the information-seeking process. Collecting human annotations for this application is prohibitively expensive due to long and complex exploration trajectories. We propose an agentic pipeline that automatically generates high-quality, difficulty-controlled deep search question-answer pairs for a given corpus and a target difficulty level. Our pipeline, SAGE, consists of a data generator, which proposes QA pairs, and a search agent, which attempts to solve the generated question and provides execution feedback to the data generator. The two components interact over multiple rounds to iteratively refine the question-answer pairs until they satisfy the target difficulty level. Our intrinsic evaluation shows SAGE generates questions that require diverse reasoning…
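The generator/agent loop described in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the SAGE implementation: the callables `propose` and `solve`, the `Feedback` fields, the exact-match answer check, and the use of the agent's number of search calls as the difficulty measure are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class QAPair:
    question: str
    answer: str


@dataclass
class Execution:
    predicted_answer: str
    num_searches: int  # hypothetical difficulty proxy: search calls the agent needed


@dataclass
class Feedback:
    solved: bool
    num_searches: int
    target: int


def refine_qa(
    propose: Callable[[Optional[QAPair], Optional[Feedback], int], QAPair],
    solve: Callable[[str], Execution],
    target_searches: int,
    max_rounds: int = 5,
) -> Optional[QAPair]:
    """Hypothetical multi-round refinement: generator proposes, agent executes."""
    qa: Optional[QAPair] = None
    feedback: Optional[Feedback] = None
    for _ in range(max_rounds):
        # The generator sees its previous QA pair and the agent's execution feedback.
        qa = propose(qa, feedback, target_searches)
        run = solve(qa.question)
        # Assumed check: exact match after normalization.
        solved = run.predicted_answer.strip().lower() == qa.answer.strip().lower()
        # Assumed acceptance rule: the pair is answerable AND hard enough.
        if solved and run.num_searches >= target_searches:
            return qa
        feedback = Feedback(solved, run.num_searches, target_searches)
    return None  # no pair reached the target difficulty within the round budget


# Toy stand-ins (hypothetical): each "hop" in the question costs one search.
def toy_propose(prev: Optional[QAPair], fb: Optional[Feedback], target: int) -> QAPair:
    if prev is None:
        hops = 1
    else:
        prev_hops = prev.question.count("->") + 1
        # Add a hop when the agent solved the question too easily; otherwise keep it.
        hops = prev_hops + 1 if fb is not None and fb.solved else prev_hops
    return QAPair(question="->".join(f"hop{i}" for i in range(hops)), answer="42")


def toy_solve(question: str) -> Execution:
    return Execution(predicted_answer="42", num_searches=question.count("->") + 1)
```

With a target of three searches, `refine_qa(toy_propose, toy_solve, target_searches=3)` grows the toy question by one hop per round and accepts `"hop0->hop1->hop2"` on the third round. Requiring the agent to actually recover the reference answer, rather than only to search longer, is one plausible way to keep harder questions answerable; the paper's own acceptance criteria may differ.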

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Expert finding and Q&A systems