ORBIT: Scalable and Verifiable Data Generation for Search Agents on a Tight Budget
Nandan Thakur, Zijian Chen, Xueguang Ma, Jimmy Lin

TL;DR
ORBIT is a frugal, modular data generation framework that produces a 20K-query, reasoning-intensive training dataset with short verifiable answers for search agents. Models trained on it perform strongly on Wikipedia question answering.
Contribution
This work introduces a frugal, modular framework for generating a large synthetic dataset for training reasoning-capable search agents without relying on paid APIs.
Findings
ORBIT-4B achieves strong performance among sub-4B LLMs on Wikipedia QA tasks.
The dataset spans 15 domains with multi-step reasoning questions.
The framework and datasets are open-sourced for community use.
Abstract
Search agents, which integrate language models (LMs) with web search, are becoming crucial for answering complex user queries. Constructing training datasets for deep research tasks, which involve multi-step retrieval and reasoning, remains challenging because of expensive human annotation or cumbersome prerequisites. In this work, we introduce ORBIT, a training dataset of 20K reasoning-intensive queries with short verifiable answers, generated using a frugal framework that does not rely on paid API services. The modular framework has four stages: seed creation, question-answer pair generation, and two stages of verification (self and external). ORBIT spans 15 domains, and each training pair requires 4-5 reasoning steps, with external search verification performed against the complete web. We train Qwen3-4B as the base model on ORBIT using GRPO and evaluate it on Wikipedia question answering…
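The four stages named in the abstract (seed creation, question-answer pair generation, self verification, external verification) can be pictured as a filter chain in which a pair survives only if every stage accepts it. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: `QAPair`, `run_pipeline`, the `toy_*` functions, and the toy data are all hypothetical stand-ins, and the LM prompts and web search that the real stages would use are replaced here by dictionary lookups and string checks.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class QAPair:
    seed: str       # seed entity or topic the question was built from
    question: str
    answer: str     # short answer that can be checked automatically


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so string checks are lenient."""
    return " ".join(text.lower().split())


def run_pipeline(
    seeds: Iterable[str],
    generate: Callable[[str], Optional[QAPair]],
    self_verify: Callable[[QAPair], bool],
    external_verify: Callable[[QAPair], bool],
) -> List[QAPair]:
    """Keep only pairs that pass generation and both verification stages."""
    kept = []
    for seed in seeds:
        pair = generate(seed)            # stage 2: question-answer generation
        if pair is None:
            continue
        if not self_verify(pair):        # stage 3: self verification
            continue
        if not external_verify(pair):    # stage 4: external verification
            continue
        kept.append(pair)
    return kept


# Hypothetical toy data standing in for LM outputs and retrieved evidence.
TOY_QA = {
    "Marie Curie": ("Which element did the two-science Nobel laureate "
                    "name after her homeland?", "Polonium"),
    "Alan Turing": ("What is the surname of Alan Turing?", "Turing"),
    "Ada Lovelace": ("Who designed the engine Ada Lovelace wrote notes on?",
                     "Charles Babbage"),
}
TOY_EVIDENCE = {
    "Marie Curie": "Marie Curie named polonium after Poland.",
    "Ada Lovelace": "No relevant passage was retrieved.",
}


def toy_generate(seed: str) -> Optional[QAPair]:
    if seed not in TOY_QA:
        return None
    question, answer = TOY_QA[seed]
    return QAPair(seed, question, answer)


def toy_self_verify(pair: QAPair) -> bool:
    # Assumed checks: answer is short and is not leaked by the question.
    short = 0 < len(pair.answer.split()) <= 5
    leaked = normalize(pair.answer) in normalize(pair.question)
    return short and not leaked


def toy_external_verify(pair: QAPair) -> bool:
    # Assumed check: the answer appears in independently retrieved evidence.
    evidence = TOY_EVIDENCE.get(pair.seed, "")
    return normalize(pair.answer) in normalize(evidence)
```

Called with the seeds `["Marie Curie", "Alan Turing", "Ada Lovelace", "Unknown"]`, this toy setup keeps only the Marie Curie pair: the Turing question leaks its own answer, the Lovelace answer has no supporting evidence, and the unknown seed yields no pair. Passing each stage in as a callable mirrors the modularity the abstract describes, since any stage can be swapped without touching the others.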
