WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization
Zhengwei Tao, Jialong Wu, Wenbiao Yin, Junkai Zhang, Baixuan Li, Haiyang Shen, Kuan Li, Liwen Zhang, Xinyu Wang, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou

TL;DR
WebShaper introduces a formalization-driven framework for synthesizing high-quality web-based information-seeking data, enabling the training of more effective LLM-powered agents for complex tasks.
Contribution
It proposes a novel formalization method using set theory and Knowledge Projections to systematically generate training data for IS agents.
Findings
Achieves state-of-the-art performance on the GAIA benchmark.
Outperforms existing open-sourced IS agents.
Demonstrates effective control over reasoning structures.
Abstract
The advent of Large Language Model (LLM)-powered agents has revolutionized artificial intelligence by enabling solutions to complex, open-ended tasks through web-based information-seeking (IS) capabilities. The scarcity of high-quality training data has limited the development of IS agents. Existing approaches typically adopt an information-driven paradigm that first collects web data and then generates questions based on the retrieval. However, this may lead to inconsistency between information structure and reasoning structure, question and answer. To mitigate, we propose a formalization-driven IS data synthesis framework WebShaper to construct a dataset. WebShaper systematically formalizes IS tasks through set theory. Central to the formalization is the concept of Knowledge Projections (KP), which enables precise control over reasoning structure by KP operation compositions. During…
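To make the set-theoretic idea concrete, here is a minimal sketch of how Knowledge Projections might be composed. It is an illustration under assumptions, not the authors' implementation. It assumes a KP maps a set of entities V under a relation R to every entity related to some element of V, that an R-Union merges the projections of several value sets under one relation, and that an Intersection keeps only entities satisfying all constraints. The function names (`kp`, `r_union`, `intersection`) and the toy relations are hypothetical.

```python
from typing import Iterable, Set, Tuple

# A relation is modelled as a set of (subject, object) pairs. Assumption for illustration.
Relation = Set[Tuple[str, str]]


def kp(relation: Relation, values: Iterable[str]) -> Set[str]:
    """Knowledge Projection (assumed form): all u with (u, v) in relation for some v in values."""
    targets = set(values)
    return {u for (u, v) in relation if v in targets}


def r_union(relation: Relation, *value_sets: Iterable[str]) -> Set[str]:
    """R-Union (assumed form): union of the projections of several value sets under one relation."""
    result: Set[str] = set()
    for values in value_sets:
        result |= kp(relation, values)
    return result


def intersection(*projections: Iterable[str]) -> Set[str]:
    """Intersection: entities that appear in every projection."""
    sets = [set(p) for p in projections]
    return set.intersection(*sets) if sets else set()


# Hypothetical toy knowledge, not taken from the paper.
played_for = {("alice", "club_a"), ("bob", "club_a"), ("carol", "club_b")}
born_in = {("alice", "1990"), ("bob", "1991"), ("carol", "1990"), ("dave", "1992")}
based_in = {("club_a", "city_x"), ("club_b", "city_y")}

# "Who played for club_a and was born in 1990 or 1992?"
answer = intersection(
    kp(played_for, {"club_a"}),
    r_union(born_in, {"1990"}, {"1992"}),
)

# Nested composition: "Who played for a club based in city_x?"
nested = kp(played_for, kp(based_in, {"city_x"}))

print(sorted(answer))  # ['alice']
print(sorted(nested))  # ['alice', 'bob']
```

In this sketch the shape of the expression (which projections are nested, unioned, or intersected) is the reasoning structure of the question, which is why composing such operations would give a synthesizer control over task structure before any text is written.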
Peer Reviews
Decision·ICLR 2026 Poster
+ The paper presents a meaningful shift from an information-driven to a formalization-driven approach for data synthesis, directly addressing the inconsistency issues observed in prior methods.
+ The formalization based on Knowledge Projections (KPs) and set operations provides fine-grained control over both the reasoning structure and task complexity of synthesized data.
+ The Layer-wise Expansion Strategy effectively tackles redundancy and reasoning shortcuts common in previous synthesis frame
- The synthesis process, spanning seed generation, multi-agent expansion, and online retrieval, appears computationally expensive. A quantitative comparison of cost (e.g., API calls, runtime, or compute hours) against traditional information-driven synthesis methods would clarify its practical feasibility.
- The current set-theoretic grammar (KPs, R-Unions, and Intersections) may not sufficiently capture complex real-world IS tasks, such as those involving temporal, comparative, or counterfactual
S1: Novel training data construction method that improves over previous "information-driven" paradigms.
S2: Method improves on performance compared to existing IS agents.
S3: WebShaper's training data is more effective compared to the output of other training data generation methods.
W1: It seems the created dataset itself is not available. This would have been a very valuable contribution for the community. The same goes for the code: you claim it is open-source, but it was not provided in the supplement.
1. The paradigm shift from "information-driven" to "formalization-driven" demonstrates originality and theoretical depth.
2. The paper presents a complete pipeline: the KP representation, layer-wise expansion strategy, and Expander agent collectively implement a closed loop of autonomous data generation and quality assurance, demonstrating engineering rigor.
3. The paper validates WebShaper's performance across multiple backbones. Ablation studies across various models and additional analyses c
1. The paper suffers from serious writing and data consistency issues, with multiple conflicting details:
   (1) Content misalignment between Sections 4.4.3 and 4.4.4;
   (2) Inconsistent key performance data: Qwen2.5-32B's SFT performance is reported as 44.66 in Figures 3 and 4, but 43.6 in Table 2; Qwen2.5-72B similarly changes from 46.66 to 45.6.
   Such conflicts undermine the paper's rigor.
2. The paper lacks evaluation on benchmarks such as BrowseComp and XBench, limiting fair comparison with concu
