WebSailor: Navigating Super-human Reasoning for Web Agent

Kuan Li; Zhongwang Zhang; Huifeng Yin; Liwen Zhang; Litu Ou; Jialong Wu; Wenbiao Yin; Baixuan Li; Zhengwei Tao; Xinyu Wang; Weizhou Shen; Junkai Zhang; Dingchu Zhang; Xixi Wu; Yong Jiang; Ming Yan; Pengjun Xie; Fei Huang; Jingren Zhou

arXiv:2507.02592·cs.CL·July 4, 2025

TL;DR

WebSailor is a post-training methodology that equips open-source web agents with the ability to systematically reduce uncertainty, the reasoning pattern the authors credit for superhuman performance on complex information-seeking tasks.

Contribution

It introduces a post-training pipeline that combines high-uncertainty task generation (through structured sampling and information obfuscation), an RFT cold start, and a new agentic RL algorithm, Duplicating Sampling Policy Optimization (DUPO), to improve open-source web agents' performance.

Findings

01. WebSailor outperforms existing open-source agents in complex tasks.

02. It matches proprietary agents' performance in information-seeking benchmarks.

03. The methodology effectively closes the capability gap with proprietary systems.

Abstract

Transcending human cognitive limitations represents a critical frontier in LLM training. Proprietary agentic systems like DeepResearch have demonstrated superhuman capabilities on extremely complex information-seeking benchmarks such as BrowseComp, a feat previously unattainable. We posit that their success hinges on a sophisticated reasoning pattern absent in open-source models: the ability to systematically reduce extreme uncertainty when navigating vast information landscapes. Based on this insight, we introduce WebSailor, a complete post-training methodology designed to instill this crucial capability. Our approach involves generating novel, high-uncertainty tasks through structured sampling and information obfuscation, RFT cold start, and an efficient agentic RL training algorithm, Duplicating Sampling Policy Optimization (DUPO). With this integrated pipeline, WebSailor…
