From Imitation to Discrimination: Progressive Curriculum Learning for Robust Web Navigation
Chuang Peng, Wei Zhang, Renshuai Tao, Xinhao Zhang, Jian Yang

TL;DR
This paper introduces a new dataset (Triton) and a progressive curriculum learning approach that improve the robustness and discrimination capabilities of text-based web navigation agents, with its best 32B model outperforming large models such as GPT-4.5.
Contribution
It presents the Triton dataset and a novel training curriculum that improve web agents' ability to reject plausible but incorrect page elements and to generalize to unseen website layouts, beyond what standard supervised fine-tuning achieves.
Findings
Triton-GRPO-32B achieves a 58.7% step success rate, surpassing GPT-4.5.
The curriculum-based training outperforms models that rely on raw scale alone in web navigation tasks.
Empirical results indicate that a specialized data curriculum is more effective than simply using larger models.
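To make the headline metric concrete, here is a minimal sketch of how a step success rate could be computed. It assumes a Mind2Web-style definition, which this summary does not state: a step counts as successful only when the predicted element and the operation (including any typed or selected value) both exactly match the reference. The `StepAction` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepAction:
    # Hypothetical representation of one action in a web navigation episode.
    element_id: str
    operation: str       # e.g. "CLICK", "TYPE", "SELECT"
    value: str = ""      # typed text or selected option; empty for clicks


def step_success(pred: StepAction, gold: StepAction) -> bool:
    """Assumed rule: element, operation, and value must all match exactly."""
    return (
        pred.element_id == gold.element_id
        and pred.operation == gold.operation
        and pred.value == gold.value
    )


def step_success_rate(preds: list[StepAction], golds: list[StepAction]) -> float:
    """Fraction of steps whose predicted action matches the reference action."""
    if not golds or len(preds) != len(golds):
        raise ValueError("need one prediction per reference step")
    return sum(step_success(p, g) for p, g in zip(preds, golds)) / len(golds)


gold = [StepAction("e1", "CLICK"), StepAction("e2", "TYPE", "boston"),
        StepAction("e3", "CLICK"), StepAction("e4", "SELECT", "2 adults")]
pred = [StepAction("e1", "CLICK"), StepAction("e2", "TYPE", "boston"),
        StepAction("e9", "CLICK"), StepAction("e4", "SELECT", "2 adults")]
print(step_success_rate(pred, gold))  # 3 of 4 steps correct
```

Under this rule, clicking the wrong element on the third step costs the whole step even though the operation type was right, which is exactly the failure mode that discrimination-focused training targets.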
Abstract
Text-based web agents offer computational efficiency for autonomous web navigation, yet developing robust agents remains challenging due to the noisy and heterogeneous nature of real-world HTML. Standard Supervised Fine-Tuning (SFT) approaches fail in two critical dimensions: they lack discrimination capabilities to reject plausible but incorrect elements in densely populated pages, and exhibit limited generalization to unseen website layouts. To address these challenges, we introduce the Triton dataset (590k instances) and a progressive training curriculum. Triton is constructed via Structural-Semantic Hard Negative Mining, which explicitly mines topologically similar distractors, and a Dual-Agent Consensus pipeline that synthesizes diverse cross-domain tasks with strict verification. Building upon this foundation, our progressive curriculum produces three models: Triton-SFT-32B for…
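The abstract names two data-construction ideas without giving details. The sketch below is one hypothetical way to realize them, not the authors' pipeline. "Structural-Semantic Hard Negative Mining" is approximated by ranking non-target elements on a weighted mix of DOM-path overlap (structural) and text overlap (semantic, using `difflib` as a stand-in for an embedding model). "Dual-Agent Consensus" is approximated by keeping a synthesized task only when two independent agents give the same answer. The `Element` type, the scoring weights, and all function names are assumptions.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Optional


@dataclass(frozen=True)
class Element:
    # Hypothetical flattened view of one DOM node.
    node_id: str
    tag: str
    path: tuple[str, ...]  # tag names from the root down to this node
    text: str


def structural_similarity(a: Element, b: Element) -> float:
    """Shared leading DOM-path fraction, plus a bonus when the tags match."""
    shared = 0
    for x, y in zip(a.path, b.path):
        if x != y:
            break
        shared += 1
    depth = max(len(a.path), len(b.path), 1)
    return 0.5 * shared / depth + 0.5 * float(a.tag == b.tag)


def semantic_similarity(a: Element, b: Element) -> float:
    """Character-level text overlap in [0, 1]; a stand-in for embeddings."""
    return SequenceMatcher(None, a.text.lower(), b.text.lower()).ratio()


def mine_hard_negatives(target: Element, candidates: list[Element],
                        k: int = 3, alpha: float = 0.5) -> list[Element]:
    """Return the k non-target elements most similar to the target."""
    scored = [
        (alpha * structural_similarity(target, c)
         + (1 - alpha) * semantic_similarity(target, c), c)
        for c in candidates
        if c.node_id != target.node_id
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


def dual_agent_consensus(task: str,
                         agent_a: Callable[[str], Optional[str]],
                         agent_b: Callable[[str], Optional[str]]) -> bool:
    """Keep a synthesized task only if both agents independently agree."""
    answer_a, answer_b = agent_a(task), agent_b(task)
    return answer_a is not None and answer_a == answer_b


target = Element("cart", "button", ("html", "body", "div", "button"), "Add to cart")
page = [
    target,
    Element("wishlist", "button", ("html", "body", "div", "button"), "Add to wishlist"),
    Element("buy", "button", ("html", "body", "section", "button"), "Buy now"),
    Element("about", "a", ("html", "body", "footer", "a"), "About us"),
]
print([e.node_id for e in mine_hard_negatives(target, page, k=2)])
```

The sibling "Add to wishlist" button ranks first because it shares both the DOM position and most of the label, which is the kind of plausible but incorrect element the abstract says SFT-only agents fail to reject. Mixing the two signals with a single `alpha` is a simplifying choice; a real pipeline would likely tune or learn it.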
