Scaling Web Agent Training through Automatic Data Generation and Fine-grained Evaluation
Lajanugen Logeswaran, Jaekyeom Kim, Sungryull Sohn, Creighton Glasscock, Honglak Lee

TL;DR
This paper introduces a scalable pipeline for web agent training that uses automatic data generation and a novel fine-grained evaluation framework, enabling improved model performance on complex web tasks.
Contribution
It presents a new constraint-based evaluation method and a new benchmark, BookingArena, consisting of complex booking tasks across 20 popular websites, improving data efficiency and model quality for web agents.
Findings
The method outperforms open-source approaches.
The distilled model matches or exceeds commercial systems while being significantly smaller.
The approach significantly expands usable training data.
Abstract
We present a scalable pipeline for automatically generating high-quality training data for web agents. A major challenge in identifying high-quality training instances is trajectory evaluation: quantifying how much progress was made towards task completion. We introduce a novel constraint-based evaluation framework that provides fine-grained assessment of progress towards task completion. This enables us to leverage partially successful trajectories, which significantly expands the amount of usable training data. We evaluate our method on a new benchmark we propose called BookingArena, which consists of complex booking tasks across 20 popular websites, and demonstrate that our distilled student model outperforms open-source approaches and matches or exceeds commercial systems, while being a significantly smaller model. Our work addresses the challenge of efficiently…
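The abstract does not specify how constraints are represented or scored. As a minimal sketch, assuming each task decomposes into independently checkable constraints and that progress is simply the fraction of constraints satisfied by a trajectory's final state (our assumption, not necessarily the authors' scoring rule), the Python below shows how a graded score lets partially successful trajectories enter the training set. `Constraint`, `Trajectory`, `progress_score`, `select_training_data`, the example booking fields, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical snapshot of what the agent ended up entering on the booking site.
State = Dict[str, Any]


@dataclass
class Constraint:
    """One checkable requirement of a task (hypothetical representation)."""
    name: str
    check: Callable[[State], bool]


@dataclass
class Trajectory:
    """A trajectory summarized by its id and the final state it reached."""
    traj_id: str
    final_state: State = field(default_factory=dict)


def progress_score(constraints: List[Constraint], traj: Trajectory) -> float:
    """Fraction of task constraints satisfied by the trajectory's final state."""
    if not constraints:
        return 0.0
    satisfied = sum(1 for c in constraints if c.check(traj.final_state))
    return satisfied / len(constraints)


def select_training_data(
    trajs: List[Trajectory],
    constraints: List[Constraint],
    threshold: float = 0.5,  # hypothetical cutoff for "enough progress"
) -> List[Trajectory]:
    """Keep full and partial successes whose progress score meets the threshold."""
    return [t for t in trajs if progress_score(constraints, t) >= threshold]


# Hypothetical hotel-booking task with three constraints.
constraints = [
    Constraint("city", lambda s: s.get("city") == "Paris"),
    Constraint("check_in", lambda s: s.get("check_in") == "June 1"),
    Constraint("guests", lambda s: s.get("guests") == 2),
]

trajs = [
    Trajectory("A", {"city": "Paris", "check_in": "June 1", "guests": 2}),
    Trajectory("B", {"city": "Paris", "check_in": "June 3", "guests": 2}),
    Trajectory("C", {"city": "Rome"}),
]

for t in trajs:
    print(t.traj_id, round(progress_score(constraints, t), 2))
# prints one per line: A 1.0 / B 0.67 / C 0.0

kept = select_training_data(trajs, constraints)
print([t.traj_id for t in kept])  # ['A', 'B']
```

Under binary success/failure evaluation only trajectory A would be kept; the graded score also admits B, which satisfied two of the three requirements. This illustrates how fine-grained evaluation can expand the pool of usable training data.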
Taxonomy
Topics: Web Data Mining and Analysis · Recommender Systems and Techniques · Intelligent Tutoring Systems and Adaptive Learning
