Scaling Web Agent Training through Automatic Data Generation and Fine-grained Evaluation

Lajanugen Logeswaran; Jaekyeom Kim; Sungryull Sohn; Creighton Glasscock; Honglak Lee

arXiv:2602.12544·cs.AI·February 16, 2026



TL;DR

This paper introduces a scalable pipeline for web agent training that uses automatic data generation and a novel fine-grained evaluation framework, enabling improved model performance on complex web tasks.

Contribution

It presents a constraint-based evaluation method that scores partial progress towards task completion, together with a new benchmark, BookingArena, so that partially successful trajectories can be used as training data for web agents.

Findings

01

On BookingArena, the distilled student model outperforms open-source approaches.

02

The distilled student model matches or exceeds commercial systems while being significantly smaller.

03

Fine-grained evaluation makes partially successful trajectories usable, which significantly expands the amount of training data.

Abstract

We present a scalable pipeline for automatically generating high-quality training data for web agents. A major challenge in identifying high-quality training instances is trajectory evaluation: quantifying how much progress was made towards task completion. We introduce a novel constraint-based evaluation framework that provides fine-grained assessment of progress towards task completion. This enables us to leverage partially successful trajectories, which significantly expands the amount of usable training data. We evaluate our method on BookingArena, a new benchmark we propose that consists of complex booking tasks across 20 popular websites, and demonstrate that our distilled student model outperforms open-source approaches and matches or exceeds commercial systems, while being a significantly smaller model. Our work addresses the challenge of efficiently…
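To make the idea of constraint-based, fine-grained evaluation concrete, here is a minimal Python sketch. It is not the paper's implementation: the abstract does not say how constraints are represented or scored, so the `Constraint` class, the fraction-satisfied score, the `select_trajectories` helper, the 0.5 threshold, and the booking fields are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical representation: the final state reached by a trajectory,
# reduced to a flat dictionary of booking fields.
State = Dict[str, str]


@dataclass
class Constraint:
    """One requirement of the task, checked against a final state (assumed design)."""
    name: str
    check: Callable[[State], bool]


def progress_score(state: State, constraints: List[Constraint]) -> float:
    """Fraction of task constraints that the state satisfies (an assumed metric)."""
    if not constraints:
        return 0.0
    satisfied = sum(1 for c in constraints if c.check(state))
    return satisfied / len(constraints)


def select_trajectories(
    final_states: List[State], constraints: List[Constraint], threshold: float
) -> List[State]:
    """Keep every trajectory whose progress score reaches the threshold."""
    return [s for s in final_states if progress_score(s, constraints) >= threshold]


# Illustrative hotel-booking task with four constraints (made-up values).
constraints = [
    Constraint("city", lambda s: s.get("city") == "Paris"),
    Constraint("guests", lambda s: s.get("guests") == "2"),
    Constraint("check_in", lambda s: s.get("check_in") == "2026-03-01"),
    Constraint("check_out", lambda s: s.get("check_out") == "2026-03-05"),
]

full = {"city": "Paris", "guests": "2",
        "check_in": "2026-03-01", "check_out": "2026-03-05"}
partial = {"city": "Paris", "guests": "2", "check_in": "2026-04-01"}
failed: State = {}

# A binary success check would keep only `full`; a graded score also keeps `partial`.
kept = select_trajectories([full, partial, failed], constraints, threshold=0.5)
```

Under these assumptions, a pass/fail evaluator would discard the partially correct trajectory, while the graded score retains it, which is the sense in which fine-grained evaluation enlarges the pool of usable training data.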


Taxonomy

Topics: Web Data Mining and Analysis · Recommender Systems and Techniques · Intelligent Tutoring Systems and Adaptive Learning