Structured Distillation of Web Agent Capabilities Enables Generalization

Xing Han Lù; Siva Reddy

arXiv:2604.07776·cs.LG·April 10, 2026

2 repos, 4 models, 1 dataset

TL;DR

This paper trains web agents on synthetic trajectories that a teacher LLM generates through a structured, role-based pipeline. The resulting 9B-parameter student is competitive with closed-source models on WebArena and transfers to unseen environments.

Contribution

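The authors introduce Agent-as-Annotators, a modular framework for synthetic trajectory generation that replaces the human annotation roles of Task Designer, Annotator, and Supervisor with LLM components. Using a single teacher model, it improves web agent performance and generalization.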

Findings

01

Achieved 41.5% on WebArena, surpassing closed-source models such as Claude 3.5 Sonnet (36.0%) and GPT-4o (31.5%) under the same evaluation protocol.

02

Nearly doubled the previous best open-weight result (Go-Browse, 21.7%).

03

Transferred capabilities to unseen environments, including an 18.2 percentage point gain on WorkArena.
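
The "nearly doubled" claim can be checked with simple arithmetic on the two WebArena scores quoted in the abstract. The snippet below is only an illustrative check; the variable names are not from the paper.

```python
# WebArena scores quoted in the abstract, in percent.
student_score = 41.5               # 9B student trained with Agent-as-Annotators
previous_open_weight_best = 21.7   # Go-Browse

# Relative improvement (ratio) and absolute improvement (percentage points).
ratio = student_score / previous_open_weight_best
gap_points = student_score - previous_open_weight_best

print(f"ratio: {ratio:.2f}x")
print(f"gap: {gap_points:.1f} percentage points")
```

A ratio a little above 1.9 is what "nearly doubling" refers to here.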

Abstract

Frontier LLMs can navigate complex websites, but their cost and reliance on third-party APIs make local deployment impractical. We introduce Agent-as-Annotators, a framework that structures synthetic trajectory generation for web agents by analogy to human annotation roles, replacing the Task Designer, Annotator, and Supervisor with modular LLM components. Using Gemini 3 Pro as teacher, we generate 3,000 trajectories across six web environments and fine-tune a 9B-parameter student with pure supervised learning on the 2,322 that pass quality filtering. The resulting model achieves 41.5% on WebArena, surpassing closed-source models such as Claude 3.5 Sonnet (36.0%) and GPT-4o (31.5%) under the same evaluation protocol, and nearly doubling the previous best open-weight result (Go-Browse, 21.7%). Capabilities transfer to unseen environments, with an 18.2 percentage point gain on WorkArena…
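
The three-role structure described in the abstract can be pictured as a generate-then-filter loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: `Trajectory`, `generate_dataset`, and the `toy_*` functions are hypothetical names, and the toy stand-ins take the place of the LLM calls a real pipeline would make.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Trajectory:
    """Hypothetical container for one annotated episode."""
    env: str
    task: str
    steps: List[str] = field(default_factory=list)


def generate_dataset(
    environments: List[str],
    design_tasks: Callable[[str], List[str]],   # Task Designer role
    annotate: Callable[[str, str], Trajectory],  # Annotator role
    supervise: Callable[[Trajectory], bool],     # Supervisor role
) -> Tuple[List[Trajectory], int]:
    """Return the trajectories that pass filtering and the number generated."""
    kept: List[Trajectory] = []
    total = 0
    for env in environments:
        for task in design_tasks(env):
            trajectory = annotate(env, task)
            total += 1
            if supervise(trajectory):  # quality filter
                kept.append(trajectory)
    return kept, total


# Toy stand-ins so the sketch runs without any model (assumptions, not real roles).
def toy_design_tasks(env: str) -> List[str]:
    return [f"{env} task {i}" for i in range(3)]


def toy_annotate(env: str, task: str) -> Trajectory:
    # Pretend the teacher fails on every task numbered 0 and produces no steps.
    steps = [] if task.endswith("0") else ["click", "type", "submit"]
    return Trajectory(env=env, task=task, steps=steps)


def toy_supervise(trajectory: Trajectory) -> bool:
    return len(trajectory.steps) > 0


if __name__ == "__main__":
    kept, total = generate_dataset(
        ["shop", "forum"], toy_design_tasks, toy_annotate, toy_supervise
    )
    print(f"kept {len(kept)} of {total} trajectories")
```

In the paper's setting, the abstract reports that 2,322 of 3,000 generated trajectories pass quality filtering, a retention rate of 77.4%; only the retained trajectories are used for supervised fine-tuning of the student.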

Code & Models

Datasets

McGill-NLP/A3-Synth
dataset · 3.0k downloads