TL;DR
WebFactory introduces an automated reinforcement learning pipeline that compresses large language models' internet knowledge into efficient GUI agents, achieving strong performance with synthetic training data from only 10 websites.
Contribution
The paper presents a novel, fully automated pipeline for training grounded GUI agents. The pipeline compresses LLM knowledge into actionable behaviors, which reduces the amount of training data required.
Findings
An agent trained on synthetic data from 10 websites matches the performance of models trained on larger datasets.
The agent outperforms the base foundation model on internal offline and online transfer benchmarks.
WebFactory demonstrates high data efficiency and strong generalization capabilities.
Abstract
Current paradigms for training GUI agents are fundamentally limited by a reliance on either unsafe, non-reproducible live web interactions or costly, scarce human-crafted data and environments. We argue this focus on data volume overlooks a more critical factor: the efficiency of compressing a large language model's (LLM) latent knowledge into actionable agent behavior. We introduce WebFactory, a novel, fully automated closed-loop reinforcement learning pipeline for GUI agents, systematically compressing LLM-encoded internet intelligence into efficient, grounded actions. Our pipeline features a process of scalable environment synthesis, knowledge-aware task generation, LLM-powered trajectory collection, decomposed reward RL training, and systematic agent evaluation. Remarkably, our agent demonstrates exceptional data efficiency and generalization. Trained on synthetic data from only 10…
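The abstract names "decomposed reward RL training" but this summary does not say how the reward is decomposed. As a rough illustration of the general idea only, the sketch below scores one predicted GUI action against a reference using three separate terms. Everything in it is an assumption made for illustration and is not taken from the paper: the `Action` and `Target` types, the three terms (action kind, grounding, text), and the weights.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


@dataclass
class Action:
    """Predicted GUI action. Hypothetical schema, not the paper's."""
    kind: str                                    # e.g. "click", "type", "scroll"
    point: Optional[Tuple[float, float]] = None  # click position, if any
    text: Optional[str] = None                   # typed text, if any


@dataclass
class Target:
    """Reference action for one step. Hypothetical schema, not the paper's."""
    kind: str
    box: Optional[Box] = None   # bounding box of the element to act on
    text: Optional[str] = None  # expected typed text


# Assumed weights, chosen only so that the terms sum to 1.
WEIGHTS = {"kind": 0.2, "grounding": 0.4, "text": 0.4}


def decomposed_reward(pred: Action, target: Target) -> Dict[str, float]:
    """Return each reward term separately, plus their weighted total."""
    kind_ok = pred.kind == target.kind
    terms = {"kind": 1.0 if kind_ok else 0.0, "grounding": 0.0, "text": 0.0}

    if kind_ok:
        # Grounding term: the predicted point must fall inside the target box.
        if target.box is None:
            terms["grounding"] = 1.0  # nothing to ground for this action
        elif pred.point is not None:
            x, y = pred.point
            left, top, right, bottom = target.box
            inside = left <= x <= right and top <= y <= bottom
            terms["grounding"] = 1.0 if inside else 0.0

        # Text term: case-insensitive exact match on the typed string.
        if target.text is None:
            terms["text"] = 1.0  # no text expected for this action
        elif pred.text is not None:
            same = pred.text.strip().lower() == target.text.strip().lower()
            terms["text"] = 1.0 if same else 0.0

    terms["total"] = sum(WEIGHTS[name] * terms[name] for name in WEIGHTS)
    return terms
```

Returning the individual terms alongside the total makes it possible to see which part of an action failed (wrong action type, missed element, or wrong text), which a single scalar success signal would hide.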