WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks
Hao Bai, Alexey Taymanov, Tong Zhang, Aviral Kumar, Spencer Whitehead

TL;DR
WebGym is a large-scale, open-source environment for training visual web agents on realistic, diverse tasks, enabling significant improvements in out-of-distribution performance through scalable RL training and fine-tuning.
Contribution
We introduce WebGym, a scalable platform with nearly 300,000 tasks and a high-throughput rollout system, and demonstrate improved out-of-distribution success rates for vision-language models trained on it.
Findings
WebGym's asynchronous rollout system achieves a 4-5x rollout speedup over naive implementations.
Fine-tuning on WebGym improves the out-of-distribution success rate from 26.2% to 42.9%.
The fine-tuned agent outperforms proprietary models such as GPT-4o and GPT-5-Thinking.
Abstract
We present WebGym, the largest-to-date open-source environment for training realistic visual web agents. Real websites are non-stationary and diverse, making artificial or small-scale task sets insufficient for robust policy learning. WebGym contains nearly 300,000 tasks with rubric-based evaluations across diverse, real-world websites and difficulty levels. We train agents with a simple reinforcement learning (RL) recipe, which trains on the agent's own interaction traces (rollouts), using task rewards as feedback to guide learning. To enable scaling RL, we speed up sampling of trajectories in WebGym by developing a high-throughput asynchronous rollout system, designed specifically for web agents. Our system achieves a 4-5x rollout speedup compared to naive implementations. We also scale the task set breadth, depth, and size, which results in continued performance improvement.…
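The abstract attributes the rollout speedup to asynchronous trajectory sampling. The sketch below is not WebGym's system; it is a minimal illustration, under stated assumptions, of why overlapping I/O-bound episodes can beat running them one at a time. The names `run_episode`, `collect_sequential`, `collect_concurrent`, and `max_in_flight` are hypothetical, the browser and model latency is simulated with a fixed sleep, and the reward is a random stand-in for a rubric-based score. It makes no attempt to reproduce the reported 4-5x figure.

```python
import asyncio
import random
import time


async def run_episode(task_id: int, max_steps: int = 5) -> dict:
    """Hypothetical episode: every step waits on simulated I/O latency."""
    rng = random.Random(task_id)  # seeded per task so results are repeatable
    for _ in range(max_steps):
        # Stand-in for a page load plus a model call (assumed to be I/O-bound).
        await asyncio.sleep(0.01)
    # Stand-in for a rubric-based task reward.
    reward = 1.0 if rng.random() < 0.5 else 0.0
    return {"task_id": task_id, "reward": reward}


async def collect_sequential(task_ids):
    """Naive baseline: finish one episode before starting the next."""
    return [await run_episode(t) for t in task_ids]


async def collect_concurrent(task_ids, max_in_flight: int = 8):
    """Run episodes concurrently, capped at max_in_flight at any time."""
    sem = asyncio.Semaphore(max_in_flight)

    async def bounded(task_id):
        async with sem:
            return await run_episode(task_id)

    # gather returns results in the order the awaitables were passed in.
    return await asyncio.gather(*(bounded(t) for t in task_ids))


def timed(collect_fn, task_ids):
    """Run a collector to completion and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(collect_fn(task_ids))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    tasks = list(range(16))
    _, seq_time = timed(collect_sequential, tasks)
    _, con_time = timed(collect_concurrent, tasks)
    print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

The semaphore bounds how many episodes are in flight, which in a real setup would correspond to a limit on browser instances or model-serving capacity; that mapping is an assumption of this sketch, not a detail taken from the paper.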
