WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning

Zhepei Wei; Wenlin Yao; Yao Liu; Weizhi Zhang; Qin Lu; Liang Qiu; Changlong Yu; Puyang Xu; Chao Zhang; Bing Yin; Hyokun Yun; Lihong Li

arXiv:2505.16421 · cs.CL · October 10, 2025

TL;DR

WebAgent-R1 introduces an end-to-end multi-turn reinforcement learning framework for training web agents, significantly improving task success rates on web interaction benchmarks by learning directly from online environment interactions.

Contribution

The paper presents a novel multi-turn RL approach for web agents that learns from online interactions, incorporating thinking-based prompting and chain-of-thought reasoning strategies.

Findings

01. Boosts the task success rate of Qwen-2.5-3B from 6.1% to 33.9% on WebArena-Lite.

02. Raises the task success rate of Llama-3.1-8B from 8.5% to 44.8% on the same benchmark.

03. Demonstrates the effectiveness of thinking-based prompting and of allowing the agent more interactions with the environment.

Abstract

While reinforcement learning (RL) has demonstrated remarkable success in enhancing large language models (LLMs), it has primarily focused on single-turn tasks such as solving math problems. Training effective web agents for multi-turn interactions remains challenging due to the complexity of long-horizon decision-making across dynamic web interfaces. In this work, we present WebAgent-R1, a simple yet effective end-to-end multi-turn RL framework for training web agents. It learns directly from online interactions with web environments by asynchronously generating diverse trajectories, entirely guided by binary rewards depending on task success. Experiments on the WebArena-Lite benchmark demonstrate the effectiveness of WebAgent-R1, boosting the task success rate of Qwen-2.5-3B from 6.1% to 33.9% and Llama-3.1-8B from 8.5% to 44.8%, significantly outperforming existing state-of-the-art…
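The data-collection loop described in the abstract (many multi-turn trajectories generated concurrently, each scored only by a binary success reward) can be illustrated with a small sketch. This is not the authors' implementation: `ToyWebEnv`, `toy_policy`, `rollout`, and `collect` are hypothetical names, the environment is a toy stand-in for a real website, and the policy is a random chooser standing in for the LLM. The policy-update step is omitted because the text above does not specify it.

```python
import asyncio
import random
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    task_id: int
    actions: list = field(default_factory=list)
    reward: float = 0.0  # binary: 1.0 on task success, otherwise 0.0


class ToyWebEnv:
    """Hypothetical stand-in for a web environment (assumption).

    The task counts as solved if the agent issues 'submit' within the
    turn budget; any other episode ends in failure.
    """

    def __init__(self, max_turns: int = 5):
        self.max_turns = max_turns
        self.turn = 0
        self.done = False
        self.success = False

    async def step(self, action: str) -> str:
        await asyncio.sleep(0)  # yield control, mimicking browser I/O
        self.turn += 1
        if action == "submit":
            self.done, self.success = True, True
        elif self.turn >= self.max_turns:
            self.done = True
        return f"page-after-{action}"


def toy_policy(observation: str, rng: random.Random) -> str:
    """Placeholder for the LLM policy (assumption): picks a random action."""
    return rng.choice(["click", "type", "scroll", "submit"])


async def rollout(task_id: int, seed: int) -> Trajectory:
    """Run one multi-turn episode and attach the binary outcome reward."""
    rng = random.Random(seed)
    env = ToyWebEnv()
    traj = Trajectory(task_id=task_id)
    obs = "start-page"
    while not env.done:
        action = toy_policy(obs, rng)
        traj.actions.append(action)
        obs = await env.step(action)
    traj.reward = 1.0 if env.success else 0.0
    return traj


async def collect(num_tasks: int, samples_per_task: int) -> list:
    """Generate several trajectories per task concurrently."""
    jobs = [
        rollout(task, seed=task * 1000 + sample)
        for task in range(num_tasks)
        for sample in range(samples_per_task)
    ]
    return await asyncio.gather(*jobs)


if __name__ == "__main__":
    trajectories = asyncio.run(collect(num_tasks=2, samples_per_task=4))
    rate = sum(t.reward for t in trajectories) / len(trajectories)
    print(f"collected {len(trajectories)} trajectories, success rate {rate:.2f}")
```

In a real setup the per-trajectory reward would come from the benchmark's task-success check, and the collected trajectories would feed an RL update of the policy; both parts are left out here because they depend on details not given on this page.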

Code & Models

Repositories

weizhepei/webagent-r1 (Official)

Videos

WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques