WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

Zehan Qi; Xiao Liu; Iat Long Iong; Hanyu Lai; Xueqiao Sun; Wenyi Zhao; Yu Yang; Xinyue Yang; Jiadai Sun; Shuntian Yao; Tianjie Zhang; Wei Xu; Jie Tang; Yuxiao Dong

arXiv:2411.02337·cs.CL·January 28, 2025

TL;DR

WebRL introduces a self-evolving curriculum reinforcement learning framework that significantly enhances open LLM web agents' performance, enabling them to outperform proprietary models and previous open models in web tasks.

Contribution

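The paper presents WebRL, an online curriculum reinforcement learning approach that improves open LLM web agents by generating new training tasks from failed attempts, scoring trajectories with an outcome-supervised reward model (ORM), and applying adaptive reinforcement learning strategies.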

Findings

01. WebRL raises the success rate of Llama-3.1-8B on WebArena-Lite from 4.8% to 42.4%.

02. WebRL raises the success rate of GLM-4-9B on WebArena-Lite from 6.1% to 43%.

03. The resulting open models surpass GPT-4-based agents on web tasks.

Abstract

Large language models (LLMs) have shown remarkable potential as autonomous agents, particularly in web-based tasks. However, existing LLM web agents heavily rely on expensive proprietary LLM APIs, while open LLMs lack the necessary decision-making capabilities. This paper introduces WebRL, a self-evolving online curriculum reinforcement learning framework designed to train high-performance web agents using open LLMs. WebRL addresses three key challenges in building LLM web agents, including the scarcity of training tasks, sparse feedback signals, and policy distribution drift in online learning. Specifically, WebRL incorporates 1) a self-evolving curriculum that generates new tasks from unsuccessful attempts, 2) a robust outcome-supervised reward model (ORM), and 3) adaptive reinforcement learning strategies to ensure consistent improvements. We apply WebRL to transform open Llama-3.1…
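
The first two components named in the abstract (new tasks generated from unsuccessful attempts, and a binary outcome signal from an ORM) can be illustrated with a toy sketch. This is not the authors' implementation: every name below (run_phase, evolve_curriculum, toy_orm, toy_propose, toy_keep, make_policy) is hypothetical, the "web tasks" are placeholder strings, the ORM is a hand-written rule rather than a trained model, and the reinforcement learning update itself is left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Rollout:
    task: str
    trajectory: Tuple[str, ...]


def run_phase(tasks: List[str],
              policy: Callable[[str], List[str]],
              orm: Callable[[Rollout], bool]) -> Tuple[List[str], List[str]]:
    """Roll out the policy on each task and split tasks by the ORM's binary verdict."""
    solved, failed = [], []
    for task in tasks:
        rollout = Rollout(task, tuple(policy(task)))
        (solved if orm(rollout) else failed).append(task)
    return solved, failed


def evolve_curriculum(failed: List[str],
                      propose: Callable[[str], List[str]],
                      keep: Callable[[str], bool],
                      limit: int) -> List[str]:
    """Propose new tasks seeded by failures, filter them, drop duplicates, cap the count."""
    new_tasks: List[str] = []
    for seed in failed:
        for candidate in propose(seed):
            if keep(candidate) and candidate not in new_tasks:
                new_tasks.append(candidate)
    return new_tasks[:limit]


# --- Toy stand-ins (assumptions for illustration only) ---
def parse_steps(task: str) -> int:
    return int(task.split()[1])  # "visit 3 pages" -> 3


def make_policy(skill: int) -> Callable[[str], List[str]]:
    # A fake agent that can complete at most `skill` steps of any task.
    return lambda task: ["open page"] * min(parse_steps(task), skill)


def toy_orm(rollout: Rollout) -> bool:
    # Stand-in for a learned ORM: success iff every required step was taken.
    return len(rollout.trajectory) == parse_steps(rollout.task)


def toy_propose(seed: str) -> List[str]:
    # Stand-in for LLM task generation: one easier and one harder variant.
    n = parse_steps(seed)
    return [f"visit {n - 1} pages", f"visit {n + 1} pages"]


def toy_keep(task: str) -> bool:
    return parse_steps(task) >= 1  # discard degenerate tasks


tasks = ["visit 1 pages", "visit 3 pages", "visit 5 pages"]
solved, failed = run_phase(tasks, make_policy(skill=2), toy_orm)
next_tasks = evolve_curriculum(failed, toy_propose, toy_keep, limit=3)
print(solved, failed, next_tasks)
```

In a real system the proposal and filtering steps would themselves be model calls, and each phase would end with a policy update on the collected rollouts; the sketch only shows how failures feed the next phase's task set.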

Code & Models

Repositories

THUDM/WebRL
PyTorch, official

Taxonomy

Topics: Online Learning and Analytics