WorkForceAgent-R1: Incentivizing Reasoning Capability in LLM-based Web Agents via Reinforcement Learning
Yuchen Zhuang, Di Jin, Jiaao Chen, Wenqi Shi, Hanrui Wang, Chao Zhang

TL;DR
This paper introduces WorkForceAgent-R1, a reinforcement learning-based web agent that significantly improves reasoning and planning in web navigation tasks, outperforming supervised fine-tuning methods and rivaling proprietary models.
Contribution
The paper presents a novel R1-style reinforcement learning framework for training LLM-based web agents to enhance reasoning without extensive annotations.
Findings
Outperforms supervised fine-tuning baselines by 10.26-16.59%
Achieves competitive results with proprietary LLM agents
Demonstrates improved robustness and reasoning in web navigation
Abstract
Large language model (LLM)-empowered web agents enable the automation of complex, real-time web navigation tasks in enterprise environments. However, existing web agents that rely on supervised fine-tuning (SFT) often struggle with generalization and robustness because they lack the reasoning capabilities needed to handle the inherently dynamic nature of web interactions. In this study, we introduce WorkForceAgent-R1, an LLM-based web agent trained with a rule-based R1-style reinforcement learning framework designed explicitly to enhance single-step reasoning and planning for business-oriented web navigation tasks. We employ a structured reward function that evaluates both adherence to output formats and correctness of actions, enabling WorkForceAgent-R1 to implicitly learn robust intermediate reasoning without explicit annotations or extensive expert demonstrations. Extensive experiments on the…
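The structured reward described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `<think>`/`<answer>` output template, the exact-match comparison against a reference action, the function names, and the weights `w_format` and `w_action`.

```python
import re

# Hypothetical output template (an assumption, not the paper's specification):
# reasoning inside <think>...</think>, followed by the action inside <answer>...</answer>.
_TEMPLATE = re.compile(
    r"\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*", re.DOTALL
)


def format_reward(output: str) -> float:
    """Return 1.0 if the whole output follows the assumed template, else 0.0."""
    return 1.0 if _TEMPLATE.fullmatch(output) else 0.0


def action_reward(output: str, gold_action: str) -> float:
    """Return 1.0 if the action in <answer> equals the reference action, else 0.0."""
    match = _TEMPLATE.fullmatch(output)
    if match is None:
        return 0.0
    # Collapse runs of whitespace so trivial spacing differences are ignored.
    predicted = " ".join(match.group(2).split())
    expected = " ".join(gold_action.split())
    return 1.0 if predicted == expected else 0.0


def total_reward(
    output: str,
    gold_action: str,
    w_format: float = 0.5,  # illustrative weight, not from the paper
    w_action: float = 1.0,  # illustrative weight, not from the paper
) -> float:
    """Weighted sum of the format term and the action-correctness term."""
    return w_format * format_reward(output) + w_action * action_reward(
        output, gold_action
    )


if __name__ == "__main__":
    good = "<think>The submit button has id a12.</think><answer>click('a12')</answer>"
    print(total_reward(good, "click('a12')"))
```

Both terms are computed by fixed rules, so no learned reward model or annotated reasoning trace is needed; the model is rewarded only for producing a well-formed output whose final action is correct, which is the sense in which intermediate reasoning is learned implicitly.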
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
Methods: Shrink and Fine-Tune
