OpAgent: Operator Agent for Web Navigation

Yuyu Guo; Wenjie Yang; Siyuan Yang; Ziyang Liu; Cheng Chen; Yuan Wei; Yun Hu; Yang Huang; Guoliang Hao; Dongsheng Yuan; Jianming Wang; Xin Chen; Hang Yu; Lei Lei; and Peng Di

arXiv:2602.13559 · cs.AI · May 1, 2026

TL;DR

This paper introduces OpAgent, an online reinforcement learning web agent that interacts directly with websites, using hierarchical fine-tuning and a modular framework to achieve state-of-the-art success rates in web navigation tasks.

Contribution

The paper presents an online RL approach with a hybrid reward system and a modular operator framework, and reports higher web navigation success rates than prior offline methods.

Findings

01. Achieved a 38.1% success rate (pass@5) on WebArena, surpassing existing baselines.

02. Developed a hybrid reward mechanism combining WebJudge and RDT for better credit assignment.

03. Attained a 71.6% success rate with OpAgent, setting a new state of the art in web navigation.
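
For readers unfamiliar with the pass@5 figure in the WebArena finding, the sketch below shows one common way such a number is computed: a task counts as solved if at least one of five independent attempts succeeds. This is an assumption about the metric, not something this page confirms; the paper may use a different estimator. The function name `pass_at_k` and the task outcomes are hypothetical and made up for illustration.

```python
from typing import Dict, List


def pass_at_k(attempts: Dict[str, List[bool]], k: int = 5) -> float:
    """Fraction of tasks solved by at least one of the first k attempts.

    Assumption: "pass@k" means any-of-k success per task. This is a common
    reading of the metric, not a definition taken from the paper.
    """
    if not attempts:
        raise ValueError("no tasks to score")
    solved = 0
    for task_id, outcomes in attempts.items():
        if len(outcomes) < k:
            raise ValueError(f"task {task_id!r} has fewer than {k} attempts")
        # A task is counted once, however many of its attempts succeed.
        if any(outcomes[:k]):
            solved += 1
    return solved / len(attempts)


# Hypothetical outcomes for three made-up tasks, five attempts each.
example = {
    "task_a": [False, False, True, False, False],
    "task_b": [False, False, False, False, False],
    "task_c": [True, True, False, True, True],
}
print(f"pass@5 = {pass_at_k(example):.3f}")
```

Under this reading, pass@5 is never lower than the single-attempt success rate on the same tasks, so the two kinds of figure should not be compared directly.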

Abstract

To fulfill user instructions, autonomous web agents must contend with the inherent complexity and volatile nature of real-world websites. Conventional paradigms predominantly rely on Supervised Fine-Tuning (SFT) or Offline Reinforcement Learning (RL) using static datasets. However, these methods suffer from severe distributional shifts, as offline trajectories fail to capture the stochastic state transitions and real-time feedback of unconstrained wide web environments. In this paper, we propose a robust Online Reinforcement Learning WebAgent, designed to optimize its policy through direct, iterative interactions with unconstrained wide websites. Our approach comprises three core innovations: 1) Hierarchical Multi-Task Fine-tuning: We curate a comprehensive mixture of datasets categorized by functional primitives -- Planning, Acting, and Grounding -- establishing a Vision-Language Model…
