World-Model-Augmented Web Agents with Action Correction
Zhouzhou Shen, Xueyu Hu, Xiyun Li, Tianqing Fang, Juncheng Li, Shengyu Zhang

TL;DR
This paper introduces WAC, a web agent framework that combines multi-agent collaboration, consequence simulation, and feedback-driven refinement to improve reasoning, risk-awareness, and task success in web automation tasks.
Contribution
WAC integrates model collaboration, environment simulation, and action correction to enhance web agent performance and safety, addressing limitations of prior single-model approaches.
Findings
Achieves 1.8% improvement on VisualWebArena
Achieves 1.3% improvement on Online-Mind2Web
Demonstrates effective risk-aware action correction
Abstract
Web agents based on large language models have demonstrated promising capability in automating web tasks. However, current web agents struggle to reason out sensible actions because of their limited ability to predict environment changes, and they may lack comprehensive awareness of execution risks, prematurely performing risky actions that cause losses and lead to task failure. To address these challenges, we propose WAC, a web agent that integrates model collaboration, consequence simulation, and feedback-driven action refinement. To overcome the cognitive isolation of individual models, we introduce a multi-agent collaboration process that enables an action model to consult a world model as a web-environment expert for strategic guidance; the action model then grounds these suggestions into executable actions, leveraging prior knowledge of environmental state transition dynamics to…
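The abstract is cut off, so the following is only a hedged reading of the loop it and the TL;DR describe: an action model consults a world model for guidance, proposes an action, the consequence of that action is simulated, and a risk check feeds critique back for refinement. It is a minimal Python sketch, not the authors' implementation. Every name here (`choose_action`, `Step`, the `toy_*` stubs, `max_rounds`) is hypothetical, the LLM calls are replaced by plain functions, and using the world model as the consequence simulator alongside a separate risk judge is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical model interfaces (assumption); in practice each would wrap an LLM call.
Advisor = Callable[[str, str], str]            # (task, observation) -> strategic hint
Actor = Callable[[str, str, str, str], str]    # (task, observation, hint, feedback) -> action
Simulator = Callable[[str, str], str]          # (observation, action) -> predicted next state
Judge = Callable[[str, str], Optional[str]]    # (task, predicted state) -> critique, or None if safe


@dataclass
class Step:
    action: str
    predicted_state: str
    risky: bool


def choose_action(
    task: str,
    observation: str,
    advise: Advisor,
    act: Actor,
    simulate: Simulator,
    judge: Judge,
    max_rounds: int = 3,
) -> Tuple[Optional[str], List[Step]]:
    """Propose, simulate, and refine an action until the judge accepts it."""
    hint = advise(task, observation)          # world model consulted as environment expert
    feedback = ""                             # no critique before the first proposal
    history: List[Step] = []
    for _ in range(max_rounds):
        action = act(task, observation, hint, feedback)  # ground the hint into an action
        predicted = simulate(observation, action)        # imagine the action's consequence
        critique = judge(task, predicted)                # flag risky predicted outcomes
        history.append(Step(action, predicted, critique is not None))
        if critique is None:
            return action, history            # predicted outcome looks safe: execute it
        feedback = critique                   # feed the critique into the next proposal
    return None, history                      # no acceptable action within the budget


# Toy stand-ins for the models (purely illustrative, hypothetical).
def toy_advise(task: str, obs: str) -> str:
    return "Review the cart before paying."


def toy_act(task: str, obs: str, hint: str, feedback: str) -> str:
    # The naive first guess checks out immediately; the critique makes it back off.
    return "click('Review cart')" if "irreversible" in feedback else "click('Place order')"


def toy_simulate(obs: str, action: str) -> str:
    return "order submitted" if "Place order" in action else "cart page shown"


def toy_judge(task: str, predicted: str) -> Optional[str]:
    if "submitted" in predicted:
        return "Placing the order is irreversible; verify the cart first."
    return None


if __name__ == "__main__":
    action, history = choose_action(
        "buy a mug", "product page", toy_advise, toy_act, toy_simulate, toy_judge
    )
    for step in history:
        print(step)
    print("chosen:", action)  # chosen: click('Review cart')
```

Returning `None` when every round is judged risky is just one possible design choice (for example, stopping the episode or deferring to a human); the paper may handle an exhausted refinement budget differently.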
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Multimodal Machine Learning Applications · Intelligent Tutoring Systems and Adaptive Learning
