Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
Hyungjoo Chae, Namyoung Kim, Kai Tzu-iunn Ong, Minju Gwak, Gwanwoo Song, Jihoon Kim, Sunghwan Kim, Dongha Lee, Jinyoung Yeo

TL;DR
This paper introduces a world-model-augmented web agent that simulates environment outcomes to improve decision-making in web navigation tasks, addressing limitations of current LLM-based agents.
Contribution
The study develops a novel transition-focused observation abstraction and demonstrates its effectiveness in enhancing web agent performance without additional training.
Findings
World models improve decision-making in web navigation.
The proposed approach is cost- and time-efficient.
Agents outperform recent tree-search-based methods.
Abstract
Large language models (LLMs) have recently gained much attention in building autonomous agents. However, the performance of current LLM-based web agents in long-horizon tasks is far from optimal, often yielding errors such as repeatedly buying a non-refundable flight ticket. By contrast, humans can avoid such an irreversible mistake, as we have an awareness of the potential outcomes (e.g., losing money) of our actions, also known as the "world model". Motivated by this, our study first starts with preliminary analyses, confirming the absence of world models in current LLMs (e.g., GPT-4o, Claude-3.5-Sonnet, etc.). Then, we present a World-model-augmented (WMA) web agent, which simulates the outcomes of its actions for better decision-making. To overcome the challenges in training LLMs as world models predicting next observations, such as repeated elements across observations and long…
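The idea of simulating the outcomes of actions before committing to one can be illustrated with a minimal sketch. This is not the authors' implementation: the function `choose_action`, the `world_model` and `value_fn` callables, and the toy stand-ins below are all hypothetical names introduced only for illustration, and a real system would presumably back them with LLM calls.

```python
from typing import Callable, Sequence


def choose_action(
    observation: str,
    candidates: Sequence[str],
    world_model: Callable[[str, str], str],
    value_fn: Callable[[str], float],
) -> str:
    """Return the candidate whose simulated outcome scores highest.

    Hypothetical sketch: `world_model` predicts the next observation for an
    (observation, action) pair, and `value_fn` scores that prediction.
    """
    if not candidates:
        raise ValueError("need at least one candidate action")
    best_action, best_score = candidates[0], float("-inf")
    for action in candidates:
        # Simulate the outcome instead of executing the action for real.
        predicted = world_model(observation, action)
        score = value_fn(predicted)
        if score > best_score:
            best_action, best_score = action, score
    return best_action


# Toy stand-ins (assumptions, not part of the paper).
def toy_world_model(observation: str, action: str) -> str:
    if action == "click [Buy]" and "ticket purchased" in observation:
        return "a second non-refundable ticket is purchased"
    return f"page after {action}"


def toy_value(predicted: str) -> float:
    # Penalize the irreversible mistake; mildly reward everything else.
    return -1.0 if "second non-refundable" in predicted else 0.5


chosen = choose_action(
    "ticket purchased",
    ["click [Buy]", "click [View itinerary]"],
    toy_world_model,
    toy_value,
)
print(chosen)
```

In this toy setup the agent avoids the repeated purchase because the simulated outcome of buying again receives the lowest score, which mirrors the irreversible-mistake example in the abstract.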
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Mobile Agent-Based Network Management
Methods: Softmax · Attention Is All You Need
