Web Agents Should Adopt the Plan-Then-Execute Paradigm
Julien Piet, Annabella Chow, Yiwei Hou, Muxi Lyu, Sylvie Venuto, Jinhao Zhu, Raluca Ada Popa, David Wagner

TL;DR
The paper advocates for a shift from the ReAct paradigm to a plan-then-execute approach for web agents, emphasizing pre-defined task-specific programs to improve security and reliability.
Contribution
It introduces the plan-then-execute paradigm for web agents, highlighting its advantages over ReAct and analyzing the challenges in implementing it effectively.
Findings
All WebArena tasks are compatible with plan-then-execute.
80% of tasks can be completed with a purely programmatic plan without runtime LLM.
Typed interfaces for website interactions are essential for effective planning.
Abstract
ReAct has become the default architecture across LLM agents, and many existing web agents follow this paradigm. We argue that it is the wrong default for web agents. Instead, web agents should default to plan-then-execute: commit to a task-specific program before observing runtime web content, then execute it. The reason is that web content mixes inputs from many parties. An e-commerce product page may combine a seller's listing, customer reviews and sponsored advertisements. Under ReAct, all of this content flows into the model when deciding on the next action, creating a direct path for prompt injections to steer the agent's control flow. Plan-then-execute changes this boundary: untrusted data may influence values or branches inside a predefined execution graph, but it cannot redefine the user task or cause the model to synthesize new actions at runtime. We analyze WebArena, a popular…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
