Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective

Mohamed Aghzal; Gregory J. Stein; Ziyu Yao

arXiv:2603.14248 · cs.AI · April 29, 2026

TL;DR

This paper introduces a hierarchical planning framework for analyzing LLM web agents. It finds that low-level execution and perceptual grounding, rather than high-level planning alone, are the main bottlenecks to reliability on long-horizon tasks.

Contribution

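It presents a process-based analysis method that separates three layers of agent behavior (high-level planning, low-level execution, and replanning). This lets reasoning, grounding, and recovery be evaluated individually rather than only through end-to-end success. The analysis highlights low-level execution as the area most in need of improvement.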
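One way to picture the difference between end-to-end and process-based evaluation is a scoring function. Besides the overall success rate, it attributes each failure to a layer and measures how often the agent recovers. The sketch below assumes a hypothetical log format, with one `(layer, ok)` event per step, and a simple first-failure attribution rule. The metric names (`fail_share_*`, `recovery_rate`) are illustrative and are not the paper's definitions.

```python
from collections import Counter

# One logged event: (layer, ok), where layer is "plan", "execute", or "replan".
Event = tuple[str, bool]
# One episode: (did the task succeed end to end?, ordered event log).
Episode = tuple[bool, list[Event]]


def process_metrics(episodes: list[Episode]) -> dict[str, float]:
    """Report end-to-end success alongside per-layer failure attribution.

    Assumed attribution rule: each failed episode is charged to the layer of its
    first failing event ("unknown" if no logged event failed).
    """
    if not episodes:
        raise ValueError("need at least one episode")
    metrics: dict[str, float] = {
        "end_to_end": sum(ok for ok, _ in episodes) / len(episodes)
    }

    # Count failed episodes by the layer of their first failing event.
    blame: Counter = Counter()
    for ok, events in episodes:
        if not ok:
            first = next((layer for layer, e_ok in events if not e_ok), "unknown")
            blame[first] += 1
    n_failed = sum(blame.values())
    for layer, count in sorted(blame.items()):
        metrics[f"fail_share_{layer}"] = count / n_failed

    # Recovery: among episodes that hit an execution failure, share that still succeeded.
    hit = [ok for ok, events in episodes
           if any(layer == "execute" and not e_ok for layer, e_ok in events)]
    metrics["recovery_rate"] = sum(hit) / len(hit) if hit else float("nan")
    return metrics


# Hypothetical logs, for illustration only (not results from the paper).
episodes: list[Episode] = [
    (True,  [("plan", True), ("execute", True)]),
    (True,  [("plan", True), ("execute", False), ("replan", True), ("execute", True)]),
    (False, [("plan", True), ("execute", False), ("replan", False)]),
    (False, [("plan", False)]),
    (False, [("plan", True), ("execute", True), ("execute", False)]),
]
print(process_metrics(episodes))
```

The first-failure rule is deliberately crude. An execution error caused by an ambiguous plan step would still be charged to execution, so a careful analysis would need finer-grained failure labels.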

Findings

1. Structured PDDL plans produce more concise and goal-directed strategies than natural-language (NL) plans.
2. Low-level execution remains the dominant bottleneck in web agent performance.
3. Improving perceptual grounding and adaptive control, not only high-level reasoning, is critical for reaching human-level reliability.
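One property that distinguishes a structured plan from free text is that it can be checked mechanically. Each PDDL-style action carries explicit preconditions and effects, so a plan can be simulated symbolically before the agent touches the page. The sketch below is a minimal STRIPS-style plan validator in Python. The web-shopping actions, predicates, and goal are hypothetical and were chosen only for illustration; the paper's actual domain encoding may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A grounded STRIPS-style action: preconditions, add list, delete list."""
    name: str
    pre: frozenset[str]
    add: frozenset[str]
    delete: frozenset[str] = frozenset()


def validate(plan: list[Action], init: set[str], goal: set[str]) -> tuple[bool, str]:
    """Simulate the plan from `init`; stop at the first unmet precondition."""
    state = set(init)
    for i, act in enumerate(plan):
        missing = act.pre - state
        if missing:
            return False, f"step {i} ({act.name}) missing {sorted(missing)}"
        state = (state - act.delete) | act.add  # apply delete list, then add list
    if not goal <= state:
        return False, f"goal not reached, missing {sorted(goal - state)}"
    return True, f"valid plan of length {len(plan)}"


# Hypothetical web-shopping domain (illustrative only).
open_site = Action("open_site", frozenset(), frozenset({"on_home"}))
search = Action("search", frozenset({"on_home"}), frozenset({"on_results"}),
                frozenset({"on_home"}))
open_item = Action("open_item", frozenset({"on_results"}), frozenset({"on_item"}),
                   frozenset({"on_results"}))
add_to_cart = Action("add_to_cart", frozenset({"on_item"}), frozenset({"item_in_cart"}))

goal = {"item_in_cart"}
print(validate([open_site, search, open_item, add_to_cart], set(), goal))
print(validate([open_site, open_item, add_to_cart], set(), goal))
```

The first call returns `(True, 'valid plan of length 4')`. The second skips the search step and fails at step 1 because `on_results` does not hold. Plan length also gives a crude measure of conciseness.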

Abstract

Large language model (LLM) web agents are increasingly used for web navigation but remain far from human reliability on realistic, long-horizon tasks. Existing evaluations focus primarily on end-to-end success, offering limited insight into where failures arise. We propose a hierarchical planning framework to analyze web agents across three layers (i.e., high-level planning, low-level execution, and replanning), enabling process-based evaluation of reasoning, grounding, and recovery. Our experiments show that structured Planning Domain Definition Language (PDDL) plans produce more concise and goal-directed strategies than natural language (NL) plans, but low-level execution remains the dominant bottleneck. These results indicate that improving perceptual grounding and adaptive control, not only high-level reasoning, is critical for achieving human-level reliability. This hierarchical…
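The three-layer decomposition described in the abstract can be pictured as a control loop. A high-level planner emits subgoals, a low-level executor tries to ground each subgoal into page actions, and a replanner is invoked when execution fails. The sketch below is a toy, deterministic illustration under stated assumptions. `plan`, `execute`, and `replan` are hypothetical stand-ins for LLM calls and browser interaction, and the per-layer event log is one possible way to support process-based evaluation. It is not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Trace:
    """Per-layer event log of (layer, subgoal, ok) tuples, kept for later attribution."""
    events: list[tuple[str, str, bool]] = field(default_factory=list)

    def log(self, layer: str, subgoal: str, ok: bool) -> None:
        self.events.append((layer, subgoal, ok))


def run_episode(
    plan: Callable[[str], list[str]],
    execute: Callable[[str], bool],
    replan: Callable[[str, str], list[str]],
    task: str,
    max_replans: int = 2,
) -> tuple[bool, Trace]:
    """Run one episode; success here simply means every queued subgoal executed."""
    trace = Trace()
    queue = plan(task)                          # layer 1: high-level planning
    trace.log("plan", task, bool(queue))
    if not queue:
        return False, trace
    replans = 0
    while queue:
        subgoal = queue.pop(0)
        ok = execute(subgoal)                   # layer 2: low-level execution / grounding
        trace.log("execute", subgoal, ok)
        if ok:
            continue
        if replans >= max_replans:              # recovery budget exhausted
            return False, trace
        replans += 1
        queue = replan(task, subgoal)           # layer 3: replanning / recovery
        trace.log("replan", subgoal, bool(queue))
        if not queue:
            return False, trace
    return True, trace


# Toy stand-ins (hypothetical): the executor cannot find an "Add to cart" button,
# and the replanner proposes an alternative route through the page.
def toy_plan(task: str) -> list[str]:
    return ["search 'usb cable'", "open first result", "click 'Add to cart'"]


def toy_execute(subgoal: str) -> bool:
    return subgoal != "click 'Add to cart'"     # simulated grounding failure


def toy_replan(task: str, failed: str) -> list[str]:
    return ["scroll down", "click 'Add to basket'"]


ok, trace = run_episode(toy_plan, toy_execute, toy_replan, "buy a usb cable")
print(ok, [e for e in trace.events if not e[2]])
```

In this toy run the episode succeeds. The trace records a single failed execution event followed by a successful replan. One design choice is worth noting: here the replanner replaces all remaining subgoals rather than patching a single step. Splicing a repair into the existing plan would be an equally reasonable alternative.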
