Region4Web: Rethinking Observation Space Granularity for Web Agents
Donguk Kwon, Dongha Lee

TL;DR
This paper introduces Region4Web, a framework that reorganizes web page observations into functional regions to improve web agent understanding and performance.
Contribution
It proposes a hierarchical decomposition of a web page's AXTree into functional regions, together with PageDigest, a compact inference pipeline that delivers region-level observations to the actor agent.
Findings
Region4Web reduces observation length significantly.
PageDigest improves task success rate across various models.
Operating at region granularity enhances agent understanding.
Abstract
Web agents perceive web pages through an observation space, yet its granularity has remained an underexamined design choice. Existing work treats observation at the same element-level granularity as the action space, leaving the page's functional organization implicit and forcing the agent to infer it from element-level signals at every step. We argue observation should instead operate at the granularity of functional regions, parts of the page that each serve a distinct purpose. We propose Region4Web, a framework that reorganizes the AXTree into functional regions through hierarchical decomposition and semantic abstraction, exposing the page's functional organization as the basis for page state understanding. Moreover, we propose PageDigest, a web-specific inference pipeline that delivers this region-level observation to the actor agent as a compact per-page digest that persists across…
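To make the idea of region-level observation concrete, here is a minimal Python sketch. It is not the paper's algorithm: it assumes a toy accessibility tree (the hypothetical `AXNode` class) and uses ARIA landmark roles as a stand-in for functional regions, with a flat grouping in place of the hierarchical decomposition and semantic abstraction the abstract describes. The names `Region`, `group_into_regions`, and `digest` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: ARIA landmark roles approximate "functional regions".
LANDMARK_ROLES = {"banner", "navigation", "search", "main",
                  "form", "complementary", "contentinfo"}
# Assumption: these roles count as actionable elements.
INTERACTIVE_ROLES = {"link", "button", "textbox", "combobox", "checkbox"}


@dataclass
class AXNode:
    """Hypothetical, simplified accessibility-tree node."""
    role: str
    name: str = ""
    children: List["AXNode"] = field(default_factory=list)


@dataclass
class Region:
    """A functional region and the interactive elements it contains."""
    role: str
    name: str
    elements: List[AXNode] = field(default_factory=list)


def collect_interactive(node: AXNode) -> List[AXNode]:
    """Return all interactive nodes in the subtree, in document order."""
    found = [node] if node.role in INTERACTIVE_ROLES else []
    for child in node.children:
        found.extend(collect_interactive(child))
    return found


def group_into_regions(root: AXNode) -> List[Region]:
    """Group the tree into regions at the outermost landmark nodes."""
    regions: List[Region] = []

    def walk(node: AXNode) -> None:
        if node.role in LANDMARK_ROLES:
            regions.append(Region(node.role, node.name, collect_interactive(node)))
            return  # nested landmarks are folded into the outer region
        for child in node.children:
            walk(child)

    walk(root)
    return regions


def digest(regions: List[Region]) -> str:
    """Render one compact summary line per region."""
    lines = []
    for r in regions:
        label = f"{r.role} '{r.name}'" if r.name else r.role
        lines.append(f"[{label}] {len(r.elements)} interactive elements")
    return "\n".join(lines)


page = AXNode("RootWebArea", "Shop", [
    AXNode("navigation", "Main menu", [AXNode("link", "Home"), AXNode("link", "Cart")]),
    AXNode("search", "", [AXNode("textbox", "Search products"), AXNode("button", "Go")]),
    AXNode("main", "", [AXNode("heading", "Results"), AXNode("link", "Item A")]),
])
regions = group_into_regions(page)
print(digest(regions))
```

In this toy version the agent would read three summary lines instead of eight element-level nodes, which illustrates, under the stated assumptions, why a region-level view can be shorter than an element-level one while still exposing the page's functional layout.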
