PAGER: Bridging the Semantic-Execution Gap in Point-Precise Geometric GUI Control

Jingxuan Wei; Xi Bai; Shan Liu; Caijun Jia; Zheng Sun; Xinglong Xu; Siyuan Li; Linzhuang Sun; Bihui Yu; Conghui He; Cheng Tan

arXiv:2605.15963·cs.AI·May 18, 2026

TL;DR

This paper introduces PAGER, a geometry-aware agent for point-precise GUI control, where an action must land on an exact point in continuous canvas space rather than anywhere inside a tolerant region. By bridging the semantic-execution gap, PAGER aims for accurate and robust interaction on precise geometric construction tasks.

Contribution

The paper presents PAGER, a topology-aware agent, together with a new benchmark, PAGE Bench, and reports substantial performance improvements over existing models on point-precise GUI tasks.

Findings

01. PAGER achieves a step success rate above 62%, a 4.1x improvement over baselines.

02. PAGE Bench contains 4,906 problems with over 224K pixel-level actions.

03. General models exceed 88% action accuracy but achieve under 6% task success.
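
One way to read the gap between action accuracy and task success in the third finding is compounding over long action sequences. The sketch below is a back-of-the-envelope illustration, not the paper's analysis. It assumes that every step succeeds independently with the same probability and that a task succeeds only if all of its steps do. The function names are hypothetical, and the average task length is derived from the 224K actions and 4,906 problems quoted in the findings.

```python
def avg_actions_per_problem(total_actions: int, num_problems: int) -> float:
    """Average number of actions per benchmark problem."""
    return total_actions / num_problems


def task_success_if_independent(step_success: float, num_steps: int) -> float:
    """Task success under the ASSUMPTION that steps succeed independently
    and that a single failed step fails the whole task."""
    return step_success ** num_steps


# Figures quoted in the findings: 4,906 problems, roughly 224K actions.
avg = avg_actions_per_problem(224_000, 4_906)
steps = round(avg)
print(f"average actions per problem: {avg:.1f}")

# Hypothetical per-step success rates, chosen only to show the compounding.
for p in (0.88, 0.95, 0.99):
    rate = task_success_if_independent(p, steps)
    print(f"step success {p:.2f} over {steps} steps -> task success {rate:.4f}")
```

Under these assumptions, even a high per-step success rate leaves very little task-level success once a construction runs to dozens of dependent actions, which is consistent in direction with the reported contrast.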

Abstract

Large vision-language models have significantly advanced GUI agents, enabling executable interaction across web, mobile, and desktop interfaces. Yet these gains largely rely on a forgiving region-tolerant paradigm, where many nearby pixels inside the same component remain valid. Precise geometric construction breaks this assumption: actions must land on points in continuous canvas space rather than tolerant regions. Because geometric primitives carry ontological dependencies, a local coordinate error can induce cascading topological failures that distort downstream objects and invalidate the final construction. We identify this regime as precision-sensitive GUI tasks, requiring point-level accuracy, geometry-aware verification, and robustness to dependency-driven error propagation. To benchmark it, we introduce PAGE Bench, with 4,906 problems and over 224K process-supervised,…
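
The contrast the abstract draws between region-tolerant and point-precise actions, and the way one coordinate error propagates through dependent objects, can be made concrete with a small sketch. It is illustrative only and assumes details the abstract does not give: the function names, the 0.5-pixel tolerance, and the midpoint and reflection constructions are hypothetical, not PAGER or PAGE Bench internals.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # left, top, right, bottom


def region_hit(click: Point, box: Box) -> bool:
    """Region-tolerant check: any pixel inside the component is valid."""
    left, top, right, bottom = box
    x, y = click
    return left <= x <= right and top <= y <= bottom


def point_hit(click: Point, target: Point, tol: float = 0.5) -> bool:
    """Point-precise check: the click must land within tol of the target.
    The tolerance value is an assumption made for this example."""
    return math.dist(click, target) <= tol


def midpoint(a: Point, b: Point) -> Point:
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)


def reflect(p: Point, center: Point) -> Point:
    """Reflect p through center (a point reflection)."""
    return (2 * center[0] - p[0], 2 * center[1] - p[1])


# A click 13 px from a button's center still counts in the tolerant regime,
# but the same offset fails a point-precise check.
button: Box = (200.0, 300.0, 300.0, 340.0)
click: Point = (262.0, 325.0)
print(region_hit(click, button), point_hit(click, (250.0, 320.0)))

# Dependency-driven propagation: A is placed 3 px off, and every object
# constructed from A inherits or amplifies that error.
a_true, a_bad, b = (100.0, 100.0), (103.0, 100.0), (200.0, 100.0)
for name, build in (("midpoint(A, B)", lambda a: midpoint(a, b)),
                    ("reflect(B, A)", lambda a: reflect(b, a))):
    err = math.dist(build(a_bad), build(a_true))
    print(f"{name}: downstream error {err:.1f} px")
```

In this toy setup the midpoint halves the upstream error while the reflection doubles it, so a single misplaced point can push later objects well outside any fixed tolerance.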

Code & Models

Repositories

openraiser/Pager (GitHub)
