Executable Agentic Memory for GUI Agent
Zerui Qin, Sheng Yue, Xingyuan Hua, Yongjian Fu, Ju Ren

TL;DR
This paper introduces Executable Agentic Memory (EAM), a structured knowledge graph for GUI agents that improves robustness, efficiency, and long-horizon task performance over traditional model-centric methods.
Contribution
EAM shifts GUI planning from free-form generation to a retrieval-and-execution framework using a knowledge graph and value-guided graph search, with theoretical and empirical validation.
Findings
EAM outperforms state-of-the-art baselines like UI-TARS-7B by up to 19.6% on AndroidWorld.
EAM reduces token costs by a factor of 6 compared to GPT-4o.
EAM achieves an average latency of 2.8 seconds, enabling reliable long-horizon GUI automation.
Abstract
Modern GUI agents typically rely on a model-centric and step-wise interaction paradigm, where LLMs must re-interpret the UI and re-decide actions at every screen, which is fragile in long-horizon tasks. In this paper, we propose Executable Agentic Memory (EAM), a structured Knowledge Graph (KG) that shifts GUI planning from free-form generation to a robust retrieval-and-execution process. Our approach includes a sample-efficient memory construction pipeline using state-aware DFS and action-group mining to compress multi-step routines. To ensure efficient planning, we introduce a value-guided graph search where a lightweight Q-function model steers Monte Carlo Tree Search (MCTS) over the KG. We theoretically establish bias-consistency for the Q-model and derive sample complexity bounds for path recovery. Empirically, EAM outperforms state-of-the-art baselines like UI-TARS-7B by up to…
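To make the value-guided graph search idea concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. The toy graph `KG`, the score table `Q_TABLE`, and the functions `q_model` and `plan` are all hypothetical names invented for illustration. The search is a simplified, deterministic PUCT-style variant: the Q-score serves both as a prior and as the initial value estimate, and the random rollouts of full MCTS are replaced by a terminal goal check.

```python
import math
from collections import defaultdict

# Hypothetical toy knowledge graph: screen -> list of (action, next_screen).
KG = {
    "home": [("open_settings", "settings"), ("open_browser", "browser")],
    "settings": [("tap_wifi", "wifi"), ("tap_display", "display")],
    "browser": [("tap_menu", "browser_menu")],
    "wifi": [],
    "display": [],
    "browser_menu": [],
}

# Hand-written scores standing in for a learned lightweight Q-function (assumption).
Q_TABLE = {
    ("home", "open_settings"): 0.8,
    ("home", "open_browser"): 0.2,
    ("settings", "tap_wifi"): 0.9,
    ("settings", "tap_display"): 0.1,
    ("browser", "tap_menu"): 0.1,
}


def q_model(state, action):
    """Return the stand-in Q-score for taking `action` in `state`."""
    return Q_TABLE.get((state, action), 0.0)


def plan(root, goal, n_sim=50, c=1.0, max_depth=5):
    """Search the KG for an action path from `root` to `goal`."""
    visits = defaultdict(int)     # visit count per (state, action)
    returns = defaultdict(float)  # accumulated reward per (state, action)

    def score(s, a):
        # Mean observed return (or the Q prior if unvisited) plus a
        # Q-weighted exploration bonus that shrinks with visits.
        n_s = sum(visits[(s, b)] for b, _ in KG[s])
        n_sa = visits[(s, a)]
        prior = q_model(s, a)
        mean = returns[(s, a)] / n_sa if n_sa else prior
        return mean + c * prior * math.sqrt(n_s + 1) / (1 + n_sa)

    for _ in range(n_sim):
        s, path = root, []
        # Selection: follow the best-scoring edge until the goal or a dead end.
        while KG[s] and s != goal and len(path) < max_depth:
            a, nxt = max(KG[s], key=lambda e: score(s, e[0]))
            path.append((s, a))
            s = nxt
        reward = 1.0 if s == goal else 0.0
        # Backup: credit every edge on the traversed path.
        for sa in path:
            visits[sa] += 1
            returns[sa] += reward

    # Read out the most-visited path as the executable plan.
    s, actions = root, []
    while KG[s] and s != goal and len(actions) < max_depth:
        a, nxt = max(KG[s], key=lambda e: visits[(s, e[0])])
        actions.append(a)
        s = nxt
    return actions, s == goal


if __name__ == "__main__":
    print(plan("home", "wifi"))  # (['open_settings', 'tap_wifi'], True)
```

In this sketch the plan is retrieved from stored graph edges rather than generated token by token, which is the contrast with step-wise, model-centric planning that the abstract draws.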
