Mem-W: Latent Memory-Native GUI Agents

Guibin Zhang; Yaohui Ling; Fanci Meng; Kun Wang; Shuicheng Yan

arXiv:2605.09317 · cs.CL · May 12, 2026

TL;DR

Mem-W is a family of GUI agents that keeps memory inside the agent's continuous (latent) context instead of storing it as external text, improving long-horizon task performance on web and mobile navigation benchmarks.

Contribution

It proposes a latent-memory-native architecture that treats memory as part of the agent's continuous context rather than as an auxiliary symbolic scaffold, making GUI agents more effective on long-horizon tasks.

Findings

1. Mem-W improves performance on web and mobile navigation benchmarks.

2. It achieves gains of up to +30.0 over baseline models.

3. Latent-context-native memory enhances long-horizon GUI agency.

Abstract

GUI agents are beginning to operate the web, mobile, and desktop as interactive worlds, where successful control depends on carrying forward visual, procedural, and task-level evidence beyond the fleeting present screen. Yet most agents still treat memory as an external, human-readable artifact: histories are summarized, categorized, retrieved, and reinserted as text or structured records before being encoded again by the policy. This creates a mismatch between the representational form in which experience is stored and the latent embedding sequence over which modern GUI policies actually act. We introduce Mem-W, a series of latent-memory-native GUI agents that treat memory as part of the agent's continuous context rather than as an auxiliary symbolic scaffold. Mem-W weaves both historical trajectories (as experiential memory) and in-session segments (as working memory) into compact…
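The abstract contrasts memory that is stored as text and re-encoded with memory that lives directly in the embedding sequence the policy consumes. The following is a minimal sketch of that second idea. It assumes that experiential memory (from historical trajectories) and working memory (from in-session segments) are already available as fixed-size embedding vectors. `LatentContext` and `build_policy_input` are hypothetical names, and the excerpt above does not describe how Mem-W actually produces or compacts these embeddings.

```python
from dataclasses import dataclass, field
from typing import List

Vector = List[float]  # one embedding; the dimension d is an assumption


@dataclass
class LatentContext:
    """Hypothetical container: memory is kept as embeddings, never as text."""

    # Embeddings derived from past trajectories (experiential memory).
    experiential: List[Vector] = field(default_factory=list)
    # Embeddings derived from earlier segments of this session (working memory).
    working: List[Vector] = field(default_factory=list)

    def build_policy_input(self, screen: List[Vector]) -> List[Vector]:
        # Memory embeddings are placed in the same sequence as the
        # current-screen embeddings, so nothing is decoded to text and
        # re-encoded before the policy sees it.
        seq = self.experiential + self.working + screen
        dims = {len(v) for v in seq}
        if len(dims) > 1:
            raise ValueError(f"mixed embedding sizes: {sorted(dims)}")
        return seq


ctx = LatentContext(experiential=[[0.1, 0.2]], working=[[0.3, 0.4]])
sequence = ctx.build_policy_input(screen=[[0.5, 0.6]])
print(len(sequence))
```

The ordering (experiential, then working, then the current screen) is an illustrative choice, not one taken from the paper.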
