Mem-W: Latent Memory-Native GUI Agents
Guibin Zhang, Yaohui Ling, Fanci Meng, Kun Wang, Shuicheng Yan

TL;DR
Mem-W integrates memory directly into a GUI agent's continuous latent context instead of storing it as external text, improving long-horizon task performance on web and mobile navigation benchmarks.
Contribution
It proposes a latent-memory-native architecture that treats memory as part of the agent's continuous context rather than as an auxiliary symbolic scaffold.
Findings
Mem-W improves performance on web and mobile navigation benchmarks.
It achieves gains of up to +30.0 over baseline models.
Latent-context-native memory enhances long-horizon GUI agency.
Abstract
GUI agents are beginning to operate the web, mobile, and desktop as interactive worlds, where successful control depends on carrying forward visual, procedural, and task-level evidence beyond the fleeting present screen. Yet most agents still treat memory as an external, human-readable artifact: histories are summarized, categorized, retrieved, and reinserted as text or structured records before being encoded again by the policy. This creates a mismatch between the representational form in which experience is stored and the latent embedding sequence over which modern GUI policies actually act. We introduce Mem-W, a series of latent-memory-native GUI agents that treat memory as part of the agent's continuous context rather than as an auxiliary symbolic scaffold. Mem-W weaves both historical trajectories (as experiential memory) and in-session segments (as working memory) into compact…
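The contrast the abstract draws, between memory kept as text and memory kept as latent vectors inside the policy's context, can be illustrated with a toy sketch. This is not the authors' implementation: the mean-pooling compressor, the function names (`compress_to_latent_memory`, `build_context`), the slot counts, and the embedding size are all assumptions chosen for illustration. It only shows the general shape of the idea, in which past trajectories and in-session segments are reduced to a few embedding vectors and placed directly in the sequence the policy consumes, with no round trip through text.

```python
import numpy as np


def compress_to_latent_memory(step_embeddings: np.ndarray, num_slots: int) -> np.ndarray:
    """Mean-pool a (T, d) embedding sequence into at most num_slots vectors.

    Hypothetical stand-in for a learned compressor; assumed for illustration only.
    """
    slots = min(num_slots, len(step_embeddings))  # avoid empty chunks
    chunks = np.array_split(step_embeddings, slots, axis=0)
    return np.stack([chunk.mean(axis=0) for chunk in chunks])


def build_context(experiential: np.ndarray, working: np.ndarray, current: np.ndarray) -> np.ndarray:
    """Concatenate memory vectors and current-screen embeddings along the sequence axis."""
    return np.concatenate([experiential, working, current], axis=0)


rng = np.random.default_rng(0)
d = 8                                   # assumed embedding width
past = rng.normal(size=(40, d))         # embeddings of a historical trajectory
session = rng.normal(size=(12, d))      # embeddings of earlier in-session steps
screen = rng.normal(size=(5, d))        # embeddings of the present screen

exp_mem = compress_to_latent_memory(past, num_slots=4)      # experiential memory
work_mem = compress_to_latent_memory(session, num_slots=2)  # working memory
context = build_context(exp_mem, work_mem, screen)
print(context.shape)
```

In this sketch the 52 historical and in-session step embeddings shrink to 6 memory vectors, so the policy sees a sequence of 11 vectors rather than 57. A real system would presumably learn the compressor jointly with the policy rather than rely on fixed pooling.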
