SE-GA: Memory-Augmented Self-Evolution for GUI Agents

Shilong Jin, Lanjun Wang, and Zhuosheng Zhang

arXiv:2605.16883 · cs.LG · May 19, 2026

TL;DR

SE-GA introduces a memory-augmented, self-evolving framework for GUI agents that enhances multi-step task performance through dynamic memory retrieval and iterative self-improvement.

Contribution

The paper presents SE-GA, a framework that combines hierarchical memory with self-evolution so that GUI agents can adapt and improve in dynamic environments.

Findings

01 Achieves 89.0% success on the ScreenSpot benchmark.

02 Attains 75.8% success on the AndroidControl-High dataset.

03 Demonstrates superior generalization on the AndroidWorld benchmark.

Abstract

Autonomous Graphical User Interface (GUI) agents often struggle with multi-step tasks because of constrained context windows and static policies that fail to adapt to dynamic environments. To address these limitations, this work proposes the Self-Evolving GUI Agent (SE-GA), a novel framework that integrates hierarchical memory structures with an iterative self-improvement mechanism. At the core of our approach is Test-Time Memory Extension (TTME), which facilitates long-term planning by dynamically retrieving episodic, semantic, and experiential memories to provide salient context during inference. To ensure continuous learning, we introduce Memory-Augmented Self-Evolution (MASE), a training pipeline that uses the data collected by TTME to stabilize and enhance the agent's foundational policy. Extensive evaluations across both offline and online benchmarks demonstrate SE-GA…
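
To make the retrieval idea in the abstract concrete, here is a minimal sketch of pulling episodic, semantic, and experiential memories into the context for a single inference step. It is an illustration only and is not taken from the SE-GA repository: the names `MemoryStore`, `build_context`, and `MEMORY_KINDS` are hypothetical, and bag-of-words cosine similarity stands in for whatever retriever the paper actually uses.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical names throughout; none of this comes from the SE-GA code.
MEMORY_KINDS = ("episodic", "semantic", "experiential")


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words; an assumed stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class MemoryStore:
    """One list of text entries per memory kind."""
    entries: dict = field(default_factory=lambda: {k: [] for k in MEMORY_KINDS})

    def add(self, kind: str, text: str) -> None:
        if kind not in self.entries:
            raise ValueError(f"unknown memory kind: {kind}")
        self.entries[kind].append(text)

    def retrieve(self, query: str, k: int = 1) -> dict:
        """Return up to k entries per kind, most similar to the query first."""
        q = tokenize(query)
        result = {}
        for kind, texts in self.entries.items():
            ranked = sorted(texts, key=lambda t: cosine(q, tokenize(t)), reverse=True)
            result[kind] = ranked[:k]
        return result


def build_context(store: MemoryStore, task: str, k: int = 1) -> str:
    """Format the task plus retrieved memories as extra prompt context."""
    lines = [f"Task: {task}"]
    for kind, texts in store.retrieve(task, k).items():
        for text in texts:
            lines.append(f"[{kind}] {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    store = MemoryStore()
    store.add("episodic", "opened settings and tapped wifi")
    store.add("semantic", "the wifi toggle lives under network settings")
    store.add("experiential", "scroll down when the target button is not visible")
    print(build_context(store, "turn on wifi in settings"))
```

In this sketch the retrieved entries are simply prepended to the task description. A real agent would presumably rank with learned embeddings and cap the retrieved text to fit the context window.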

Code & Models

Repositories

jinshilong-dev/SE-GA (GitHub)
