MGA: Memory-Driven GUI Agent for Observation-Centric Interaction

Weihua Cheng; Junming Liu; Yifei Sun; Botian Shi; Yirong Chen; Ding Wang

arXiv:2510.24168 · cs.AI · April 15, 2026

TL;DR

MGA is a memory-driven framework for GUI agents that decouples long-horizon trajectories into independent decision steps linked by a structured state memory, with the aim of reducing context overload and inference latency in GUI automation.

Contribution

The paper proposes a minimalist, memory-based framework intended to improve GUI agent performance by decoupling decision steps and removing redundant expert modules.

Findings

01. MGA achieves competitive performance on OSWorld and real-world GUI tasks.

02. The structured memory mechanism reduces system redundancy and cognitive overhead.

03. MGA maintains high efficiency with a simplified architecture.

Abstract

Multimodal Large Language Models (MLLMs) have significantly advanced GUI agents, yet long-horizon automation remains constrained by two critical bottlenecks: context overload from raw sequential trajectory dependence and architectural redundancy from over-engineered expert modules. Prevailing End-to-End and Multi-Agent paradigms struggle with error cascades caused by concatenated visual-textual histories and incur high inference latency due to redundant expert components, limiting their practical deployment. To address these issues, we propose the Memory-Driven GUI Agent (MGA), a minimalist framework that decouples long-horizon trajectories into independent decision steps linked by a structured state memory. MGA operates on an "Observe First and Memory Enhancement" principle, powered by two tightly coupled core mechanisms: (1) an Observer module that acts as a task-agnostic,…
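
The decoupling idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the names `StateMemory`, `run_episode`, `observe`, and `decide` are hypothetical, and the memory here is just a list of completed actions, since the abstract does not specify the memory format. The point it tries to show is only that each decision step receives the current observation plus a compact memory, rather than a concatenated history of all previous screens.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StateMemory:
    """Hypothetical structured memory carried between steps (an assumption)."""
    task: str
    completed: List[str] = field(default_factory=list)

    def update(self, action: str) -> None:
        # Record only a compact summary of the step, not the raw observation.
        self.completed.append(action)


def run_episode(
    task: str,
    observe: Callable[[], str],
    decide: Callable[[str, StateMemory], str],
    max_steps: int = 10,
) -> StateMemory:
    """Run independent decision steps linked only by the state memory."""
    memory = StateMemory(task=task)
    for _ in range(max_steps):
        observation = observe()               # observe the current screen first
        action = decide(observation, memory)  # no raw trajectory is passed in
        if action == "DONE":
            break
        memory.update(action)
    return memory


def demo() -> StateMemory:
    # Toy stand-ins for a real screen capture and a real model call.
    screens = iter(["login page", "dashboard", "settings page"])

    def observe() -> str:
        return next(screens)

    def decide(observation: str, memory: StateMemory) -> str:
        if observation == "settings page":
            return "DONE"
        return f"click through {observation}"

    return run_episode("open settings", observe, decide)
```

In this toy setup, calling `demo()` returns a memory whose `completed` list holds one entry per step taken before the goal screen was observed. A real agent would replace `observe` with a screenshot or accessibility-tree capture and `decide` with a model call.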

Code & Models

Repositories

MintyCo0kie/MGA4OSWorld (GitHub)
