EchoTrail-GUI: Building Actionable Memory for GUI Agents via Critic-Guided Self-Exploration
Runze Li, Yuwen Zhai, Bo Xu, LiWu Xu, Nian Shi, Wei Zhang, Ran Lin, Liang Wang

TL;DR
EchoTrail-GUI introduces a memory-augmented framework for GUI agents that autonomously learns from past successes to improve performance and generalization in GUI tasks.
Contribution
The paper presents a fully automated, three-stage framework enabling GUI agents to build, retrieve, and utilize structured memories for enhanced task performance.
Findings
Significant improvement in task success rates on Android benchmarks.
Automated knowledge base construction without human supervision.
Enhanced robustness and efficiency of GUI agents through structured memory.
Abstract
Contemporary GUI agents, while increasingly capable due to advances in Large Vision-Language Models (VLMs), often operate with a critical limitation: they treat each task in isolation, lacking a mechanism to systematically learn from past successes. This digital "amnesia" results in sub-optimal performance, repeated errors, and poor generalization to novel challenges. To bridge this gap, we introduce EchoTrail-GUI, a novel framework designed to mimic human-like experiential learning by equipping agents with a dynamic, accessible memory. Our framework operates in three distinct stages. First, during Experience Exploration, an agent autonomously interacts with GUI environments to build a curated database of successful task trajectories, validated by a reward model. Crucially, the entire knowledge base construction is thus fully automated, requiring no human supervision. Second, in the…
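The Experience Exploration stage described in the abstract can be illustrated with a minimal sketch: roll out an agent on each task, score the resulting trajectory with a critic, and keep only trajectories that pass. This is not the authors' implementation. Every name here (`Trajectory`, `MemoryBank`, `explore_and_curate`, the `rollout` and `critic` callables, and the `threshold` value) is a hypothetical stand-in, and the abstract does not say how the reward model scores trajectories or what cutoff it uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple

# One interaction step: (observation, action). Plain strings are an
# assumption made for illustration; a real agent would use screenshots.
Step = Tuple[str, str]


@dataclass
class Trajectory:
    task: str
    steps: List[Step]


@dataclass
class MemoryBank:
    """Hypothetical store for trajectories that the critic accepted."""
    entries: List[Trajectory] = field(default_factory=list)

    def add(self, trajectory: Trajectory) -> None:
        self.entries.append(trajectory)


def explore_and_curate(
    tasks: Iterable[str],
    rollout: Callable[[str], Trajectory],
    critic: Callable[[Trajectory], float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> MemoryBank:
    """Run the agent on each task and keep critic-approved trajectories."""
    bank = MemoryBank()
    for task in tasks:
        trajectory = rollout(task)            # agent interacts with the GUI
        if critic(trajectory) >= threshold:   # reward model validates success
            bank.add(trajectory)
    return bank
```

No human label appears anywhere in the loop: the critic alone decides what enters memory, which is the sense in which the abstract calls the construction fully automated.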
