TL;DR
AgentProg introduces a program-guided context management approach for long-horizon GUI agents, improving task performance and robustness by effectively organizing interaction history and handling partial observability.
Contribution
It presents a novel program-based context management method combined with a belief state mechanism, achieving state-of-the-art results in long-horizon GUI tasks.
Findings
Achieves state-of-the-art success rates on AndroidWorld and extended benchmarks.
Maintains robust performance on long-horizon tasks where baselines fail.
Effectively manages interaction history to prevent performance degradation.
Abstract
The rapid development of mobile GUI agents has stimulated growing research interest in long-horizon task automation. However, building agents for these tasks faces a critical bottleneck: the reliance on ever-expanding interaction history incurs substantial context overhead. Existing context management and compression techniques often fail to preserve vital semantic information, leading to degraded task performance. We propose AgentProg, a program-guided approach for agent context management that reframes the interaction history as a program with variables and control flow. Organizing information according to the structure of the program provides a principled mechanism to determine which information should be retained and which can be discarded. We further integrate a global belief state mechanism inspired by the Belief MDP framework to handle partial observability and adapt…
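To make the idea of treating interaction history as a program more concrete, here is a minimal Python sketch. It is not AgentProg's implementation: the class names (`Scope`, `ProgramContext`), the rule "drop a block's step log when the block finishes, keep only its variables", and the example actions are all assumptions chosen for illustration. The belief state mechanism is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    """One program block, e.g. a subtask or loop iteration (hypothetical)."""
    name: str
    steps: list = field(default_factory=list)      # raw interaction history
    variables: dict = field(default_factory=dict)  # values worth retaining


class ProgramContext:
    """Toy context manager: keeps variables, drops step logs of finished blocks."""

    def __init__(self):
        self.stack = [Scope("main")]

    def enter(self, name):
        self.stack.append(Scope(name))

    def record(self, step):
        self.stack[-1].steps.append(step)

    def assign(self, key, value):
        self.stack[-1].variables[key] = value

    def exit(self):
        if len(self.stack) == 1:
            raise RuntimeError("cannot exit the top-level block")
        # Leaving a block: discard its step log, promote its variables to the parent.
        finished = self.stack.pop()
        self.stack[-1].variables.update(finished.variables)

    def prompt_context(self):
        # What the agent would see: live variables plus steps of open blocks only.
        variables, steps = {}, []
        for scope in self.stack:
            variables.update(scope.variables)
            steps.extend(scope.steps)
        return {"variables": variables, "steps": steps}


# Assumed example task: read a price in one app, then use it in another.
ctx = ProgramContext()
ctx.enter("read_price")
ctx.record("tap(Shopping app)")
ctx.record("scroll(down)")
ctx.assign("price", "19.99")
ctx.exit()
ctx.record("open(Notes app)")
context = ctx.prompt_context()
```

In this sketch, the two steps taken inside `read_price` no longer appear in `context` once the block exits, while the `price` variable survives. That is one simple way program structure could decide what is retained and what is discarded.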
