GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation
Tao Liu, Chongyu Wang, Rongjie Li, Yingchen Yu, Xuming He, Bai Song

TL;DR
GUI-Rise introduces a structured reasoning framework with history summarization that improves generalization and decision-making in GUI navigation, achieving state-of-the-art results, especially on out-of-domain tasks.
Contribution
The paper presents a novel reasoning-enhanced framework and GUI-Rise agent that integrate structured reasoning, history summarization, and reinforcement learning for improved GUI navigation.
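To make the per-step loop concrete, here is a minimal sketch, assuming a hypothetical text format in which each model response carries `<progress>`, `<reasoning>`, `<action>` and `<summary>` fields. These tag names, the `DONE` stop action and the `model` callable are illustrative assumptions, not the paper's actual interface, and screenshots are omitted. The point it shows is that only the compact summary, not the full action history, is passed into the next step.

```python
import re
from dataclasses import dataclass

# Hypothetical field tags; the paper's real output format may differ.
TAGS = ("progress", "reasoning", "action", "summary")


@dataclass
class StepOutput:
    progress: str   # estimate of how far the task has advanced
    reasoning: str  # decision reasoning for the current step
    action: str     # the predicted GUI action
    summary: str    # compact history summary for future steps


def parse_step(text: str) -> StepOutput:
    """Extract each tagged field from one model response."""
    fields = {}
    for tag in TAGS:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        if match is None:
            raise ValueError(f"missing <{tag}> field")
        fields[tag] = match.group(1).strip()
    return StepOutput(**fields)


def build_prompt(task: str, history_summary: str) -> str:
    """Condition the next step on the task and the running summary only."""
    return f"Task: {task}\nHistory summary: {history_summary or 'none'}\n"


def run_episode(task, model, max_steps=10):
    """Roll out one episode; `model` is any callable mapping prompt -> text."""
    summary = ""
    actions = []
    for _ in range(max_steps):
        out = parse_step(model(build_prompt(task, summary)))
        actions.append(out.action)
        summary = out.summary  # carried forward in place of the raw history
        if out.action == "DONE":  # hypothetical terminal action
            break
    return actions, summary
```

Keeping the summary as the only history channel bounds the prompt length regardless of episode length, which is the motivation the abstract gives for compact history summaries.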
Findings
State-of-the-art performance on standard benchmarks.
Strong out-of-domain generalization.
Effective history-aware action prediction.
Abstract
While Multimodal Large Language Models (MLLMs) have advanced GUI navigation agents, current approaches face limitations in cross-domain generalization and effective history utilization. We present a reasoning-enhanced framework that systematically integrates structured reasoning, action prediction, and history summarization. The structured reasoning component generates coherent Chain-of-Thought analyses combining progress estimation and decision reasoning, which inform both immediate action predictions and compact history summaries for future steps. Based on this framework, we train a GUI agent, GUI-Rise, through supervised fine-tuning on pseudo-labeled trajectories and reinforcement learning with Group Relative Policy Optimization (GRPO). This framework employs specialized rewards, including a history-aware objective, directly linking summary quality to subsequent action…
