GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation

Tao Liu; Chongyu Wang; Rongjie Li; Yingchen Yu; Xuming He; Bai Song

arXiv:2510.27210 · cs.AI · November 3, 2025


TL;DR

GUI-Rise combines structured reasoning with history summarization to improve generalization and decision-making in GUI navigation. It achieves state-of-the-art results, particularly on out-of-domain tasks.

Contribution

The paper presents a reasoning-enhanced framework, and the GUI-Rise agent built on it, that integrates structured reasoning, history summarization, and reinforcement learning to improve GUI navigation.
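A minimal sketch of the per-step loop this framework implies. It assumes, following the abstract, that each step emits reasoning (progress estimation plus decision reasoning), an action, and a compact summary that is passed to the next step in place of the raw interaction history. `StepOutput`, `run_episode`, and `toy_policy` are hypothetical names chosen for illustration. `toy_policy` stands in for the MLLM and does not reflect the authors' model or prompt format.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepOutput:
    """Hypothetical container for one step of the agent's structured output."""
    progress: str  # progress estimation: what the task has achieved so far
    decision: str  # decision reasoning: why the next action is chosen
    action: str    # action to execute on the current screen
    summary: str   # compact history summary handed to the next step


# A policy maps (task, current screen, previous summary) to one StepOutput.
Policy = Callable[[str, str, str], StepOutput]


def run_episode(policy: Policy, task: str, screens: List[str]) -> List[StepOutput]:
    """Roll out a policy, passing only the latest summary forward as history."""
    summary = ""  # no history before the first step
    outputs: List[StepOutput] = []
    for screen in screens:
        out = policy(task, screen, summary)
        outputs.append(out)
        summary = out.summary  # the summary, not the raw log, is the next step's history
    return outputs


def toy_policy(task: str, screen: str, summary: str) -> StepOutput:
    """Hypothetical stand-in for the MLLM: records visited screens in its summary."""
    return StepOutput(
        progress=f"seen: {summary or 'nothing'}",
        decision=f"act on {screen} to advance '{task}'",
        action=f"tap({screen})",
        summary=f"{summary};{screen}" if summary else screen,
    )


steps = run_episode(toy_policy, "open settings", ["home", "menu", "settings"])
print(steps[-1].summary)  # home;menu;settings
```

The sketch shows that only the latest summary, not the full list of past screens and actions, reaches the policy. In the actual agent, the model itself decides what the summary retains.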

Findings

01 State-of-the-art performance on standard benchmarks.
02 Strong out-of-domain generalization.
03 Effective history-aware action prediction.

Abstract

While Multimodal Large Language Models (MLLMs) have advanced GUI navigation agents, current approaches face limitations in cross-domain generalization and effective history utilization. We present a reasoning-enhanced framework that systematically integrates structured reasoning, action prediction, and history summarization. The structured reasoning component generates coherent Chain-of-Thought analyses combining progress estimation and decision reasoning, which inform both immediate action predictions and compact history summaries for future steps. Based on this framework, we train a GUI agent, GUI-Rise, through supervised fine-tuning on pseudo-labeled trajectories and reinforcement learning with Group Relative Policy Optimization (GRPO). This framework employs specialized rewards, including a history-aware objective, directly linking summary quality to subsequent action…
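GRPO (Group Relative Policy Optimization) replaces a learned critic with a group baseline. Several responses are sampled for the same input, each response is scored, and each reward is then normalized by the mean and standard deviation of its group. The following is a minimal sketch of that advantage computation. The reward values are invented, and GUI-Rise's specialized rewards, including its history-aware objective, are not reproduced here. `group_relative_advantages` is a hypothetical helper name.

```python
from statistics import fmean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the other responses sampled for the same input.

    The group mean acts as the baseline, so no value network (critic) is needed.
    `eps` guards against division by zero when every reward in the group is equal.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical scalar rewards for G = 4 responses sampled at the same GUI step.
rewards = [1.0, 0.0, 1.0, 0.5]
advantages = group_relative_advantages(rewards)
# Above-average responses get positive advantages, below-average ones negative.
print([round(a, 2) for a in advantages])
```

In GRPO, each advantage then weights the policy-gradient term for every token of its response. The clipped probability ratio and the KL penalty of the full GRPO objective are omitted from this sketch.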


Code & Models

Datasets

Leon022/GUI-Rise-pseudo-label
dataset · 21 downloads


Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Speech and Dialogue Systems