GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL

Rui Yang; Qianhui Wu; Zhaoyang Wang; Hanyang Chen; Ke Yang; Hao Cheng; Huaxiu Yao; Baoling Peng; Huan Zhang; Jianfeng Gao; Tong Zhang

arXiv:2602.22190·cs.LG·February 26, 2026

TL;DR

GUI-Libra introduces a specialized training approach for native GUI agents, leveraging curated reasoning data, action-aware supervised fine-tuning, and improved RLVR techniques to enhance long-horizon navigation performance.

Contribution

The paper presents a training recipe for GUI agents that combines a curated reasoning dataset, action-aware SFT, and RLVR stabilized with KL regularization. It targets two problems: standard SFT with CoT reasoning often hurts grounding, and step-wise RLVR is only partially verifiable.

Findings

01

Significant improvements in step-wise accuracy and task completion across benchmarks.

02

Curated 81K GUI reasoning dataset enhances reasoning capabilities.

03

KL regularization in RLVR stabilizes training and makes offline step-wise metrics better predictors of online task success.

Abstract

Open-source native GUI agents still lag behind closed-source systems on long-horizon navigation tasks. This gap stems from two limitations: a shortage of high-quality, action-aligned reasoning data, and the direct adoption of generic post-training pipelines that overlook the unique challenges of GUI agents. We identify two fundamental issues in these pipelines: (i) standard SFT with CoT reasoning often hurts grounding, and (ii) step-wise RLVR-style training faces partial verifiability, where multiple actions can be correct but only a single demonstrated action is used for verification. This makes offline step-wise metrics weak predictors of online task success. In this work, we present GUI-Libra, a tailored training recipe that addresses these challenges. First, to mitigate the scarcity of action-aligned reasoning data, we introduce a data construction and filtering pipeline and release…
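The partial-verifiability issue in the abstract can be made concrete with a toy step-wise verifier. The sketch below is illustrative only and is not the paper's reward function: the `Action` type, the `stepwise_reward` helper, and the element names are hypothetical. It assumes a search form that can be submitted either by clicking the search button or by pressing Enter, with only the click recorded in the demonstration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical GUI action; the fields are illustrative, not from the paper."""
    kind: str          # e.g. "click" or "press_key"
    target: str = ""   # assumed UI element identifier
    text: str = ""     # assumed key name or typed text


def stepwise_reward(predicted: Action, demonstrated: Action) -> float:
    """Exact-match verifier: 1.0 only if the prediction equals the single demonstrated action."""
    return 1.0 if predicted == demonstrated else 0.0


# The demonstration recorded a click on the search button.
demo = Action("click", target="search_button")
# Pressing Enter is assumed to submit the same form, so it is also a correct step.
alternative = Action("press_key", text="Enter")
# A genuinely wrong action.
wrong = Action("click", target="logout_button")

rewards = {
    "demonstrated": stepwise_reward(demo, demo),
    "valid_alternative": stepwise_reward(alternative, demo),
    "wrong": stepwise_reward(wrong, demo),
}
print(rewards)
```

Under this verifier the valid alternative and the wrong action receive the same zero reward, so a reward of 1 is trustworthy while a reward of 0 is ambiguous. This is one way to see why offline step-wise accuracy can be a weak predictor of online task success.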

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Topic Modeling