Learn where to Click from Yourself: On-Policy Self-Distillation for GUI Grounding

Yan Zhang; Daiqing Wu; Huawen Shen; Can Ma; Yu Zhou

arXiv:2605.00642 · cs.AI · May 12, 2026

TL;DR

This paper introduces GUI-SD, an on-policy self-distillation framework for GUI grounding. It improves accuracy and training efficiency by giving the teacher a visually enriched privileged context and by applying entropy-guided token weighting.

Contribution

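GUI-SD is the first on-policy self-distillation (OPSD) framework designed specifically for GUI grounding. Compared with prior reinforcement learning methods such as GRPO, which need multiple rollouts and give sparse signals on hard samples, it provides dense token-level supervision from a single rollout.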

Findings

01. GUI-SD outperforms GRPO-based methods in accuracy.

02. GUI-SD is more training-efficient than GRPO-based methods.

03. GUI-SD achieves consistent improvements across six benchmarks.

Abstract

Graphical User Interface (GUI) grounding maps natural language instructions to the visual coordinates of target elements and serves as a core capability for autonomous GUI agents. Recent reinforcement learning methods (e.g., GRPO) have achieved strong performance, but they rely on expensive multiple rollouts and suffer from sparse signals on hard samples. These limitations make on-policy self-distillation (OPSD), which provides dense token-level supervision from a single rollout, a promising alternative. However, its applicability to GUI grounding remains unexplored. In this paper, we present GUI-SD, the first OPSD framework tailored for GUI grounding. First, it constructs a visually enriched privileged context for the teacher using a target bounding box and a Gaussian soft mask, providing informative guidance without leaking exact coordinates. Second, it employs entropy-guided…
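The abstract mentions a Gaussian soft mask built around the target bounding box as part of the teacher's privileged context. The listing does not say how the mask is parameterized or applied, so the following is only a minimal sketch under stated assumptions: the mask is centered on the box center, its spread is proportional to the box size, and it is used to dim pixels away from the target. The names `gaussian_soft_mask`, `apply_mask`, `sigma_scale`, and `floor` are hypothetical and do not come from the paper.

```python
import math


def gaussian_soft_mask(width, height, bbox, sigma_scale=0.5):
    """Return a height x width grid of weights in (0, 1], peaking at the box center.

    bbox is (x1, y1, x2, y2) in pixel coordinates.
    sigma_scale is an assumed knob: standard deviation as a fraction of box size.
    """
    x1, y1, x2, y2 = bbox
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    # Guard against degenerate boxes so the standard deviation is never zero.
    sx = max((x2 - x1) * sigma_scale, 1e-6)
    sy = max((y2 - y1) * sigma_scale, 1e-6)
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            d = ((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2
            row.append(math.exp(-0.5 * d))
        mask.append(row)
    return mask


def apply_mask(gray, mask, floor=0.3):
    """Dim a grayscale image away from the target (assumed usage).

    floor keeps the surrounding context faintly visible instead of blacking it out.
    """
    return [
        [p * (floor + (1.0 - floor) * m) for p, m in zip(prow, mrow)]
        for prow, mrow in zip(gray, mask)
    ]


if __name__ == "__main__":
    mask = gaussian_soft_mask(8, 6, (2, 1, 6, 5))
    for row in mask:
        print(" ".join(f"{v:.2f}" for v in row))
```

A soft mask of this kind highlights a region rather than a point: the weights fall off smoothly, so the highlighted image suggests where the target is without stating its coordinates as text. Whether GUI-SD combines the mask with the screenshot in this way is not stated in the abstract.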

Code & Models

Repositories

https://zhangyan-ucas.github.io/GUI-SD