LiteGUI: Distilling Compact GUI Agents with Reinforcement Learning
Yubin Wu, Zicheng Cai, Liping Ning, Hua Wang, Zhi Chen, Yaohua Tang, Hao Chen

TL;DR
This paper introduces a novel training paradigm for lightweight GUI agents that leverages knowledge distillation and dual-level exploration to improve performance without overfitting.
Contribution
It presents an SFT-free training method with guided on-policy distillation and a dual-level framework, advancing lightweight GUI agent capabilities.
Findings
Achieves state-of-the-art results among small-scale models.
Outperforms traditional imitation learning at the 2B and 3B agent scales.
Enhances exploration and reduces hallucinations in GUI tasks.
Abstract
Developing lightweight, on-device vision-language GUI agents is essential for efficient cross-platform automated interaction. However, current on-device agents are constrained by limited model capacity, and further performance improvements remain urgently needed. Traditional Supervised Fine-Tuning (SFT) for small-scale models often leads to overfitting, catastrophic forgetting, and policy rigidity, and thus fails to fully address these challenges. In this work, we propose a novel SFT-free training paradigm that significantly enhances the performance of small-scale models. We first present the first systematic integration of generalized knowledge distillation into the GUI agent domain via Guided On-policy Distillation. By incorporating oracle reference trajectories together with a dynamic retrieval mechanism, our method reduces hallucinations and mitigates the cognitive misalignment…
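
The abstract refers to generalized knowledge distillation applied on-policy. As a rough illustration only, the sketch below computes a generalized Jensen-Shannon divergence between teacher and student next-token distributions, averaged over the tokens of one sequence. It is not the paper's Guided On-policy Distillation: the function names, the `beta` mixing weight, and the toy logits are assumptions made for this example, and the logits are simply assumed to have been computed on a sequence sampled from the student (the on-policy part), which the code itself does not perform.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def generalized_jsd(teacher, student, beta=0.5):
    """Generalized JSD: beta*KL(T || M) + (1-beta)*KL(S || M),
    where M = beta*T + (1-beta)*S. beta is a hypothetical knob here."""
    mix = [beta * t + (1 - beta) * s for t, s in zip(teacher, student)]
    return beta * kl(teacher, mix) + (1 - beta) * kl(student, mix)


def distillation_loss(teacher_logits_seq, student_logits_seq, beta=0.5):
    """Mean per-token divergence over one sequence.

    Both arguments are lists of per-token logit vectors, assumed to be
    evaluated on the same student-sampled sequence.
    """
    if not teacher_logits_seq or len(teacher_logits_seq) != len(student_logits_seq):
        raise ValueError("sequences must be non-empty and of equal length")
    per_token = [
        generalized_jsd(softmax(t), softmax(s), beta)
        for t, s in zip(teacher_logits_seq, student_logits_seq)
    ]
    return sum(per_token) / len(per_token)


# Toy example: two tokens, vocabulary of three entries.
teacher_logits = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
student_logits = [[0.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
loss = distillation_loss(teacher_logits, student_logits)
```

In this toy case the second token contributes nothing, because teacher and student agree on it, so the loss comes entirely from the first token. A real training loop would backpropagate such a loss through the student only, using a tensor library rather than plain Python lists.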
