# KG-RAG: Enhancing GUI Agent Decision-Making via Knowledge Graph-Driven Retrieval-Augmented Generation

**Authors:** Ziyi Guan, Jason Chun Lok Li, Zhijian Hou, Pingping Zhang, Donglai Xu, Yuzhi Zhao, Mengyang Wu, Jinpeng Chen, Thanh-Toan Nguyen, Pengfei Xian, Wenao Ma, Shengchao Qin, Graziano Chesi, Ngai Wong

arXiv: 2509.00366 · 2025-09-03

## TL;DR

KG-RAG improves GUI agent decision-making by converting UI Transition Graphs (UTGs) into a retrievable knowledge graph, which raises success rates and decision accuracy on mobile app tasks.

## Contribution

Introduces KG-RAG, a framework that converts UI Transition Graphs (UTGs) into structured vector databases, so that a GUI agent can retrieve app-specific navigation knowledge in real time while deciding its next action.
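As an illustration of what "converting a UTG into a vector database" can look like, here is a minimal sketch. It is not the authors' implementation: the toy graph, the text template, and the helper names (`embed`, `build_index`, `retrieve`) are all assumptions made for this example, and a bag-of-words vector stands in for a learned text encoder.

```python
import math
from collections import Counter

# Hypothetical toy UTG: (source screen, action, destination screen).
UTG_EDGES = [
    ("home", "tap search icon", "search"),
    ("home", "tap profile tab", "profile"),
    ("profile", "tap settings gear", "settings"),
    ("settings", "tap notifications row", "notification settings"),
]

def embed(text):
    """Toy bag-of-words vector (assumption: a real system would use a text encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(edges):
    """Turn each UTG edge into a text record with an attached vector."""
    index = []
    for src, action, dst in edges:
        text = f"from {src} {action} to reach {dst}"
        index.append({"triple": (src, action, dst), "text": text, "vector": embed(text)})
    return index

def retrieve(index, query, k=2):
    """Return the k edges whose records are most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda r: cosine(q, r["vector"]), reverse=True)
    return [r["triple"] for r in ranked[:k]]

INDEX = build_index(UTG_EDGES)
TOP = retrieve(INDEX, "open notification settings", k=1)
print(TOP)
```

Storing one record per transition keeps each retrieved item directly actionable: the agent gets back a screen, an action, and the screen that action leads to.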

## Key findings

- Achieves a 75.8% success rate, an 8.9% improvement over AutoDroid.
- Reaches 84.6% decision accuracy, an 8.1% improvement over previous methods.
- Reduces the average number of steps per task from 4.5 to 4.1.
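The summary does not say whether the reported improvements are absolute percentage points or relative gains. The arithmetic below works through both readings for the success rate, purely as an aid to interpretation; the implied baselines are derived values, not numbers reported in the paper. The step reduction is computed as a relative change.

```python
# Reported figures from the summary.
KG_RAG_SUCCESS = 75.8   # percent
REPORTED_GAIN = 8.9     # "8.9% improvement over AutoDroid"
STEPS_BEFORE, STEPS_AFTER = 4.5, 4.1

# Reading 1 (assumption): the gain is in absolute percentage points.
baseline_if_points = round(KG_RAG_SUCCESS - REPORTED_GAIN, 1)

# Reading 2 (assumption): the gain is relative to the baseline.
baseline_if_relative = round(KG_RAG_SUCCESS / (1 + REPORTED_GAIN / 100), 1)

# Relative reduction in average steps per task.
step_reduction_pct = round((STEPS_BEFORE - STEPS_AFTER) / STEPS_BEFORE * 100, 1)

print(baseline_if_points, baseline_if_relative, step_reduction_pct)
```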

## Abstract

Despite recent progress, Graphical User Interface (GUI) agents powered by Large Language Models (LLMs) struggle with complex mobile tasks due to limited app-specific knowledge. While UI Transition Graphs (UTGs) offer structured navigation representations, they are underutilized due to poor extraction and inefficient integration. We introduce KG-RAG, a Knowledge Graph-driven Retrieval-Augmented Generation framework that transforms fragmented UTGs into structured vector databases for efficient real-time retrieval. By leveraging an intent-guided LLM search method, KG-RAG generates actionable navigation paths, enhancing agent decision-making. Experiments across diverse mobile apps show that KG-RAG outperforms existing methods, achieving a 75.8% success rate (8.9% improvement over AutoDroid) and 84.6% decision accuracy (8.1% improvement), and reducing average task steps from 4.5 to 4.1. Additionally, we present KG-Android-Bench and KG-Harmony-Bench, two benchmarks tailored to the Chinese mobile ecosystem for future research. Finally, KG-RAG transfers to web and desktop settings (+40% success rate on Weibo-web; +20% on QQ Music-desktop), and a UTG cost ablation shows that accuracy saturates at about 4 hours of UTG construction per complex app, enabling practical deployment trade-offs.
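The idea of turning a user intent into an actionable navigation path can be sketched as follows. This is a simplified stand-in, not the paper's intent-guided LLM search: the target screen is chosen by keyword overlap instead of an LLM, the path is found with plain breadth-first search, and the graph and function names (`match_target`, `shortest_path`, `plan`) are hypothetical.

```python
from collections import deque

# Hypothetical UTG as an adjacency map: screen -> list of (action, next screen).
UTG = {
    "home": [("tap search icon", "search"), ("tap profile tab", "profile")],
    "profile": [("tap settings gear", "settings")],
    "settings": [("tap notifications row", "notification settings")],
    "search": [],
    "notification settings": [],
}

def match_target(utg, intent):
    """Pick the screen whose name shares the most words with the intent.
    Assumption: this keyword overlap replaces the LLM-based intent matching."""
    words = set(intent.lower().split())
    return max(utg, key=lambda s: len(words & set(s.split())))

def shortest_path(utg, start, goal):
    """Breadth-first search returning a list of (screen, action) steps, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        screen, steps = queue.popleft()
        if screen == goal:
            return steps
        for action, nxt in utg.get(screen, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + [(screen, action)]))
    return None

def plan(utg, current, intent):
    """Resolve the intent to a goal screen, then find a path to it."""
    goal = match_target(utg, intent)
    return goal, shortest_path(utg, current, goal)

GOAL, PATH = plan(UTG, "home", "turn off notification alerts in settings")
for screen, action in PATH:
    print(f"on '{screen}': {action}")
```

In an agent loop, a path like this would be supplied to the LLM as retrieved context for choosing its next action, rather than being executed blindly.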

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00366/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00366/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/2509.00366/full.md

---
Source: https://tomesphere.com/paper/2509.00366