Falcon-UI: Understanding GUI Before Following User Instructions
Huawen Shen, Chang Liu, Gengluo Li, Xinlong Wang, Yu Zhou, Can Ma, Xiangyang Ji

TL;DR
Falcon-UI is a large-scale GUI agent model that learns GUI understanding from an instruction-free dataset and achieves high accuracy in GUI navigation tasks, emphasizing the importance of context comprehension.
Contribution
The paper introduces Insight-UI, an instruction-free GUI dataset, and presents Falcon-UI, a model pretrained on this dataset and fine-tuned for GUI understanding, advancing GUI agent capabilities.
Findings
Falcon-UI achieves accuracy comparable to larger models.
Insight-UI dataset covers diverse platforms and resolutions.
Pretraining on Insight-UI improves GUI context understanding.
Abstract
Pursuing human-like interaction for Graphical User Interface (GUI) agents requires understanding the GUI context and following user instructions. However, existing works typically couple these two aspects and focus more on instruction-following abilities, while ignoring the importance of understanding the GUI context. In this paper, we introduce an instruction-free GUI navigation dataset, termed Insight-UI Dataset, to enhance model comprehension of GUI environments. Insight-UI Dataset is automatically generated from the Common Crawl corpus, simulating various platforms -- including iOS, Android, Windows, and Linux -- across multiple resolutions on 312K domains. Although GUI interactions vary by context, diverse interfaces share common internal patterns, such as clicking an item to view its details. This implies the feasibility of independent GUI operation learning, followed by joint…
Taxonomy
Topics: Virtual Reality Applications and Impacts · Data Visualization and Analytics · Usability and User Interface Design
