Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
Zhangheng Li, Keen You, Haotian Zhang, Di Feng, Harsh Agrawal, Xiujun Li, Mohana Prasad Sathya Moorthy, Jeff Nichols, Yinfei Yang, Zhe Gan

TL;DR
Ferret-UI 2 is a multimodal large language model designed to understand user interfaces across diverse platforms. It combines multi-platform support, high-resolution perception, and advanced data generation to outperform its predecessor, Ferret-UI.
Contribution
Ferret-UI 2 introduces support for multiple platforms, adaptive high-resolution perception, and GPT-4o powered task data generation, advancing universal UI understanding across diverse devices.
Findings
Outperforms Ferret-UI on multiple UI understanding tasks
Demonstrates strong cross-platform transfer capabilities
Excels in complex user-centered interaction tasks
Abstract
Building a generalist model for user interface (UI) understanding is challenging due to various foundational issues, such as platform diversity, resolution variation, and data limitation. In this paper, we introduce Ferret-UI 2, a multimodal large language model (MLLM) designed for universal UI understanding across a wide range of platforms, including iPhone, Android, iPad, Webpage, and AppleTV. Building on the foundation of Ferret-UI, Ferret-UI 2 introduces three key innovations: support for multiple platform types, high-resolution perception through adaptive scaling, and advanced task training data generation powered by GPT-4o with set-of-mark visual prompting. These advancements enable Ferret-UI 2 to perform complex, user-centered interactions, making it highly versatile and adaptable for the expanding diversity of platform ecosystems. Extensive empirical experiments on referring,…
Taxonomy
Topics: Usability and User Interface Design
