GAIR: GUI Automation via Information-Joint Reasoning and Group Reflection
Zishu Wei, Qixiang Ma, Xavier Hu, Yuhang Liu, Hui Zang, Yudong Zhao, Tao Wang, Shengyu Zhang, Fei Wu

TL;DR
GAIR introduces a novel multi-model framework for GUI automation that integrates heterogeneous models through joint reasoning and reflection, significantly improving performance across diverse GUI tasks.
Contribution
The paper proposes GAIR, a framework that combines multiple GUI-specific models with a general-purpose model for enhanced automation performance.
Findings
GAIR outperforms existing methods on GUI benchmarks.
The group reflection mechanism improves decision accuracy.
Joint reasoning enhances model collaboration and task handling.
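A rough way to picture the two mechanisms named above is the loop below. It is a minimal sketch, not the paper's implementation: every name in it (Proposal, joint_decide, run_step, the hint string) is hypothetical, and a simple majority vote stands in for the general-purpose model's reasoning, which this summary does not describe in detail. The specialist callables stand in for GUI-specific models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Proposal:
    source: str     # name of a GUI-specific model (hypothetical)
    action: str     # proposed operation, e.g. "click(ok)"
    rationale: str  # free-text information passed to the decision step


# A specialist maps (task, optional reflection hint) to a Proposal.
Specialist = Callable[[str, Optional[str]], Proposal]


def joint_decide(proposals: List[Proposal]) -> Optional[str]:
    """Stand-in for the general-purpose model (assumption: majority vote).

    Returns an action only when a strict majority of proposals agree,
    otherwise None to signal that the information is insufficient.
    """
    if not proposals:
        return None
    counts: Dict[str, int] = {}
    for p in proposals:
        counts[p.action] = counts.get(p.action, 0) + 1
    action, votes = max(counts.items(), key=lambda kv: kv[1])
    return action if votes * 2 > len(proposals) else None


def run_step(task: str, specialists: List[Specialist],
             max_rounds: int = 2) -> Optional[str]:
    """One decision step: gather proposals, decide, reflect if undecided."""
    hint: Optional[str] = None
    for _ in range(max_rounds):
        proposals = [s(task, hint) for s in specialists]
        action = joint_decide(proposals)
        if action is not None:
            return action
        # Reflection round (assumed form): feed the disagreement back
        # to the specialists so they can revise their proposals.
        hint = "Proposals disagree: " + "; ".join(
            f"{p.source} -> {p.action}" for p in proposals
        )
    return None
```

In this toy version, a step ends as soon as the proposals converge, and returns None if they still disagree after max_rounds, leaving the caller to decide how to handle an unresolved step.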
Abstract
Building AI systems for GUI automation tasks has attracted remarkable research effort, with MLLMs leveraged to process user requirements and produce operations. However, GUI automation covers a wide range of tasks, from document processing to online shopping, from CAD to video editing. The diversity among these tasks requires MLLMs for GUI automation to have heterogeneous capabilities and master multidimensional expertise, which makes constructing such a model difficult. To address this challenge, we propose GAIR: GUI Automation via Information-Joint Reasoning and Group Reflection, a novel MLLM-based GUI automation agent framework designed to integrate knowledge and combine capabilities from heterogeneous models in order to build GUI automation agent systems with higher performance. Since different GUI-specific MLLMs are trained on different datasets and thus have different…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · AI-based Problem Solving and Planning · Persona Design and Applications
