TL;DR
This paper introduces TransBench, a comprehensive benchmark for evaluating and improving the transferability of GUI agents across different platforms, versions, and applications, addressing key challenges in dynamic digital environments.
Contribution
We present TransBench, the first benchmark to systematically assess and enhance GUI agent transferability across multiple dimensions and diverse app categories.
Findings
Our methods significantly improve grounding accuracy.
TransBench effectively evaluates cross-version, cross-platform, and cross-application transferability.
Our code and data are publicly available for further research.
Abstract
Graphical User Interface (GUI) agents, which autonomously operate digital interfaces by following natural language instructions, hold transformative potential for accessibility, automation, and user experience. A critical aspect of their functionality is grounding: the ability to map linguistic intents to visual and structural interface elements. However, existing GUI agents often struggle to adapt to the dynamic and interconnected nature of real-world digital environments, where tasks frequently span multiple platforms and applications and are also affected by version updates. To address this, we introduce TransBench, the first benchmark designed to systematically evaluate and enhance the transferability of GUI agents across three key dimensions: cross-version transferability (adapting to version updates), cross-platform transferability (generalizing across platforms like iOS,…
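To make "grounding" and "transferability" concrete, the sketch below scores grounding accuracy separately for each transfer dimension. It assumes a convention common in GUI grounding benchmarks, namely that a prediction counts as correct when the predicted click point falls inside the target element's bounding box. That convention, the `GroundingSample` record, and the dimension labels are illustrative assumptions, not TransBench's documented evaluation protocol.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class GroundingSample:
    # Hypothetical record: which transfer dimension the sample belongs to
    # (e.g. "cross-version"), the agent's predicted click point, and the
    # ground-truth element box as (left, top, right, bottom) in pixels.
    dimension: str
    pred_x: float
    pred_y: float
    bbox: Tuple[float, float, float, float]


def is_hit(sample: GroundingSample) -> bool:
    """Assumed criterion: the click lands inside the target element's box."""
    left, top, right, bottom = sample.bbox
    return left <= sample.pred_x <= right and top <= sample.pred_y <= bottom


def accuracy_by_dimension(samples: List[GroundingSample]) -> Dict[str, float]:
    """Fraction of correctly grounded samples, reported per transfer dimension."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for s in samples:
        totals[s.dimension] += 1
        hits[s.dimension] += int(is_hit(s))
    return {d: hits[d] / totals[d] for d in totals}


# Usage with made-up samples (not data from the paper).
samples = [
    GroundingSample("cross-version", 50, 50, (40, 40, 60, 60)),     # inside box
    GroundingSample("cross-version", 10, 10, (40, 40, 60, 60)),     # outside box
    GroundingSample("cross-platform", 100, 20, (90, 10, 120, 30)),  # inside box
]
results = accuracy_by_dimension(samples)
print(results)  # {'cross-version': 0.5, 'cross-platform': 1.0}
```

Reporting accuracy per dimension rather than as one pooled number makes it visible when an agent transfers well across platforms but degrades after version updates, which is the kind of comparison a transferability benchmark is meant to expose.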
