GUIDE: Graphical User Interface Data for Execution
Rajat Chawla, Adarsh Jha, Muskaan Kumar, Mukunda NS, Ishaan Bhola

TL;DR
GUIDE is a comprehensive dataset designed to enhance multimodal large language models for robotic process automation across diverse websites and platforms, enabling more effective automation and understanding of graphical user interfaces.
Contribution
We introduce GUIDE, a new multi-platform GUI dataset with detailed annotations, and develop V-Zen, the first RPA model leveraging this dataset for multi-website automation.
Findings
GUIDE covers diverse websites and platforms.
V-Zen demonstrates effective multi-website automation.
The dataset facilitates research on GUI-based LLM applications.
Abstract
In this paper, we introduce GUIDE, a novel dataset tailored for the advancement of Multimodal Large Language Model (MLLM) applications, particularly focusing on Robotic Process Automation (RPA) use cases. Our dataset encompasses diverse data from various websites, including Apollo (62.67%), Gmail (3.43%), Calendar (10.98%), and Canva (22.92%). Each data entry includes an image, a task description, the last action taken, a chain of thought (CoT), and the next action to be performed, along with grounding information indicating where the action needs to be executed. The data is collected using our in-house advanced annotation tool NEXTAG (Next Action Grounding and Annotation Tool). The data is adapted for multiple operating systems, browsers, and display types. It is collected by multiple annotators to capture variation in design and in the way people use a website. Through this dataset, we aim to facilitate research and development…
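The entry structure described in the abstract can be sketched as a small record type. This is only an illustrative sketch: the field names, the bounding-box format for the grounding information, the `click_point` helper, and the sample values are all assumptions made here, not the dataset's published schema. Only the list of fields and the per-website percentages come from the abstract.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class GuideEntry:
    """Hypothetical layout of one GUIDE entry (field names are assumed)."""
    image_path: str          # screenshot of the GUI state
    task: str                # task description
    last_action: str         # last action taken
    chain_of_thought: str    # CoT reasoning for the next step
    next_action: str         # next action to be performed
    # Assumed grounding format: pixel box (x_min, y_min, x_max, y_max).
    bbox: Tuple[int, int, int, int]

    def click_point(self) -> Tuple[int, int]:
        """Return the center of the grounding box as a candidate click target."""
        x_min, y_min, x_max, y_max = self.bbox
        return ((x_min + x_max) // 2, (y_min + y_max) // 2)


# Per-website shares as reported in the abstract (percent of entries).
PLATFORM_SHARE = {"Apollo": 62.67, "Gmail": 3.43, "Calendar": 10.98, "Canva": 22.92}

# Made-up sample entry, for illustration only.
entry = GuideEntry(
    image_path="screenshots/example.png",
    task="Compose a new email",
    last_action="Opened the inbox",
    chain_of_thought="The compose button starts a new message.",
    next_action="CLICK compose button",
    bbox=(100, 200, 300, 260),
)

print(entry.click_point())
print(round(sum(PLATFORM_SHARE.values()), 2))
```

Reducing the grounding box to its center is one simple way an automation agent could turn an annotated region into a single click coordinate; the actual action format used with GUIDE may differ.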
Taxonomy
Topics: Simulation Techniques and Applications · Embedded Systems Design Techniques · Real-Time Systems Scheduling
