Mind2Web: Towards a Generalist Agent for the Web
Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Samuel Stevens, Boshi Wang, Huan Sun, Yu Su

TL;DR
Mind2Web introduces a comprehensive dataset for developing generalist web agents capable of following natural language instructions across diverse real-world websites, enabling progress towards more adaptable and capable web automation agents.
Contribution
The paper presents Mind2Web, the first large-scale, diverse dataset of real-world web tasks and actions, and explores using large language models with filtering techniques to build generalist web agents.
Findings
Filtering website HTML improves LLM performance.
Models show promise on unseen websites and domains.
Substantial room for improvement remains.
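To make the first finding concrete, here is a minimal sketch of what filtering a page before prompting an LLM could look like. It is not the paper's method: this summary does not describe the filtering technique, so the token-overlap scorer below is a deliberately simple stand-in, and every name in it (CandidateCollector, filter_candidates, top_k) is hypothetical. The idea shown is only that a page is reduced to a few interactive elements relevant to the task, so the model sees a short candidate list rather than the full HTML.

```python
import re
from html.parser import HTMLParser

# Assumption: only these tags are treated as actionable candidates.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
VOID_TAGS = {"input"}  # elements that have no closing tag or inner text


class CandidateCollector(HTMLParser):
    """Collect interactive elements with their attribute values and visible text."""

    def __init__(self):
        super().__init__()
        self.candidates = []
        self._open = []  # stack of candidates still receiving inner text

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        attr_text = " ".join(value for _, value in attrs if value)
        candidate = {"tag": tag, "text": attr_text}
        self.candidates.append(candidate)
        if tag not in VOID_TAGS:
            self._open.append(candidate)

    def handle_endtag(self, tag):
        if self._open and self._open[-1]["tag"] == tag:
            self._open.pop()

    def handle_data(self, data):
        if self._open and data.strip():
            self._open[-1]["text"] += " " + data.strip()


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def filter_candidates(html, task, top_k=5):
    """Return the top_k interactive elements sharing the most tokens with the task.

    Stand-in scorer (an assumption, not the paper's technique): the score is
    the number of distinct task tokens found in the element's text and
    attribute values. Ties keep document order.
    """
    parser = CandidateCollector()
    parser.feed(html)
    parser.close()
    task_tokens = set(tokenize(task))
    scored = [
        (len(task_tokens & set(tokenize(c["text"]))), index, c)
        for index, c in enumerate(parser.candidates)
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [c for _, _, c in scored[:top_k]]
```

A downstream prompt would then list only the returned candidates and ask the model to pick one and an action. A learned ranker could replace the overlap score without changing the surrounding structure.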
Abstract
We introduce Mind2Web, the first dataset for developing and evaluating generalist agents for the web that can follow language instructions to complete complex tasks on any website. Existing datasets for web agents either use simulated websites or only cover a limited set of websites and tasks, thus not suitable for generalist web agents. With over 2,000 open-ended tasks collected from 137 websites spanning 31 domains and crowdsourced action sequences for the tasks, Mind2Web provides three necessary ingredients for building generalist web agents: 1) diverse domains, websites, and tasks, 2) use of real-world websites instead of simulated and simplified ones, and 3) a broad spectrum of user interaction patterns. Based on Mind2Web, we conduct an initial exploration of using large language models (LLMs) for building generalist web agents. While the raw HTML of real-world websites are often…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · AI in Service Interactions
