MMInA: Benchmarking Multihop Multimodal Internet Agents
Shulin Tian, Ziniu Zhang, Liangyu Chen, Ziwei Liu

TL;DR
MMInA introduces a realistic, evolving benchmark for evaluating multimodal, multihop web agents on complex, real-world tasks, revealing current limitations and proposing memory augmentation to improve performance.
Contribution
The paper presents MMInA, a novel benchmark with real-world evolving websites and multihop tasks, along with a memory augmentation method to enhance agent performance.
Findings
Multihop web tasks are easy for humans but challenging for state-of-the-art agents.
Agents tend to fail early in multihop tasks, reducing success rates.
Memory replay significantly improves agent performance on multihop tasks.
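To see why early failures weigh so heavily on multihop success rates, here is a minimal sketch assuming a task counts as successful only when every hop succeeds. The function names and the toy results are hypothetical illustrations, not MMInA's evaluation code or reported numbers.

```python
from typing import Dict, List


def hops_completed_before_failure(hops: List[bool]) -> int:
    """Count consecutive successful hops from the start of a task."""
    count = 0
    for ok in hops:
        if not ok:
            break
        count += 1
    return count


def summarize(results: List[List[bool]]) -> Dict[str, object]:
    """Summarize hypothetical per-hop outcomes.

    Assumption: a task succeeds only if all of its hops succeed, so a
    failure at hop 1 makes every later hop unreachable.
    """
    n_tasks = len(results)
    max_hops = max(len(r) for r in results)

    # Fraction of tasks that fully succeeded.
    task_success_rate = sum(all(r) for r in results) / n_tasks

    # For each hop index k, the fraction of tasks (having at least k+1 hops)
    # that completed hops 1..k+1 without any failure.
    hop_reach_rate = []
    for k in range(max_hops):
        eligible = [r for r in results if len(r) > k]
        passed = sum(1 for r in eligible if hops_completed_before_failure(r) > k)
        hop_reach_rate.append(passed / len(eligible))

    return {
        "task_success_rate": task_success_rate,
        "hop_reach_rate": hop_reach_rate,
    }


if __name__ == "__main__":
    # Toy data: four 3-hop tasks, two of which fail on the very first hop.
    toy = [
        [True, True, True],
        [False, False, False],
        [True, False, False],
        [False, False, False],
    ]
    print(summarize(toy))
```

In this toy data, half of the tasks are already lost at the first hop, so no improvement on later hops can lift the overall task success rate above 50%.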
Abstract
Autonomous embodied agents live on an Internet of multimedia websites. Can they hop around multimodal websites to complete complex user tasks? Existing benchmarks fail to assess them in a realistic, evolving environment for their embodiment across websites. To answer this question, we present MMInA, a multihop and multimodal benchmark to evaluate the embodied agents for compositional Internet tasks, with several appealing properties: 1) Evolving real-world multimodal websites. Our benchmark uniquely operates on evolving real-world websites, ensuring a high degree of realism and applicability to natural user tasks. Our data includes 1,050 human-written tasks covering various domains such as shopping and travel, with each task requiring the agent to extract multimodal information from web pages as observations autonomously; 2) Multihop web browsing. Our dataset features naturally…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
