OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization
Hongliang He, Wenlin Yao, Kaixin Ma, Wenhao Yu, Hongming Zhang, Tianqing Fang, Zhenzhong Lan, Dong Yu

TL;DR
This paper introduces OpenWebVoyager, an open-source framework for developing multimodal web agents that autonomously explore, learn from feedback, and improve their web navigation capabilities through iterative cycles.
Contribution
It presents a novel iterative exploration-feedback-optimization framework for multimodal web agents, enabling autonomous self-improvement in real-world web navigation tasks.
Findings
Agents improve performance after each iteration
Strong results across multiple test sets
Effective multimodal perception and navigation
Abstract
The rapid development of large language and multimodal models has sparked significant interest in using proprietary models, such as GPT-4o, to develop autonomous agents capable of handling real-world scenarios like web navigation. Although recent open-source efforts have tried to equip agents with the ability to explore environments and continuously improve over time, they build text-only agents in synthetic environments where the reward signals are clearly defined. Such agents struggle to generalize to realistic settings that require multimodal perception abilities and lack ground-truth signals. In this paper, we introduce an open-source framework designed to facilitate the development of a multimodal web agent that can autonomously conduct real-world exploration and improve itself. We first train the base model with imitation learning to gain the basic abilities. We then let the…
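The iterative cycle named in the summary (exploration, feedback, optimization) can be pictured with a minimal Python sketch. Everything here is a hypothetical stand-in rather than the paper's code: `Trajectory`, `ToyPolicy`, `toy_judge`, `toy_optimize`, and `improvement_cycle` are illustrative names, the starting policy merely plays the role of the imitation-learned base model, and the judge is an assumed source of feedback, since the excerpt above is truncated before it says where the feedback comes from.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Trajectory:
    """One attempt at a task: the task text plus the actions taken."""
    task: str
    actions: List[str]


@dataclass
class ToyPolicy:
    """Hypothetical stand-in for an agent; `skill` is a toy capability score."""
    skill: int

    def __call__(self, task: str) -> Trajectory:
        # Toy behaviour: emit `skill` placeholder actions for any task.
        return Trajectory(task, ["step"] * self.skill)


def toy_judge(trajectory: Trajectory) -> bool:
    # Assumed feedback signal: a task counts as solved when the number of
    # actions reaches the length of the task string (purely illustrative).
    return len(trajectory.actions) >= len(trajectory.task)


def toy_optimize(policy: ToyPolicy, successes: List[Trajectory]) -> ToyPolicy:
    # Toy update: any kept trajectory raises skill by one.
    return ToyPolicy(policy.skill + (1 if successes else 0))


def improvement_cycle(
    policy: ToyPolicy,
    tasks: List[str],
    judge: Callable[[Trajectory], bool],
    optimize: Callable[[ToyPolicy, List[Trajectory]], ToyPolicy],
    iterations: int,
) -> Tuple[ToyPolicy, List[int]]:
    """Run explore -> feedback -> optimize for a fixed number of iterations."""
    kept_per_iteration = []
    for _ in range(iterations):
        # Exploration: the current policy attempts every task.
        trajectories = [policy(task) for task in tasks]
        # Feedback: keep only the attempts the judge accepts.
        successes = [t for t in trajectories if judge(t)]
        # Optimization: produce the next policy from the kept attempts.
        policy = optimize(policy, successes)
        kept_per_iteration.append(len(successes))
    return policy, kept_per_iteration


# The initial ToyPolicy stands in for the imitation-learned base model.
final_policy, kept = improvement_cycle(
    ToyPolicy(skill=1), ["a", "bb", "ccc"], toy_judge, toy_optimize, iterations=3
)
print(kept, final_policy.skill)
```

In this toy setup the count of accepted trajectories grows each round only because the update rule was written that way; it illustrates the control flow of the loop and says nothing about the paper's measured results.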
Taxonomy
Topics: Natural Language Processing Techniques · Multi-Agent Systems and Negotiation · Semantic Web and Ontologies
Methods: Balanced Selection
