WebVLN: Vision-and-Language Navigation on Websites

Qi Chen; Dileepa Pitawela; Chongyang Zhao; Gengze Zhou; Hsiang-Ting Chen; Qi Wu

arXiv:2312.15820 · cs.CV · December 27, 2023 · 1 citation

TL;DR

This paper introduces WebVLN, a vision-and-language navigation task set on websites, together with a dataset (WebVLN-v1) and a dedicated model (WebVLN-Net) that let AI agents navigate web content by following question-based instructions and using web-specific information such as HTML.

Contribution

The paper presents the WebVLN task, the WebVLN-v1 dataset, and the WebVLN-Net model, which incorporates web-specific content such as HTML to improve navigation performance.

Findings

1. WebVLN-Net outperforms existing VLN and web navigation methods.
2. The WebVLN dataset enables research on web-based navigation tasks.
3. Incorporating HTML content improves navigation accuracy.
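The third finding credits HTML content with better navigation accuracy. As a rough illustration of what HTML can expose beyond a screenshot, the sketch below uses Python's standard `html.parser` to collect clickable elements (`<a>`, `<button>`) together with attribute text (`alt`, `title`, `aria-label`) that is typically absent from a static rendering. The class name, the tag and attribute choices, and the sample markup are assumptions for illustration; this is not WebVLN-Net's actual preprocessing.

```python
from html.parser import HTMLParser

# Assumed choice: attribute text that a static screenshot usually does not show.
HIDDEN_TEXT_ATTRS = ("alt", "title", "aria-label")
# Assumed choice: tags treated as clickable navigation targets.
CLICKABLE_TAGS = {"a", "button"}


class ClickableExtractor(HTMLParser):
    """Hypothetical helper: lists clickable elements with visible and hidden text."""

    def __init__(self):
        super().__init__()
        self.elements = []  # one dict per clickable element, in document order
        self._open = []     # clickable elements whose end tag has not been seen

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        hidden = [attrs[name] for name in HIDDEN_TEXT_ATTRS if attrs.get(name)]
        if tag in CLICKABLE_TAGS:
            element = {"tag": tag, "href": attrs.get("href"),
                       "visible": "", "hidden": hidden}
            self.elements.append(element)
            self._open.append(element)
        elif self._open and hidden:
            # e.g. an <img alt="..."> nested inside a link: credit the link
            self._open[-1]["hidden"].extend(hidden)

    def handle_endtag(self, tag):
        if tag in CLICKABLE_TAGS and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Accumulate rendered text inside the innermost open clickable element.
        text = data.strip()
        if self._open and text:
            current = self._open[-1]
            current["visible"] = (current["visible"] + " " + text).strip()


SAMPLE_HTML = (
    '<nav><a href="/cart" title="Shopping cart">'
    '<img src="cart.png" alt="Cart icon"></a>'
    '<button aria-label="Search">Go</button></nav>'
)

if __name__ == "__main__":
    parser = ClickableExtractor()
    parser.feed(SAMPLE_HTML)
    parser.close()
    for el in parser.elements:
        print(el["tag"], repr(el["visible"]), el["hidden"])
```

In this sample the cart link has no visible text at all: everything an agent could learn about it from the markup comes from its `title` and the nested image's `alt`, which is the kind of signal a screenshot-only agent would miss.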

Abstract

The Vision-and-Language Navigation (VLN) task aims to enable AI agents to accurately understand and follow natural language instructions to navigate through real-world environments, ultimately reaching specific target locations. We recognise a promising opportunity to extend VLN to a comparable navigation task that holds substantial significance in our daily lives, albeit within the virtual realm: navigating websites on the Internet. This paper proposes a new task named Vision-and-Language Navigation on Websites (WebVLN), where we use question-based instructions to train an agent, emulating how users naturally browse websites. Unlike the existing VLN task, which only attends to vision and instruction (language), the WebVLN agent further considers underlying web-specific content like HTML, which cannot be seen on the rendered web pages yet contains rich visual and textual…
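The abstract frames WebVLN as following a question-based instruction through a website. The sketch below is one hypothetical way to make that loop concrete: it models a site as pages linked by clickable elements, lets a policy choose one element per step or stop, and records the visited pages. The `Element` type, the toy site, and the word-overlap policy are illustrative assumptions, not the WebVLN-v1 data format or WebVLN-Net, and answering the question once the agent stops is left out.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Element:
    text: str    # text describing a clickable element (visible or HTML-derived)
    target: str  # id of the page the click leads to


# Hypothetical toy website: page id -> clickable elements on that page.
SITE = {
    "home": [Element("Shop", "shop"), Element("About us", "about")],
    "shop": [Element("Laptops", "laptops"), Element("Phones", "phones"),
             Element("Home", "home")],
    "laptops": [],
    "phones": [],
    "about": [],
}
QUESTION = "Which laptops are on sale in the shop?"


def tokens(text: str) -> set:
    """Lower-cased alphanumeric word set used for the toy overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_policy(question: str, candidates: list) -> Optional[Element]:
    """Toy stand-in policy (hypothetical): pick the element sharing the most
    words with the question, or return None (STOP) when nothing overlaps."""
    q = tokens(question)
    best, best_score = None, 0
    for el in candidates:
        score = len(q & tokens(el.text))
        if score > best_score:
            best, best_score = el, score
    return best


def run_episode(site: dict, start: str, question: str,
                policy: Callable = overlap_policy, max_steps: int = 5) -> list:
    """Roll out one navigation episode and return the sequence of visited pages."""
    page, path = start, [start]
    for _ in range(max_steps):
        # Do not offer links back to pages already on the path (avoids loops).
        candidates = [el for el in site.get(page, []) if el.target not in path]
        choice = policy(question, candidates)
        if choice is None:  # STOP action
            break
        page = choice.target
        path.append(page)
    return path


if __name__ == "__main__":
    print(run_episode(SITE, "home", QUESTION))
```

In this framing, a learned model would replace `overlap_policy`, scoring candidates from the question plus page features such as the screenshot and HTML; the rollout loop itself stays unchanged.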

Code & Models

Repositories

webvln/webvln (PyTorch, official)

Videos

WebVLN: Vision-and-Language Navigation on Websites (Underline)

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling