Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions

Jing Gu; Eliana Stefani; Qi Wu; Jesse Thomason; Xin Eric Wang

arXiv:2203.12667 · cs.CV · June 7, 2022

TL;DR

This survey reviews the current state of Vision-and-Language Navigation, discussing tasks, methods, challenges, and future directions to advance AI agents capable of understanding language and visual environments.

Contribution

It provides a comprehensive overview of VLN tasks, evaluation metrics, and methods, and identifies key challenges and future research opportunities.

Findings

01. Current VLN methods face limitations in generalization.
02. Evaluation metrics vary across studies, affecting comparability.
03. Future directions include improving robustness and real-world applicability.

Abstract

A long-term goal of AI research is to build intelligent agents that can communicate with humans in natural language, perceive the environment, and perform real-world tasks. Vision-and-Language Navigation (VLN) is a fundamental and interdisciplinary research topic towards this goal, and receives increasing attention from natural language processing, computer vision, robotics, and machine learning communities. In this paper, we review contemporary studies in the emerging field of VLN, covering tasks, evaluation metrics, methods, etc. Through structured analysis of current progress and challenges, we highlight the limitations of current VLN and opportunities for future work. This paper serves as a thorough reference for the VLN research community.

Code & Models

Repositories

eric-ai-lab/awesome-vision-language-navigation
none · Official