TL;DR
This survey reviews the current state of Vision-and-Language Navigation (VLN), covering tasks, evaluation metrics, methods, challenges, and future directions for building AI agents that understand both language and visual environments.
Contribution
It provides a comprehensive overview of VLN tasks, evaluation metrics, and methods, and identifies key challenges and future research opportunities.
Findings
Current VLN methods face limitations in generalization.
Evaluation metrics vary across studies, affecting comparability.
Future directions include improving robustness and real-world applicability.
Abstract
A long-term goal of AI research is to build intelligent agents that can communicate with humans in natural language, perceive the environment, and perform real-world tasks. Vision-and-Language Navigation (VLN) is a fundamental and interdisciplinary research topic towards this goal, and receives increasing attention from natural language processing, computer vision, robotics, and machine learning communities. In this paper, we review contemporary studies in the emerging field of VLN, covering tasks, evaluation metrics, methods, etc. Through structured analysis of current progress and challenges, we highlight the limitations of current VLN and opportunities for future work. This paper serves as a thorough reference for the VLN research community.
