Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models

Yue Zhang; Ziqiao Ma; Jialu Li; Yanyuan Qiao; Zun Wang; Joyce Chai; Qi Wu; Mohit Bansal; Parisa Kordjamshidi

arXiv:2407.07035·cs.CL·December 31, 2024·3 cites

TL;DR

This survey reviews the current state and future prospects of Vision-and-Language Navigation, emphasizing how foundation models influence research challenges, methods, and opportunities in the field.

Contribution

It provides a structured overview of VLN research, highlighting the impact of foundation models and proposing a framework for embodied planning and reasoning.

Findings

01. Foundation models are transforming VLN research.

02. Current methods leverage foundation models for improved navigation.

03. Future opportunities include advanced reasoning and embodied planning.

Abstract

Vision-and-Language Navigation (VLN) has gained increasing attention in recent years, and many approaches have emerged to advance its development. The remarkable achievements of foundation models have shaped the challenges and proposed methods for VLN research. In this survey, we provide a top-down review that adopts a principled framework for embodied planning and reasoning, and emphasizes the current methods and future opportunities leveraging foundation models to address VLN challenges. We hope our in-depth discussions provide valuable resources and insights: on one hand, to mark the progress made so far and explore opportunities and potential roles for foundation models in this field, and on the other, to organize the different challenges and solutions in VLN for foundation model researchers.

Code & Models

Repositories

zhangyuejoslin/VLN-Survey-with-Foundation-Models (official)

Taxonomy

Topics: Religious Tourism and Spaces · Geographic Information Systems Studies

Methods: Softmax · Attention Is All You Need