AerialVLN: Vision-and-Language Navigation for UAVs

Shubo Liu; Hongsheng Zhang; Yuankai Qi; Peng Wang; Yaning Zhang; Qi Wu

arXiv:2308.06735·cs.CV·August 15, 2023·1 cites

TL;DR

AerialVLN introduces a new UAV-based vision-and-language navigation task in outdoor environments, supported by a realistic 3D simulator, highlighting the complexity of aerial navigation and the gap between current models and human performance.

Contribution

The paper proposes AerialVLN, a novel UAV-based VLN task with a realistic 3D simulator, and extends baseline models to address aerial navigation challenges.

Findings

01 Baseline models lag well behind human performance on the task.

02 AerialVLN presents a more complex navigation environment than ground-based VLN tasks.

03 The dataset and simulator facilitate future research.

Abstract

Recently emerged Vision-and-Language Navigation (VLN) tasks have drawn significant attention in both computer vision and natural language processing communities. Existing VLN tasks are built for agents that navigate on the ground, either indoors or outdoors. However, many tasks require intelligent agents to carry out in the sky, such as UAV-based goods delivery, traffic/security patrol, and scenery tour, to name a few. Navigating in the sky is more complicated than on the ground because agents need to consider the flying height and more complex spatial relationship reasoning. To fill this gap and facilitate research in this field, we propose a new task named AerialVLN, which is UAV-based and towards outdoor environments. We develop a 3D simulator rendered by near-realistic pictures of 25 city-level scenarios. Our simulator supports continuous navigation, environment extension and…

Code & Models

Repositories

airvln/airvln (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning