Aerial Vision-and-Dialog Navigation

Yue Fan; Winson Chen; Tongzhou Jiang; Chun Zhou; Yi Zhang; Xin Eric Wang

arXiv:2205.12219 · cs.CV · June 2, 2023


TL;DR

This paper introduces Aerial Vision-and-Dialog Navigation (AVDN), enabling drones to follow natural language commands through a new dataset and a transformer-based model that predicts navigation and human attention.

Contribution

It presents a new dataset for aerial navigation via dialogue and a novel transformer model that incorporates human attention to improve navigation accuracy.

Findings

1. The AVDN dataset contains over 3,000 navigation trajectories with dialogs.
2. The Human Attention Aided Transformer (HAA-Transformer) predicts both navigation waypoints and human attention.
3. Experiments show that learning to predict human attention improves navigation performance.
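Finding 2 describes a model with two prediction targets. The Python sketch below illustrates that idea with a generic multi-task objective that adds a waypoint term to an attention term. It is not the HAA-Transformer's published loss. The mean-squared waypoint error, the binary cross-entropy over a 0/1 attention mask, the `attn_weight` hyperparameter, and all function names are assumptions made for this example.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

def waypoint_loss(pred: Sequence[Point], target: Sequence[Point]) -> float:
    """Mean squared Euclidean distance between predicted and reference waypoints (assumed metric)."""
    assert pred and len(pred) == len(target), "need matching, non-empty sequences"
    total = 0.0
    for (px, py), (tx, ty) in zip(pred, target):
        total += (px - tx) ** 2 + (py - ty) ** 2
    return total / len(pred)

def attention_loss(pred: Sequence[float], mask: Sequence[int], eps: float = 1e-7) -> float:
    """Binary cross-entropy between predicted attention probabilities and a 0/1 mask (assumed format)."""
    assert pred and len(pred) == len(mask), "need matching, non-empty sequences"
    total = 0.0
    for p, m in zip(pred, mask):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += -(m * math.log(p) + (1 - m) * math.log(1.0 - p))
    return total / len(pred)

def joint_loss(pred_wp: Sequence[Point], true_wp: Sequence[Point],
               pred_attn: Sequence[float], true_mask: Sequence[int],
               attn_weight: float = 0.5) -> float:
    """Weighted sum of the navigation and attention terms; attn_weight is hypothetical."""
    return waypoint_loss(pred_wp, true_wp) + attn_weight * attention_loss(pred_attn, true_mask)

if __name__ == "__main__":
    print(waypoint_loss([(0.0, 0.0), (1.0, 1.0)], [(0.0, 0.0), (1.0, 2.0)]))  # 0.5
```

Because both terms share one scalar, gradients from the attention term also shape the representation used for waypoint prediction. This is the usual motivation for adding an auxiliary task.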

Abstract

The ability to converse with humans and follow natural language commands is crucial for intelligent unmanned aerial vehicles (a.k.a. drones). It can relieve people of the burden of holding a controller all the time, allow multitasking, and make drone control more accessible for people with disabilities or with their hands occupied. To this end, we introduce Aerial Vision-and-Dialog Navigation (AVDN), the task of navigating a drone via natural language conversation. We build a drone simulator with a continuous photorealistic environment and collect a new AVDN dataset of over 3k recorded navigation trajectories with asynchronous human-human dialogs between commanders and followers. The commander provides the initial navigation instruction and further guidance on request, while the follower navigates the drone in the simulator and asks questions when needed. During data collection, followers' attention on the…
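A collected episode can be pictured as a flight trajectory interleaved with timestamped dialog turns. The Python sketch below represents positions as (x, y) pairs and stamps each turn with the trajectory step at which it was spoken. `Episode`, `DialogTurn`, and all of their fields and methods are hypothetical names. They are not the schema of the released AVDN dataset.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Tuple

Role = Literal["commander", "follower"]

@dataclass
class DialogTurn:
    role: Role  # who spoke
    text: str   # the utterance
    step: int   # trajectory index when the turn occurred (hypothetical field)

@dataclass
class Episode:
    """Hypothetical container for one dialog-guided flight; not the AVDN release format."""
    trajectory: List[Tuple[float, float]] = field(default_factory=list)  # (x, y) drone positions
    turns: List[DialogTurn] = field(default_factory=list)

    def move_to(self, x: float, y: float) -> None:
        # The follower flies the drone; each call appends one position.
        self.trajectory.append((x, y))

    def say(self, role: Role, text: str) -> None:
        # Stamp the turn with the current trajectory length so dialog and flight stay aligned.
        self.turns.append(DialogTurn(role, text, len(self.trajectory)))

    def guidance_until(self, step: int) -> List[str]:
        # Commander utterances issued at or before the given step.
        return [t.text for t in self.turns if t.role == "commander" and t.step <= step]

if __name__ == "__main__":
    ep = Episode()
    ep.say("commander", "Fly north toward the large white building.")
    ep.move_to(0.0, 10.0)
    ep.say("follower", "I see two white buildings. Which one?")
    ep.say("commander", "The one next to the parking lot.")
    ep.move_to(5.0, 20.0)
    print(ep.guidance_until(0))  # ['Fly north toward the large white building.']
```

Stamping turns with a step index keeps the asynchronous dialog aligned with the flight. A model can then be given only the commander guidance available at a particular point in the trajectory.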


Code & Models

Datasets

yfan1997/AVDN (dataset, 30 downloads)


Taxonomy

Topics: Multimodal Machine Learning Applications · Robotic Path Planning Algorithms · Human Pose and Action Recognition