Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation

Muhammad Zubair Irshad; Chih-Yao Ma; Zsolt Kira

arXiv:2104.10674 · cs.RO · April 22, 2021 · 1 citation

TL;DR

This paper introduces Robo-VLN, a more realistic continuous environment for vision-and-language navigation, and proposes a hierarchical agent that significantly improves navigation performance over existing methods.

Contribution

It presents Robo-VLN, a new continuous 3D environment for VLN, and develops a hierarchical agent with layered decision-making that outperforms prior approaches.

Findings

01 Hierarchical agent outperforms baselines in Robo-VLN

02 Layered decision-making improves navigation accuracy

03 Modular training enhances policy effectiveness

Abstract

Deep learning has revolutionized our ability to solve complex problems such as Vision-and-Language Navigation (VLN). This task requires the agent to navigate to a goal based purely on visual sensory inputs, given natural language instructions. However, prior works formulate the problem as a navigation graph with a discrete action space. In this work, we lift the agent off the navigation graph and propose a more complex VLN setting in continuous 3D reconstructed environments. Our proposed setting, Robo-VLN, more closely mimics the challenges of real-world navigation. Robo-VLN tasks have longer trajectory lengths, continuous action spaces, and challenges such as obstacles. We provide a suite of baselines inspired by state-of-the-art works in discrete VLN and show that they are less effective at this task. We further propose that decomposing the task into specialized high- and low-level…
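As a rough illustration of the high-level/low-level split the abstract describes, the following sketch separates "which sub-goal next" from "which continuous command now". It is not the paper's agent: the learned, instruction-conditioned policies are replaced here with hand-written rules, and every name (`Pose`, `high_level_policy`, `low_level_policy`, `run_episode`), the waypoint list, the linear/angular velocity action format, and all numeric gains are assumptions made for this example only.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    x: float
    y: float
    heading: float  # radians


def high_level_policy(pose, remaining, reach_radius=0.2):
    """Hypothetical high-level step: drop reached waypoints, return the next sub-goal.

    In a learned agent this decision would come from language and vision;
    here it is a fixed list of waypoints (an assumption for illustration).
    """
    while remaining and math.hypot(remaining[0][0] - pose.x,
                                   remaining[0][1] - pose.y) <= reach_radius:
        remaining.pop(0)
    return remaining[0] if remaining else None


def low_level_policy(pose, subgoal, max_v=0.5, max_w=1.0):
    """Hypothetical low-level step: map a sub-goal to a continuous (v, w) command."""
    dx, dy = subgoal[0] - pose.x, subgoal[1] - pose.y
    err = math.atan2(dy, dx) - pose.heading
    err = math.atan2(math.sin(err), math.cos(err))  # wrap to [-pi, pi]
    w = max(-max_w, min(max_w, 2.0 * err))          # turn toward the sub-goal
    # Slow down near the sub-goal and when facing away from it.
    v = min(max_v, math.hypot(dx, dy)) * max(0.0, math.cos(err))
    return v, w


def step(pose, v, w, dt=0.1):
    """Simple unicycle motion model standing in for the simulator."""
    heading = pose.heading + w * dt
    return Pose(pose.x + v * math.cos(heading) * dt,
                pose.y + v * math.sin(heading) * dt,
                heading)


def run_episode(start, waypoints, max_steps=2000):
    """Alternate the two levels until no sub-goal remains or steps run out."""
    pose, remaining = start, list(waypoints)
    for t in range(max_steps):
        subgoal = high_level_policy(pose, remaining)
        if subgoal is None:  # high level signals "stop"
            return pose, t
        v, w = low_level_policy(pose, subgoal)
        pose = step(pose, v, w)
    return pose, max_steps


if __name__ == "__main__":
    final, steps = run_episode(Pose(0.0, 0.0, 0.0), [(1.0, 0.0), (1.0, 1.0)])
    print(f"stopped after {steps} steps at ({final.x:.2f}, {final.y:.2f})")
```

The point of the split is that the low-level controller never needs to know about the instruction, and the high-level decision never needs to reason about velocities; each can be swapped out or trained separately.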

Code & Models

Repositories

GT-RIPL/robo-vln (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications