Language-Aligned Waypoint (LAW) Supervision for Vision-and-Language Navigation in Continuous Environments

Sonia Raychaudhuri; Saim Wani; Shivansh Patel; Unnat Jain; Angel X. Chang

arXiv:2109.15207·cs.CV·October 1, 2021

Open Access

TL;DR

This paper introduces a language-aligned supervision scheme and a new metric that counts the sub-instructions an agent completes, targeting "off the path" scenarios in Vision-and-Language Navigation in continuous 3D environments.

Contribution

It proposes a novel supervision scheme aligned with language instructions and a metric to measure instruction-following accuracy in VLN tasks.

Findings

01. Improved alignment between supervision and instructions.

02. Enhanced evaluation of instruction-following capability.

03. Better handling of off-path navigation scenarios.

Abstract

In the Vision-and-Language Navigation (VLN) task, an embodied agent navigates a 3D environment, following natural language instructions. A challenge in this task is how to handle 'off the path' scenarios where an agent veers from a reference path. Prior work supervises the agent with actions based on the shortest path from the agent's location to the goal, but such goal-oriented supervision is often not in alignment with the instruction. Furthermore, the evaluation metrics employed by prior work do not measure how much of a language instruction the agent is able to follow. In this work, we propose a simple and effective language-aligned supervision scheme, and a new metric that measures the number of sub-instructions the agent has completed during navigation.
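
The contrast between goal-oriented and language-aligned supervision can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not spell out the scheme, so the sketch assumes the reference path is a list of 2D waypoints, that the language-aligned target is the waypoint following the one nearest to the agent, and that straight-line distance stands in for distance in the environment. The function names are hypothetical.

```python
import math

def nearest_waypoint_index(position, reference_path):
    """Index of the reference-path waypoint closest to the agent.

    Assumption: straight-line (Euclidean) distance is used for simplicity.
    """
    return min(
        range(len(reference_path)),
        key=lambda i: math.dist(position, reference_path[i]),
    )

def goal_oriented_target(position, reference_path):
    """Goal-oriented supervision: always head for the final goal,
    regardless of the route the instruction describes."""
    return reference_path[-1]

def language_aligned_target(position, reference_path):
    """Language-aligned supervision (assumed form): head for the waypoint
    that follows the nearest one, so an off-path agent is steered back
    onto the route described by the instruction."""
    i = nearest_waypoint_index(position, reference_path)
    return reference_path[min(i + 1, len(reference_path) - 1)]

# Hypothetical L-shaped reference path; the agent has drifted off it.
reference_path = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (4.0, 2.0)]
agent = (1.8, -0.5)

print(goal_oriented_target(agent, reference_path))
print(language_aligned_target(agent, reference_path))
```

In this toy setup the goal-oriented target is the final waypoint, while the language-aligned target is the next waypoint along the described route, which is the distinction the abstract draws between the two kinds of supervision.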

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling