Language-Aligned Waypoint (LAW) Supervision for Vision-and-Language Navigation in Continuous Environments
Sonia Raychaudhuri, Saim Wani, Shivansh Patel, Unnat Jain and, Angel X. Chang

TL;DR
This paper introduces a language-aligned supervision method and a new metric for evaluating how well agents follow natural language instructions in continuous 3D navigation environments, addressing off-path challenges.
Contribution
It proposes a novel supervision scheme aligned with language instructions and a metric to measure instruction-following accuracy in VLN tasks.
Findings
Improved alignment between supervision and instructions.
Enhanced evaluation of instruction-following capability.
Better handling of off-path navigation scenarios.
Abstract
In the Vision-and-Language Navigation (VLN) task an embodied agent navigates a 3D environment, following natural language instructions. A challenge in this task is how to handle 'off the path' scenarios where an agent veers from a reference path. Prior work supervises the agent with actions based on the shortest path from the agent's location to the goal, but such goal-oriented supervision is often not in alignment with the instruction. Furthermore, the evaluation metrics employed by prior work do not measure how much of a language instruction the agent is able to follow. In this work, we propose a simple and effective language-aligned supervision scheme, and a new metric that measures the number of sub-instructions the agent has completed during navigation.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
