VLN-Trans: Translator for the Vision and Language Navigation Agent
Yue Zhang, Parisa Kordjamshidi

TL;DR
This paper introduces VLN-Trans, a translator module that converts navigation instructions into simplified, environment-specific sub-instructions, improving the agent's ability to follow complex language commands in vision-and-language navigation tasks.
Contribution
The paper proposes a novel translator module and a synthetic dataset to enhance instruction understanding for navigation agents, addressing landmark recognition and distinctiveness issues.
Findings
Achieves state-of-the-art results on R2R, R4R, and R2R-Last datasets.
Improves landmark recognition and instruction following accuracy.
Enhances navigation performance by focusing on recognizable and distinctive landmarks.
Abstract
Language understanding is essential for the navigation agent to follow instructions. We observe two kinds of issues in the instructions that can make the navigation task challenging: 1. The mentioned landmarks are not recognizable by the navigation agent due to the different vision abilities of the instructor and the modeled agent. 2. The mentioned landmarks are applicable to multiple targets, thus not distinctive for selecting the target among the candidate viewpoints. To deal with these issues, we design a translator module for the navigation agent to convert the original instructions into easy-to-follow sub-instruction representations at each step. The translator needs to focus on the recognizable and distinctive landmarks based on the agent's visual abilities and the observed visual environment. To achieve this goal, we create a new synthetic sub-instruction dataset and design…
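The two issues named in the abstract (landmarks the agent cannot recognize, and landmarks shared by several candidate viewpoints) can be made concrete with a toy filter. This is a minimal sketch under stated assumptions, not the paper's translator module: the function `filter_landmarks`, the viewpoint names, the landmark sets, and the agent vocabulary are all hypothetical, and a hand-written set filter stands in for whatever the actual module computes over sub-instruction representations.

```python
# Toy illustration only. All names and data here are hypothetical and are
# not taken from the VLN-Trans paper or its code.

def filter_landmarks(instruction_landmarks, agent_vocab, candidates, target):
    """Keep landmarks that are both recognizable and distinctive.

    recognizable: the landmark is in the agent's (assumed) visual vocabulary.
    distinctive:  the landmark is visible at the target viewpoint and at no
                  other candidate viewpoint.
    """
    kept = []
    for landmark in instruction_landmarks:
        recognizable = landmark in agent_vocab
        # Candidate viewpoints at which this landmark is visible.
        visible_at = [name for name, seen in candidates.items() if landmark in seen]
        distinctive = visible_at == [target]
        if recognizable and distinctive:
            kept.append(landmark)
    return kept


# Hypothetical candidate viewpoints and the landmarks visible from each.
candidates = {
    "v1": {"door", "sofa"},
    "v2": {"door", "table"},
    "v3": {"door", "mannequin"},
}
# Hypothetical set of object classes the agent's vision model can detect.
agent_vocab = {"door", "sofa", "table"}

print(filter_landmarks(["door", "sofa", "mannequin"], agent_vocab, candidates, "v1"))
# → ['sofa']
```

In this example "door" is dropped because it appears at every candidate viewpoint (not distinctive), "mannequin" is dropped because it is outside the assumed agent vocabulary (not recognizable), and only "sofa" survives as a cue for selecting `v1`.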
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
