Structure-Encoding Auxiliary Tasks for Improved Visual Representation in Vision-and-Language Navigation
Chia-Wen Kuo, Chih-Yao Ma, Judy Hoffman, Zsolt Kira

TL;DR
This paper introduces structure-encoding auxiliary tasks to pre-train image encoders using navigation environment data, significantly enhancing visual representations for improved performance in Vision-and-Language Navigation tasks.
Contribution
The paper proposes novel auxiliary tasks for pre-training image encoders on navigation environment data, addressing the distribution shift between ImageNet images and navigation views in VLN and improving navigation success rates.
Findings
SEA pre-trained features encode scene structure better than ImageNet pre-trained features.
Improved success rates on Test-Unseen environments.
Plug-and-play with existing VLN agents without tuning.
Abstract
In Vision-and-Language Navigation (VLN), researchers typically take an image encoder pre-trained on ImageNet without fine-tuning on the environments that the agent will be trained or tested on. However, the distribution shift between the training images from ImageNet and the views in the navigation environments may render the ImageNet pre-trained image encoder suboptimal. Therefore, in this paper, we design a set of structure-encoding auxiliary tasks (SEA) that leverage the data in the navigation environments to pre-train and improve the image encoder. Specifically, we design and customize (1) 3D jigsaw, (2) traversability prediction, and (3) instance classification to pre-train the image encoder. Through rigorous ablations, our SEA pre-trained features are shown to better encode structural information of the scenes, which ImageNet pre-trained features fail to properly encode but is…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
