Graph based Environment Representation for Vision-and-Language Navigation in Continuous Environments
Ting Wang, Zongkai Wu, Feiyu Yao, Donglin Wang

TL;DR
This paper introduces a graph-based environment representation for vision-and-language navigation in continuous environments (VLN-CE). It improves environment understanding and generalization by modeling semantic relationships with object detection and graph neural networks.
Contribution
The paper proposes a novel Environment Representation Graph (ERG), built with object detection and processed with graph convolutional networks (GCNs). This strengthens the modeling of relationships between language instructions and the environment in VLN-CE tasks.
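The idea of turning object detections into a graph can be illustrated with a minimal sketch. The abstract does not say how detections become nodes or how edges are chosen, so every detail here is an assumption for illustration, not the authors' method. These include the hypothetical `Detection` record, placing the agent at node 0, linking the agent to every object (object-agent edges), and linking objects whose 3D centres lie within a hypothetical `dist_threshold` (object-object edges).

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical detector output: class label and 3D centre (metres, agent frame)."""
    label: str
    x: float
    y: float
    z: float


def build_erg(detections, dist_threshold=2.0):
    """Build a toy ERG as (node labels, symmetric 0/1 adjacency matrix).

    Node 0 is the agent; nodes 1..N are the detected objects.
    Assumed edge rules (illustrative only):
      * object-agent: the agent is linked to every detected object;
      * object-object: two objects are linked if their centres are
        within dist_threshold metres of each other.
    """
    n = len(detections) + 1
    adj = [[0] * n for _ in range(n)]
    for i in range(1, n):
        adj[0][i] = adj[i][0] = 1  # object-agent edge
    for i in range(len(detections)):
        for j in range(i + 1, len(detections)):
            a, b = detections[i], detections[j]
            if math.dist((a.x, a.y, a.z), (b.x, b.y, b.z)) <= dist_threshold:
                adj[i + 1][j + 1] = adj[j + 1][i + 1] = 1  # object-object edge
    labels = ["<agent>"] + [d.label for d in detections]
    return labels, adj


dets = [
    Detection("sofa", 1.0, 0.0, 2.0),
    Detection("lamp", 1.5, 0.0, 2.5),
    Detection("door", -4.0, 0.0, 1.0),
]
labels, adj = build_erg(dets)
for name, row in zip(labels, adj):
    print(f"{name:>8} {row}")
```

In this toy scene the sofa and lamp are about 0.7 m apart and receive an object-object edge. The distant door is linked only to the agent. A real system would likely use richer edge rules (spatial relations, co-occurrence) than a single distance threshold.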
Findings
Achieves higher success rates on VLN-CE benchmarks
Improves cross-modal matching accuracy
Demonstrates strong generalization ability
Abstract
Vision-and-Language Navigation in Continuous Environments (VLN-CE) is a navigation task that requires an agent to follow a language instruction in a realistic environment. Understanding the environment is a crucial part of the VLN-CE task, but existing methods model the environment in a relatively simple and direct way, without examining the relationship between language instructions and visual environments. We therefore propose a new environment representation to address these problems. First, we construct an Environment Representation Graph (ERG) through object detection to express the environment at the semantic level. This strengthens the relationship between language and environment. Then, the object-object and object-agent relational representations in the ERG are learned through a GCN to obtain a continuous representation of the ERG. Subsequently, we combine…
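Learning object-object and object-agent relations with a GCN can be sketched using the standard propagation rule of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). Mean pooling then turns the node features into one continuous, graph-level vector. The abstract does not state which GCN variant, feature sizes, or readout the authors use, so the layer choice, the toy features and weights, and the `mean_pool` readout below are assumptions. The code is pure Python to stay dependency-free.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, h, w):
    """One standard GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Self-loops let each node keep its own features during aggregation.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalisation of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, h), w)
    return [[max(0.0, v) for v in row] for row in out]  # ReLU


def mean_pool(node_feats):
    """Assumed readout: average node features into one graph-level vector."""
    return [sum(col) / len(node_feats) for col in zip(*node_feats)]


# Toy ERG: node 0 is the agent, nodes 1 and 2 are objects linked only to the agent.
adj = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
h = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # hypothetical 2-d node features
w = [[1.0, 0.0], [0.0, 1.0]]              # identity weights, for readability
out = gcn_layer(adj, h, w)
graph_vec = mean_pool(out)
print(out)
print(graph_vec)
```

After one layer the agent node mixes in the features of both objects, and each object mixes in the agent's features. Stacking layers would let information travel across object-object edges as well. In practice a trained library layer with learned weights would replace these hand-written matrices.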
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Graph Convolutional Network
