Graph based Environment Representation for Vision-and-Language Navigation in Continuous Environments

Ting Wang; Zongkai Wu; Feiyu Yao; Donglin Wang

arXiv:2301.04352·cs.CV·January 12, 2023·1 cites

TL;DR

This paper introduces a graph-based environment representation for vision-and-language navigation in continuous environments, improving understanding and generalization by modeling semantic relationships with object detection and graph neural networks.

Contribution

The paper proposes a novel Environment Representation Graph (ERG) using object detection and GCNs, enhancing the relationship modeling between language and environment for VLN-CE tasks.

Findings

01 Achieves higher success rates on VLN-CE benchmarks

02 Improves cross-modal matching accuracy

03 Demonstrates strong generalization ability

Abstract

Vision-and-Language Navigation in Continuous Environments (VLN-CE) is a navigation task that requires an agent to follow a language instruction in a realistic environment. Understanding the environment is a crucial part of the VLN-CE task, but existing methods model the environment in a relatively simple and direct way, without delving into the relationship between language instructions and visual environments. We therefore propose a new environment representation to address this problem. First, we build an Environment Representation Graph (ERG) through object detection to express the environment at the semantic level. This operation strengthens the relationship between language and environment. Then, the object-object and object-agent relational representations in the ERG are learned through a GCN, so as to obtain a continuous representation of the ERG. Sequentially, we combine…
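
To make the GCN step in the abstract more concrete, here is a minimal sketch of a single graph-convolution layer applied to a toy graph with one agent node and two detected-object nodes. It uses the standard propagation rule of Kipf and Welling, ReLU(D^{-1/2} (A + I) D^{-1/2} H W), which is an assumption: the abstract does not say which GCN variant the authors use. The graph, feature values, weight matrix, and all function names are hypothetical and chosen only for illustration.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, features, weight):
    """One graph-convolution layer followed by ReLU."""
    a_norm = normalize_adjacency(adj)
    out = matmul(matmul(a_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Hypothetical toy graph (not from the paper):
# node 0 is the agent, nodes 1 and 2 are detected objects.
# Edges 0-1 and 0-2 are object-agent relations, edge 1-2 is object-object.
adj = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
# Made-up 2-dimensional node features and a made-up 2x2 weight matrix.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0, -1.0], [0.5, 0.5]]

embeddings = gcn_layer(adj, features, weight)
for node, row in enumerate(embeddings):
    print(node, row)
```

In a trained model the weight matrix would be learned and the node features would come from the object detector; here both are fixed by hand so that the message passing between agent and object nodes is easy to follow.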

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Graph Convolutional Network