Language and Visual Entity Relationship Graph for Agent Navigation

Yicong Hong; Cristian Rodriguez-Opazo; Yuankai Qi; Qi Wu; Stephen Gould

arXiv:2010.09304 · cs.CV · December 29, 2020 · 20 cites

TL;DR

This paper introduces a graph-based approach that models the relationships between language and visual entities (the scene, its objects, and directional clues) to help an agent follow natural language navigation instructions in real-world environments, improving over the previous state of the art on the R2R and R4R benchmarks.

Contribution

The paper proposes a Language and Visual Entity Relationship Graph, together with a message passing algorithm over that graph, to better interpret complex instructions and perceive the environment in Vision-and-Language Navigation (VLN) tasks.

Findings

01

Achieves a new state-of-the-art SPL (success rate weighted by path length) of 52% on the R2R test unseen split.

02

Improves SDTW (success weighted by normalized dynamic time warping) on the R4R dataset from the previous best of 13% to 34%.

03

Demonstrates that modeling the relationships among the scene, objects, and directional clues improves navigation accuracy.
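
For readers unfamiliar with the SPL metric cited in the findings, the sketch below computes it under its commonly used definition: each successful episode contributes the ratio of the shortest path length to the longer of the shortest path and the path the agent actually took, and failed episodes contribute zero. The function name `spl` and the episode tuple layout are assumptions made for illustration and are not taken from the paper's repository.

```python
def spl(episodes):
    """Success weighted by Path Length over a list of episodes.

    Each episode is a hypothetical tuple:
        (success: bool, shortest_path_length: float, agent_path_length: float)
    Assumes shortest_path_length > 0 for every episode.
    """
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for success, shortest, taken in episodes:
        if success:
            # A path longer than the shortest one is penalised proportionally.
            total += shortest / max(taken, shortest)
    return total / len(episodes)


# Toy example: one optimal success, one success with a detour, one failure.
toy_episodes = [(True, 10.0, 10.0), (True, 10.0, 20.0), (False, 10.0, 5.0)]
toy_score = spl(toy_episodes)
```

On the toy episodes the three contributions are 1.0, 0.5, and 0.0, so the score is 0.5. A reported SPL of 52% is this same average expressed as a percentage.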

Abstract

Vision-and-Language Navigation (VLN) requires an agent to navigate in a real-world environment following natural language instructions. From both the textual and visual perspectives, we find that the relationships among the scene, its objects, and directional clues are essential for the agent to interpret complex instructions and correctly perceive the environment. To capture and utilize the relationships, we propose a novel Language and Visual Entity Relationship Graph for modelling the inter-modal relationships between text and vision, and the intra-modal relationships among visual entities. We propose a message passing algorithm for propagating information between language elements and visual entities in the graph, which we then combine to determine the next action to take. Experiments show that by taking advantage of the relationships we are able to improve over the state of the art. On…
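
The abstract describes message passing between language elements and visual entities, followed by a combination step that selects the next action. The toy sketch below illustrates that general idea only; it is not the authors' architecture. The three node names, the sigmoid gate, the additive update, and the dot-product scoring are all assumptions chosen to keep the example small, and every function name here is hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def message_passing_step(nodes, lang):
    """One round of language-conditioned message passing (illustrative only).

    nodes: dict mapping entity name -> visual feature vector
    lang:  dict mapping entity name -> language context vector
    """
    updated = {}
    for tgt, tgt_feat in nodes.items():
        new_feat = list(tgt_feat)
        for src, src_feat in nodes.items():
            if src == tgt:
                continue
            # Assumed gate: how well the source's visual feature agrees
            # with the language context of the receiving node.
            gate = 1.0 / (1.0 + math.exp(-dot(src_feat, lang[tgt])))
            new_feat = [n + gate * s for n, s in zip(new_feat, src_feat)]
        updated[tgt] = new_feat
    return updated


def action_distribution(candidates, lang, rounds=1):
    """Score each candidate direction and normalise the scores with softmax."""
    scores = []
    for nodes in candidates:
        for _ in range(rounds):
            nodes = message_passing_step(nodes, lang)
        # Combine the nodes by summing their agreement with the language context.
        scores.append(sum(dot(nodes[k], lang[k]) for k in nodes))
    return softmax(scores)


# Toy 2-dimensional features for two candidate directions.
lang = {"scene": [1.0, 0.0], "object": [0.0, 1.0], "direction": [1.0, 1.0]}
aligned = {"scene": [1.0, 0.0], "object": [0.0, 1.0], "direction": [0.5, 0.5]}
opposed = {"scene": [-1.0, 0.0], "object": [0.0, -1.0], "direction": [-0.5, -0.5]}
probs = action_distribution([aligned, opposed], lang)
```

In this toy setup the candidate whose visual features agree with the language contexts receives the higher probability. A learned model would replace the fixed gate and the dot products with trained layers.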

Code & Models

Repositories

YicongHong/Entity-Graph-VLN
PyTorch · Official

Videos

Language and Visual Entity Relationship Graph for Agent Navigation · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Time Series Analysis and Forecasting · Topic Modeling