Robust and Interpretable Grounding of Spatial References with Relation Networks

Tsung-Yen Yang; Andrew S. Lan; Karthik Narasimhan

arXiv:2005.00696 · cs.CL · October 8, 2020 · 1 cites

Open Access

TL;DR

This paper introduces a relation network model for understanding spatial references in natural language, enhancing robustness and interpretability in tasks like navigation and manipulation.

Contribution

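It proposes a text-conditioned relation network whose parameters are computed dynamically by a cross-modal attention module. The network reasons explicitly over spatial relations between entities, which improves robustness and makes the learned intermediate outputs interpretable.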

Findings

01 17% improvement in goal location prediction

02 15% enhancement in robustness over state-of-the-art

03 Effective in three diverse spatial understanding tasks

Abstract

Learning representations of spatial references in natural language is a key challenge in tasks like autonomous navigation and robotic manipulation. Recent work has investigated various neural architectures for learning multi-modal representations for spatial concepts. However, the lack of explicit reasoning over entities makes such approaches vulnerable to noise in input text or state observations. In this paper, we develop effective models for understanding spatial references in text that are robust and interpretable, without sacrificing performance. We design a text-conditioned relation network whose parameters are dynamically computed with a cross-modal attention module to capture fine-grained spatial relations between entities. This design choice provides interpretability of learned intermediate outputs. Experiments across three tasks demonstrate that our model achieves…
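The idea of a relation function whose parameters depend on the text can be illustrated with a toy sketch. This is not the authors' architecture: the names (`attend`, `relation_network`, `proj`, `gen`), the 2-D entity offsets, the random parameters, and the linear form of the relation function are all assumptions chosen to keep the example small. It shows only the general pattern: each entity pair queries the text tokens through attention, the attended text vector generates the weights of a pairwise relation function, and the per-pair attention weights remain available for inspection.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vec):
    return [dot(row, vec) for row in matrix]


def attend(query, token_vecs):
    """Cross-modal attention: weight text tokens by similarity to a state query."""
    weights = softmax([dot(query, t) for t in token_vecs])
    dim = len(token_vecs[0])
    context = [sum(w * t[d] for w, t in zip(weights, token_vecs)) for d in range(dim)]
    return weights, context


def relation_network(entities, token_vecs, proj, gen):
    """Sum a text-conditioned relation score over all ordered entity pairs.

    entities:   dict mapping a (hypothetical) entity name to an (x, y) position
    token_vecs: list of text token embeddings, each of length D
    proj:       D x 2 matrix mapping a pair offset to an attention query (assumed)
    gen:        2 x D matrix mapping attended text to relation weights (assumed)
    """
    pair_outputs = []
    names = sorted(entities)
    for a in names:
        for b in names:
            if a == b:
                continue
            offset = [entities[b][0] - entities[a][0],
                      entities[b][1] - entities[a][1]]
            query = matvec(proj, offset)
            weights, context = attend(query, token_vecs)
            # The relation function's weights are generated from the text,
            # rather than being fixed parameters.
            dynamic_weights = matvec(gen, context)
            score = dot(dynamic_weights, offset)
            pair_outputs.append(((a, b), weights, score))
    total = sum(score for _, _, score in pair_outputs)
    return total, pair_outputs


def random_matrix(rng, rows, cols):
    """Stand-in for learned parameters; values are random, not trained."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = random_matrix(rng, 3, 4)   # 3 text tokens, embedding size 4
    proj = random_matrix(rng, 4, 2)
    gen = random_matrix(rng, 2, 4)
    entities = {"cup": (0.0, 0.0), "box": (2.0, 1.0), "lamp": (-1.0, 3.0)}
    total, pairs = relation_network(entities, tokens, proj, gen)
    for pair, weights, score in pairs:
        print(pair, [round(w, 3) for w in weights], round(score, 3))
```

In this sketch the per-pair attention weights are the inspectable intermediate output: they indicate which text tokens influenced the relation computed for a given pair of entities.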

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Geographic Information Systems Studies

Methods: Interpretability