# Neighbourhood Watch: Referring Expression Comprehension via Language-guided Graph Attention Networks

**Authors:** Peng Wang, Qi Wu, Jiewei Cao, Chunhua Shen, Lianli Gao, Anton van den Hengel

arXiv: 1812.04794 · 2018-12-13

## TL;DR

This paper introduces a graph-based, language-guided attention mechanism for referring expression comprehension. It localizes the referred object by explicitly modeling both object properties and inter-object relationships, and it makes the comprehension decision visualizable and explainable.

## Contribution

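The paper proposes a language-guided graph attention mechanism, composed of a node attention component and an edge attention component, that represents object properties and inter-object relationships explicitly, with the aim of improving both localization accuracy and interpretability.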

## Key findings

- Shows an advantage over competing approaches in experiments on three referring expression comprehension datasets.
- Provides visualizable and explainable comprehension decisions.
- Effectively models object relationships and properties.

## Abstract

The task in referring expression comprehension is to localise the object instance in an image described by a referring expression phrased in natural language. As a language-to-vision matching task, the key to this problem is to learn a discriminative object feature that can adapt to the expression used. To avoid ambiguity, the expression normally tends to describe not only the properties of the referent itself, but also its relationships to its neighbourhood. To capture and exploit this important information we propose a graph-based, language-guided attention mechanism. Composed of a node attention component and an edge attention component, the proposed graph attention mechanism explicitly represents inter-object relationships and properties with a flexibility and power impossible with competing approaches. Furthermore, the proposed graph attention mechanism enables the comprehension decision to be visualisable and explainable. Experiments on three referring expression comprehension datasets show the advantage of the proposed approach.
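To make the idea of node attention and edge attention more concrete, here is a minimal NumPy sketch of language-guided attention over an object graph. It is an illustration under simplifying assumptions, not the architecture from the paper: the expression is reduced to a single `query` vector, relevance is scored with plain dot products, and the function name `language_guided_graph_attention`, the feature shapes, and the random toy inputs are all hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def language_guided_graph_attention(node_feats, edge_feats, adjacency, query):
    """Toy language-guided attention over an object graph (illustrative only).

    node_feats: (n, d) one feature vector per candidate object
    edge_feats: (n, n, d) one feature vector per ordered object pair
    adjacency:  (n, n) bool, adjacency[i, j] is True if j is a neighbour of i
    query:      (d,) embedding of the referring expression (assumed given)
    """
    # Node attention: how well each object's own properties match the expression.
    node_logits = node_feats @ query
    node_attn = softmax(node_logits)

    # Edge attention: how relevant each relationship is to the expression,
    # normalised over each object's neighbours only.
    edge_logits = edge_feats @ query                      # (n, n)
    shifted = edge_logits - edge_logits.max(axis=1, keepdims=True)
    weights = np.exp(shifted) * adjacency                 # zero out non-edges
    denom = weights.sum(axis=1, keepdims=True)
    edge_attn = np.divide(weights, denom,
                          out=np.zeros_like(weights), where=denom > 0)

    # Neighbourhood context: attended neighbours, weighted by both attentions.
    context = edge_attn @ (node_attn[:, None] * node_feats)  # (n, d)

    # Matching score: the object's own match plus its neighbourhood's match.
    scores = node_logits + context @ query
    return node_attn, edge_attn, scores


# Toy inputs (random, for shape checking only; not real image or text features).
rng = np.random.default_rng(0)
n, d = 4, 8
node_feats = rng.normal(size=(n, d))
edge_feats = rng.normal(size=(n, n, d))
adjacency = ~np.eye(n, dtype=bool)   # fully connected, no self-loops
adjacency[3, :] = False              # object 3 has no neighbours
query = rng.normal(size=d)

node_attn, edge_attn, scores = language_guided_graph_attention(
    node_feats, edge_feats, adjacency, query
)
predicted = int(np.argmax(scores))   # index of the object chosen as referent
```

Because both attention maps are explicit arrays, they can be inspected or plotted directly, which is the sense in which this style of mechanism lends itself to visualisable decisions.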

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.04794/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1812.04794/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/1812.04794/full.md

---
Source: https://tomesphere.com/paper/1812.04794