# Deeply Supervised Multimodal Attentional Translation Embeddings for Visual Relationship Detection

**Authors:** Nikolaos Gkanatsios, Vassilis Pitsikalis, Petros Koutras, Athanasia Zlatintsi, Petros Maragos

arXiv: 1902.05829 · 2019-02-18

## TL;DR

This paper introduces a novel deeply supervised multimodal attentional architecture for visual relationship detection, leveraging spatio-linguistic similarities to improve triplet detection accuracy.

## Contribution

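The paper proposes a deeply supervised two-branch architecture, in which a multimodal attentional mechanism drives the visual features of each branch, and reports that it outperforms the compared methods on the VRD dataset.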

## Key findings

- Outperforms all compared methods on the VRD dataset
- Exploits spatio-linguistic similarities in a low-dimensional space through multimodal attention
- Supports its claims with both quantitative and qualitative evaluation

## Abstract

Detecting visual relationships, i.e. <Subject, Predicate, Object> triplets, is a challenging Scene Understanding task approached in the past via linguistic priors or spatial information in a single feature branch. We introduce a new deeply supervised two-branch architecture, the Multimodal Attentional Translation Embeddings, where the visual features of each branch are driven by a multimodal attentional mechanism that exploits spatio-linguistic similarities in a low-dimensional space. We present a variety of experiments comparing against all related approaches in the literature, as well as by re-implementing and fine-tuning several of them. Results on the commonly employed VRD dataset [1] show that the proposed method clearly outperforms all others, while we also justify our claims both quantitatively and qualitatively.
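The abstract does not spell out how a triplet is scored. As a minimal sketch of the general translation-embedding idea the title refers to, assume a predicate is modeled as a vector that translates the subject embedding toward the object embedding, so that subject + predicate ≈ object. The function names, the toy 2-D vectors, and the Euclidean distance below are illustrative assumptions and not the authors' implementation, which learns the embeddings from attention-driven visual features.

```python
import math

def translation_score(subj, pred, obj):
    """Negative Euclidean distance between (subj + pred) and obj.

    Higher is better: 0 means the predicate translates the subject
    exactly onto the object. Hypothetical scoring rule for illustration.
    """
    return -math.sqrt(sum((s + p - o) ** 2 for s, p, o in zip(subj, pred, obj)))

def rank_predicates(subj, obj, predicates):
    """Return predicate names sorted from best to worst translation score."""
    return sorted(
        predicates,
        key=lambda name: translation_score(subj, predicates[name], obj),
        reverse=True,
    )

# Toy 2-D embeddings (made-up values, not learned from data).
subj = [1.0, 0.0]
obj = [1.0, 1.0]
predicates = {
    "on": [0.0, 1.0],
    "next to": [1.0, 0.0],
    "under": [0.0, -1.0],
}

ranking = rank_predicates(subj, obj, predicates)
print(ranking)
```

In this toy setup the predicate whose vector best bridges subject and object ranks first; a learned model would replace the hand-picked vectors with projections of visual, spatial, and linguistic features.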

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.05829/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1902.05829/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/1902.05829/full.md

---
Source: https://tomesphere.com/paper/1902.05829