Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations

Meng-Jiun Chiou; Roger Zimmermann; Jiashi Feng

arXiv:2009.04965 · cs.CV · April 6, 2021

TL;DR

This paper introduces RVL-BERT, a multimodal transformer that draws on visual and linguistic commonsense knowledge to improve visual relationship detection. It adds modules that capture spatial information among objects, and it decouples object detection from relationship recognition.

Contribution

The paper proposes a multimodal transformer architecture with a spatial module and a mask attention module, supporting visual relationship reasoning with commonsense knowledge learned through self-supervised pre-training.

Findings

1. Achieves competitive results on challenging datasets.
2. Effectively incorporates visual-linguistic commonsense knowledge.
3. Decouples object detection from relationship recognition.

Abstract

Visual relationship detection aims to reason over relationships among salient objects in images, which has drawn increasing attention over the past few years. Inspired by human reasoning mechanisms, it is believed that external visual commonsense knowledge is beneficial for reasoning visual relationships of objects in images, which is however rarely considered in existing methods. In this paper, we propose a novel approach named Relational Visual-Linguistic Bidirectional Encoder Representations from Transformers (RVL-BERT), which performs relational reasoning with both visual and language commonsense knowledge learned via self-supervised pre-training with multimodal representations. RVL-BERT also uses an effective spatial module and a novel mask attention module to explicitly capture spatial information among the objects. Moreover, our model decouples object detection from visual…
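The abstract says the model captures spatial information among objects and keeps relationship recognition separate from object detection, but this page does not describe how the geometry is encoded. As a rough illustration only, the sketch below builds a simple spatial feature vector for a subject-object pair from bounding boxes. The feature layout (normalized corners plus IoU), the function names, and the example boxes are all assumptions made for illustration; they are not taken from the RVL-BERT code or paper.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def normalize_box(box: Box, img_w: float, img_h: float) -> List[float]:
    """Scale box corners to [0, 1] by the image width and height."""
    x1, y1, x2, y2 = box
    return [x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def pair_spatial_features(subj: Box, obj: Box,
                          img_w: float, img_h: float) -> List[float]:
    """Concatenate normalized subject box, normalized object box, and IoU.

    This is an assumed, generic encoding, not the paper's spatial module.
    """
    return (normalize_box(subj, img_w, img_h)
            + normalize_box(obj, img_w, img_h)
            + [iou(subj, obj)])


# The boxes are plain inputs here: they could come from any detector or
# from annotations, which is the sense of "decoupled" illustrated below.
person: Box = (10.0, 20.0, 110.0, 220.0)
horse: Box = (60.0, 120.0, 260.0, 320.0)
features = pair_spatial_features(person, horse, img_w=400.0, img_h=400.0)
```

Because the function only consumes box coordinates, the relationship stage in this sketch has no dependency on a particular detector, which is one simple way to read the decoupling described in the abstract.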

Code & Models

Repositories

coldmanck/RVL-BERT
PyTorch · Official
