MUREL: Multimodal Relational Reasoning for Visual Question Answering

Remi Cadene; Hedi Ben-younes; Matthieu Cord; Nicolas Thome

arXiv:1902.09487·cs.CV·February 26, 2019·24 cites

TL;DR

MuRel introduces a multimodal relational reasoning network that enhances visual question answering by modeling complex interactions and relations between image regions and questions, surpassing attention-based methods.

Contribution

The paper proposes MuRel, a novel end-to-end trainable relational network with a new reasoning primitive, improving over existing attention-based VQA models.

Findings

01 Outperforms attention-based models on the VQA 2.0, VQA-CP v2, and TDIUC datasets.

02 The MuRel network achieves state-of-the-art or competitive results on these benchmarks.

03 Ablation studies confirm the effectiveness of the relational reasoning approach.

Abstract

Multimodal attentional networks are currently state-of-the-art models for Visual Question Answering (VQA) tasks involving real images. Although attention allows the model to focus on the visual content relevant to the question, this simple mechanism is arguably insufficient to model complex reasoning features required for VQA or other high-level tasks. In this paper, we propose MuRel, a multimodal relational network which is learned end-to-end to reason over real images. Our first contribution is the introduction of the MuRel cell, an atomic reasoning primitive representing interactions between question and image regions by a rich vectorial representation, and modeling region relations with pairwise combinations. Secondly, we incorporate the cell into a full MuRel network, which progressively refines visual and question interactions, and can be leveraged to define visualization schemes finer…
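The cell described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation (the official repository listed under Code & Models is the reference). The elementwise-product fusion, the max aggregation over region pairs, the residual update, the weight names (`W_q`, `W_v`, `A`, `B`), and all dimensions are hypothetical choices made for brevity.

```python
import numpy as np


def fuse(q, v, W_q, W_v):
    """Toy multimodal fusion (assumption): project both inputs, multiply elementwise."""
    return np.tanh(q @ W_q) * np.tanh(v @ W_v)


def murel_like_cell(q, regions, params):
    """One reasoning step over n region vectors of size d, conditioned on question q."""
    # 1) Question-region interaction: each region gets a fused *vector*,
    #    rather than the single scalar weight used by attention.
    m = fuse(q, regions, params["W_q"], params["W_v"])      # (n, d)
    # 2) Pairwise combinations: pair[i, j] combines region i with region j.
    a = m @ params["A"]                                     # (n, d)
    b = m @ params["B"]                                     # (n, d)
    pair = a[:, None, :] * b[None, :, :]                    # (n, n, d)
    # 3) Aggregate over the second region index with max (assumption).
    context = pair.max(axis=1)                              # (n, d)
    # 4) Residual update of the region representations (assumption).
    return regions + m + context


def murel_like_network(q, regions, params, steps=3):
    """Apply the cell repeatedly with shared weights, then pool over regions."""
    s = regions
    for _ in range(steps):
        s = murel_like_cell(q, s, params)
    return s.max(axis=0)                                    # (d,)


# Hypothetical sizes: 5 regions, question dim 4, region dim 8.
rng = np.random.default_rng(0)
n, dq, d = 5, 4, 8
params = {
    "W_q": 0.1 * rng.standard_normal((dq, d)),
    "W_v": 0.1 * rng.standard_normal((d, d)),
    "A": 0.1 * rng.standard_normal((d, d)),
    "B": 0.1 * rng.standard_normal((d, d)),
}
q = rng.standard_normal(dq)
regions = rng.standard_normal((n, d))
out = murel_like_network(q, regions, params)
print(out.shape)
```

In this sketch every region keeps a full fused vector, and the pairwise step lets each region's update depend on all the others. Repeating the cell is one simple way to read "progressively refines visual and question interactions".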

Code & Models

Repositories

Cadene/murel.bootstrap.pytorch
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques