Gated Multimodal Units for Information Fusion

John Arevalo; Thamar Solorio; Manuel Montes-y-Gómez; Fabio A. González

arXiv:1702.01992 · stat.ML · February 8, 2017 · 51 cites

TL;DR

This paper introduces the Gated Multimodal Unit (GMU), a neural network component that learns how to combine multiple data modalities. On movie genre classification from plots and posters, it outperforms the other fusion methods it was compared against.

Contribution

The paper proposes the GMU model for multimodal data fusion and introduces the large MM-IMDb dataset for movie genre prediction.

Findings

01 GMU improves the macro F-score over single-modality models

02 GMU outperforms other fusion strategies, including mixture-of-experts models

03 To the best of the authors' knowledge, the MM-IMDb dataset is the largest publicly available multimodal dataset for movie genre prediction
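
Since the findings are reported as macro F-score in a multilabel setting, a small worked example may help show what that number measures. The sketch below computes a per-label F1 and averages it with equal weight per label. The three genre columns, the toy labels, and the convention of scoring an empty label as 0.0 are assumptions made for illustration; none of it is the paper's evaluation code or data.

```python
def f1(tp, fp, fn):
    """F1 for one label from its confusion counts."""
    denom = 2 * tp + fp + fn
    # Assumed convention: a label with no positives and no predictions scores 0.0.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over binary indicator rows."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        scores.append(f1(tp, fp, fn))
    return sum(scores) / n_labels


# Hypothetical toy data: 4 movies, 3 genres (columns: drama, comedy, horror).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]

print(round(macro_f1(y_true, y_pred), 4))  # per-label F1 is 1, 1/2, 2/3
```

Because every label counts equally, a rare genre affects the macro average as much as a frequent one, which is why the metric is often preferred when label frequencies are skewed.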

Abstract

This paper presents a novel model for multimodal learning based on gated neural networks. The Gated Multimodal Unit (GMU) model is intended to be used as an internal unit in a neural network architecture whose purpose is to find an intermediate representation based on a combination of data from different modalities. The GMU learns to decide how modalities influence the activation of the unit using multiplicative gates. It was evaluated on a multilabel scenario for genre classification of movies using the plot and the poster. The GMU improved the macro f-score performance of single-modality approaches and outperformed other fusion strategies, including mixture of experts models. Along with this work, the MM-IMDb dataset is released which, to the best of our knowledge, is the largest publicly available multimodal dataset for genre prediction on movies.
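
The gating idea in the abstract can be made concrete with a minimal sketch of a two-modality unit. It assumes the usual bimodal formulation: each modality gets its own tanh projection, a sigmoid gate is computed from the concatenated inputs, and the output is the gate-weighted combination of the two projections. The function names, dimensions, and random weights below are hypothetical and chosen only for illustration; bias terms are omitted, and plain Python is used to keep the example dependency-free.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def gmu_forward(x_v, x_t, W_v, W_t, W_z):
    """Fuse a visual vector x_v and a textual vector x_t.

    Assumed shapes: W_v is d_h x d_v, W_t is d_h x d_t,
    W_z is d_h x (d_v + d_t).
    """
    h_v = [math.tanh(a) for a in matvec(W_v, x_v)]  # visual projection
    h_t = [math.tanh(a) for a in matvec(W_t, x_t)]  # textual projection
    z = [sigmoid(a) for a in matvec(W_z, x_v + x_t)]  # gate sees both inputs
    # Multiplicative gate: z weights the visual path, (1 - z) the textual path.
    h = [zi * hv + (1.0 - zi) * ht for zi, hv, ht in zip(z, h_v, h_t)]
    return h, z


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_v, d_t, d_h = 5, 3, 4  # hypothetical feature and hidden sizes
x_v = [rng.gauss(0.0, 1.0) for _ in range(d_v)]
x_t = [rng.gauss(0.0, 1.0) for _ in range(d_t)]
W_v = random_matrix(d_h, d_v, rng)
W_t = random_matrix(d_h, d_t, rng)
W_z = random_matrix(d_h, d_v + d_t, rng)

h, z = gmu_forward(x_v, x_t, W_v, W_t, W_z)
print(len(h), all(0.0 < zi < 1.0 for zi in z))
```

Each gate value lies between 0 and 1, so every hidden dimension is a convex combination of the two modality projections, and the gate values can be inspected per sample to see which modality the unit relied on. In a trained network the weights would be learned jointly with the rest of the model rather than sampled at random.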

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks