MMTM: Multimodal Transfer Module for CNN Fusion

Hamid Reza Vaezi Joze, Amirreza Shaban, Michael L. Iuzzolino, and Kazuhito Koishida

arXiv:1911.08670 · cs.CV · April 1, 2020

TL;DR

This paper introduces MMTM, a simple yet effective neural module for multimodal feature fusion in CNNs, improving recognition accuracy across various multimodal tasks with minimal architectural changes.

Contribution

The paper proposes the Multimodal Transfer Module (MMTM), enabling slow, flexible fusion of multiple modalities within CNNs while allowing easy integration with pretrained models.

Findings

01. Improves recognition accuracy on multiple datasets.

02. Achieves state-of-the-art or competitive results.

03. Facilitates multimodal fusion with minimal architectural modifications.

Abstract

In late fusion, each modality is processed in a separate unimodal Convolutional Neural Network (CNN) stream and the scores of each modality are fused at the end. Due to its simplicity, late fusion is still the predominant approach in many state-of-the-art multimodal applications. In this paper, we present a simple neural network module for leveraging the knowledge from multiple modalities in convolutional neural networks. The proposed unit, named Multimodal Transfer Module (MMTM), can be added at different levels of the feature hierarchy, enabling slow modality fusion. Using squeeze and excitation operations, MMTM utilizes the knowledge of multiple modalities to recalibrate the channel-wise features in each CNN stream. Unlike other intermediate fusion methods, the proposed module can be used for feature-level modality fusion in convolution layers with different spatial dimensions. Another…
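The squeeze-and-excitation recalibration described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names (`init_params`, `mmtm_sketch`), the shared bottleneck with a ReLU, the reduction ratio of 4, the plain sigmoid gate, and the random weights are all assumptions made for the example. What it is meant to show is why differing spatial dimensions are not a problem: each stream is squeezed to one value per channel before the two modalities are combined.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(c_a, c_b, ratio=4, seed=0):
    """Random weights for the sketch (hypothetical; a real module would learn these)."""
    rng = np.random.default_rng(seed)
    c_z = max(1, (c_a + c_b) // ratio)  # assumed size of the joint bottleneck
    return {
        "w_z": rng.normal(scale=0.1, size=(c_z, c_a + c_b)),
        "b_z": np.zeros(c_z),
        "w_a": rng.normal(scale=0.1, size=(c_a, c_z)),
        "b_a": np.zeros(c_a),
        "w_b": rng.normal(scale=0.1, size=(c_b, c_z)),
        "b_b": np.zeros(c_b),
    }


def mmtm_sketch(feat_a, feat_b, params):
    """Recalibrate two feature maps shaped (channels, *spatial) using both modalities."""
    # Squeeze: average over all spatial axes, leaving one descriptor per channel.
    sq_a = feat_a.reshape(feat_a.shape[0], -1).mean(axis=1)
    sq_b = feat_b.reshape(feat_b.shape[0], -1).mean(axis=1)

    # Joint representation computed from the concatenated descriptors.
    joint = np.concatenate([sq_a, sq_b])
    z = np.maximum(0.0, params["w_z"] @ joint + params["b_z"])

    # Excitation: one gate per channel, predicted separately for each stream.
    gate_a = sigmoid(params["w_a"] @ z + params["b_a"])
    gate_b = sigmoid(params["w_b"] @ z + params["b_b"])

    # Broadcast each gate over its stream's own spatial axes.
    out_a = feat_a * gate_a.reshape(-1, *([1] * (feat_a.ndim - 1)))
    out_b = feat_b * gate_b.reshape(-1, *([1] * (feat_b.ndim - 1)))
    return out_a, out_b


# Example: a 2-D feature map (8 channels, 14x14) and a 1-D one (4 channels, length 25).
rng = np.random.default_rng(1)
feat_a = rng.normal(size=(8, 14, 14))
feat_b = rng.normal(size=(4, 25))
out_a, out_b = mmtm_sketch(feat_a, feat_b, init_params(8, 4))
print(out_a.shape, out_b.shape)
```

Because the output of each stream has the same shape as its input, a block like this could in principle be dropped between existing layers of two pretrained unimodal networks without changing the rest of either architecture.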

Code & Models

Repositories

haamoon/mmtm
pytorch

Videos

MMTM: Multimodal Transfer Module for CNN Fusion · youtube

Taxonomy

Methods: Convolution