Learn to Combine Modalities in Multimodal Deep Learning

Kuan Liu; Yanen Li; Ning Xu; Prem Natarajan

arXiv:1805.11730·stat.ML·May 31, 2018·131 cites

TL;DR

This paper introduces a novel deep learning method that multiplicatively combines multiple modalities to enhance classification accuracy by focusing on more reliable sources and capturing cross-modal correlations.

Contribution

The paper proposes a multiplicative fusion technique for multimodal data that automatically emphasizes reliable modalities and models cross-modal interactions, improving classification accuracy.

Findings

01 Consistent accuracy improvements across three multimodal classification tasks.

02 Effective filtering of noise and conflicts between modalities.

03 Enhanced modeling of cross-modal signal correlations.

Abstract

Combining complementary information from multiple modalities is intuitively appealing for improving the performance of learning-based approaches. However, it is challenging to fully leverage different modalities due to practical challenges such as varying levels of noise and conflicts between modalities. Existing methods do not adopt a joint approach to capturing synergies between the modalities while simultaneously filtering noise and resolving conflicts on a per sample basis. In this work we propose a novel deep neural network based technique that multiplicatively combines information from different source modalities. Thus the model training process automatically focuses on information from more reliable modalities while reducing emphasis on the less reliable modalities. Furthermore, we propose an extension that multiplicatively combines not only the single-source modalities, but a…
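The idea of multiplicatively combining modalities so that training leans on the more reliable ones can be illustrated with a small sketch. The truncated abstract above does not state the loss, so the weighting used here is an assumption for illustration only: each modality's cross-entropy term is scaled by how uncertain the other modalities are about the correct class. The function names and the `beta` parameter are hypothetical, and the paper itself should be consulted for the exact formulation.

```python
import math


def additive_loss(probs):
    """Baseline: plain sum of per-modality cross-entropy terms.

    probs: probability each modality assigns to the correct class, in (0, 1].
    """
    return -sum(math.log(p) for p in probs)


def multiplicative_loss(probs, beta=1.0):
    """Illustrative multiplicatively weighted loss (assumed form, not confirmed
    by the abstract).

    The term for modality i is down-weighted when the OTHER modalities are
    already confident about the correct class, so a weak modality is not
    forced to fit a sample that a reliable modality already explains.
    """
    m = len(probs)
    if m < 2:
        raise ValueError("need at least two modalities")
    total = 0.0
    for i, p_i in enumerate(probs):
        # Product of (1 - p_j) over all modalities except i.
        others = math.prod(1.0 - p_j for j, p_j in enumerate(probs) if j != i)
        # beta controls how strongly the weighting is applied; beta = 0
        # recovers the additive baseline because every weight becomes 1.
        weight = others ** (beta / (m - 1))
        total += -weight * math.log(p_i)
    return total


if __name__ == "__main__":
    # One confident modality (0.99) and one unreliable modality (0.3).
    sample = [0.99, 0.3]
    print("additive:      ", additive_loss(sample))
    print("multiplicative:", multiplicative_loss(sample))
```

In this sketch the unreliable modality's term is scaled by the small factor `(1 - 0.99) ** beta`, so it contributes far less to the total than it does in the additive baseline. This is one way to read "reducing emphasis on the less reliable modalities" on a per-sample basis.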

Code & Models

Repositories

skywaLKer518/MultiplicativeMultimodal (TensorFlow, official)

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis