Memory based fusion for multi-modal deep learning

Darshana Priyasad; Tharindu Fernando; Simon Denman; Sridha Sridharan; Clinton Fookes

arXiv:2007.08076 · cs.LG · October 26, 2020

Open Access

TL;DR

This paper introduces a Memory based Attentive Fusion layer for multi-modal deep learning that captures long-term dependencies and improves fusion performance over naive methods.

Contribution

The paper proposes a novel fusion layer incorporating explicit memory and attention mechanisms to better model long-term dependencies in multi-modal data.

Findings

1. Enhanced performance on multiple datasets
2. Generalizes across different modalities and networks
3. Outperforms naive fusion methods

Abstract

The use of multi-modal data for deep machine learning has shown promise when compared to uni-modal approaches, with fusion of multi-modal features resulting in improved performance in several applications. However, most state-of-the-art methods use naive fusion which processes feature streams independently, ignoring possible long-term dependencies within the data during fusion. In this paper, we present a novel Memory based Attentive Fusion layer, which fuses modes by incorporating both the current features and long-term dependencies in the data, thus allowing the model to understand the relative importance of modes over time. We introduce an explicit memory block within the fusion layer which stores features containing long-term dependencies of the fused data. The feature inputs from uni-modal encoders are fused through attentive composition and transformation followed by naive fusion of…
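To make the contrast in the abstract concrete, the following is a minimal sketch of naive fusion (plain concatenation) next to a fusion step that also reads from and writes to an explicit memory. It is not the paper's layer: the abstract is truncated here and does not specify the read, compose, or write operations, so the dot-product attention, the concatenation used for composition, and the moving-average write rule are all assumptions. The names `MemoryFusionSketch`, `naive_fusion`, `num_slots`, and `write_rate` are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def naive_fusion(mode_a, mode_b):
    """Naive fusion: concatenate the two uni-modal feature vectors."""
    return list(mode_a) + list(mode_b)


class MemoryFusionSketch:
    """Illustrative memory-augmented fusion step (assumed design, not the paper's)."""

    def __init__(self, num_slots, dim, write_rate=0.5, seed=0):
        rng = random.Random(seed)
        # Small distinct initial values so the slots do not stay identical.
        self.memory = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                       for _ in range(num_slots)]
        self.write_rate = write_rate

    def step(self, mode_a, mode_b):
        current = naive_fusion(mode_a, mode_b)

        # Read: dot-product attention of the current features over memory slots.
        scores = [sum(q * k for q, k in zip(current, slot)) for slot in self.memory]
        weights = softmax(scores)
        read = [sum(w * slot[i] for w, slot in zip(weights, self.memory))
                for i in range(len(current))]

        # Compose: concatenate current features with the memory read-out.
        fused = current + read

        # Write: move each slot toward the current features, scaled by its weight.
        for w, slot in zip(weights, self.memory):
            for i, value in enumerate(current):
                slot[i] += self.write_rate * w * (value - slot[i])
        return fused


layer = MemoryFusionSketch(num_slots=3, dim=4)
first = layer.step([1.0, 0.0], [0.5, 0.5])
second = layer.step([1.0, 0.0], [0.5, 0.5])
```

With identical inputs on both steps, the first half of each output (the naive concatenation) is the same, while the second half (the memory read-out) differs because the first step has already written to memory. That history dependence is the property naive fusion lacks.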

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Music and Audio Processing · Video Analysis and Summarization