Low Rank Fusion based Transformers for Multimodal Sequences

Saurav Sahay; Eda Okur; Shachi H Kumar; Lama Nachman

arXiv:2007.02038 · cs.CL · July 7, 2020

TL;DR

This paper introduces a low-rank multimodal fusion transformer architecture that efficiently models interactions between sensory signals for emotion recognition, achieving comparable performance with fewer parameters and faster training.

Contribution

It proposes a novel low-rank fusion approach within transformer models for multimodal emotion recognition, reducing model complexity and training time.
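The parameter saving comes from the low-rank fusion of Liu et al. (2018), which this work builds on. Below is a minimal NumPy sketch of that idea, not the authors' code. Each modality vector gets a constant 1 appended. The full fusion weight tensor is replaced by r rank-one factors per modality, so the fused vector is the sum over ranks of the element-wise product of per-modality projections. The function names, the three-modality setup, and all sizes are hypothetical. Summing the rank terms with equal weight is a simplifying assumption. The check confirms that the low-rank computation matches an explicitly built full weight tensor.

```python
import numpy as np

def append_one(z):
    """Append a constant 1 so the fused tensor also keeps unimodal and bimodal terms."""
    return np.concatenate([z, np.ones(1)])

def low_rank_fusion(zs, factors):
    """Low-rank multimodal fusion in the style of Liu et al. (2018) (sketch).

    zs:      list of M modality vectors, zs[m] has shape (d_m,)
    factors: list of M arrays, factors[m] has shape (r, d_m + 1, d_h)
    returns: fused vector of shape (d_h,)
    """
    prod = None
    for z, W in zip(zs, factors):
        # Project the augmented vector with every rank-i factor: shape (r, d_h).
        proj = np.einsum("a,rah->rh", append_one(z), W)
        # Element-wise product across modalities, kept separately per rank.
        prod = proj if prod is None else prod * proj
    # Sum the r rank-one contributions (equal weighting is an assumption).
    return prod.sum(axis=0)

def full_tensor_fusion(zs, factors):
    """Reference: build the full weight tensor explicitly (three modalities only)."""
    z1, z2, z3 = (append_one(z) for z in zs)
    W1, W2, W3 = factors
    # W[a, b, c, h] = sum_i W1[i, a, h] * W2[i, b, h] * W3[i, c, h]
    W = np.einsum("iah,ibh,ich->abch", W1, W2, W3)
    # Contract the outer product z1 (x) z2 (x) z3 with the full tensor W.
    return np.einsum("a,b,c,abch->h", z1, z2, z3, W)

rng = np.random.default_rng(0)
dims, rank, d_h = [4, 5, 6], 3, 8   # hypothetical modality sizes, rank, output size
zs = [rng.standard_normal(d) for d in dims]
factors = [rng.standard_normal((rank, d + 1, d_h)) for d in dims]

h_low = low_rank_fusion(zs, factors)
h_full = full_tensor_fusion(zs, factors)
print(np.allclose(h_low, h_full))   # True

# Weight counts (biases ignored): full tensor vs. low-rank factors.
full_params = int(np.prod([d + 1 for d in dims])) * d_h
low_params = sum(rank * (d + 1) * d_h for d in dims)
print(full_params, low_params)      # 1680 432
```

The full tensor grows with the product of the modality sizes. The factored form grows only with their sum times the rank, which is where the reduced parameter count comes from.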

Findings

1. Fewer parameters than existing models
2. Faster training times
3. Comparable accuracy on emotion recognition datasets

Abstract

Our senses individually work in a coordinated fashion to express our emotional intentions. In this work, we experiment with modeling modality-specific sensory signals to attend to our latent multimodal emotional intentions and vice versa expressed via low-rank multimodal fusion and multimodal transformers. The low-rank factorization of multimodal fusion amongst the modalities helps represent approximate multiplicative latent signal interactions. Motivated by the work of Tsai et al. (2019) and Liu et al. (2018), we present our transformer-based cross-fusion architecture without any over-parameterization of the model. The low-rank fusion helps represent the latent signal interactions while the modality-specific attention helps focus on relevant parts of the signal. We present two methods for the Multimodal Sentiment and Emotion Recognition results on CMU-MOSEI, CMU-MOSI, and IEMOCAP…
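This summary does not spell out how the authors wire attention and fusion together. The sketch below therefore shows only the generic cross-modal attention of Tsai et al. (2019) that the abstract cites as motivation. It is written in NumPy and is not the paper's architecture. Queries come from a target modality, while keys and values come from a source modality, so each target step gathers the parts of the source signal relevant to it. The two sequences may have different lengths. The function name, the single head, the projection matrices, and all dimensions are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(target, source, Wq, Wk, Wv):
    """Single-head cross-modal attention (generic sketch).

    target: (T_t, d_t) sequence providing the queries
    source: (T_s, d_s) sequence providing keys and values
    returns: (T_t, d_v) source-informed vectors, plus the (T_t, T_s) weights
    """
    Q = target @ Wq                            # (T_t, d_k)
    K = source @ Wk                            # (T_s, d_k)
    V = source @ Wv                            # (T_s, d_v)
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # scaled dot products, (T_t, T_s)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(1)
text = rng.standard_normal((10, 300))   # hypothetical: 10 word vectors
audio = rng.standard_normal((50, 74))   # hypothetical: 50 acoustic frames (unaligned)
d_k = d_v = 32
Wq = rng.standard_normal((300, d_k)) / np.sqrt(300)
Wk = rng.standard_normal((74, d_k)) / np.sqrt(74)
Wv = rng.standard_normal((74, d_v)) / np.sqrt(74)

out, w = cross_modal_attention(text, audio, Wq, Wk, Wv)
print(out.shape, w.shape)   # (10, 32) (10, 50)
```

Because the attention weights form a (T_t, T_s) matrix, no word-level alignment between modalities is needed. Each of the 10 text steps simply receives a weighted summary of all 50 audio frames.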

Taxonomy

Topics: Emotion and Mood Recognition · Human Pose and Action Recognition · Multimodal Machine Learning Applications