Sequential Late Fusion Technique for Multi-modal Sentiment Analysis

Debapriya Banerjee; Fotios Lygerakis; Fillia Makedon

arXiv:2106.11473·cs.LG·June 23, 2021

TL;DR

This paper introduces a novel multi-head attention LSTM-based fusion technique for multi-modal sentiment analysis, leveraging text, audio, and visual data to improve emotion recognition accuracy.

Contribution

The work presents a new fusion method using multi-head attention LSTM networks specifically designed for multi-modal sentiment analysis.

Findings

1. Improved sentiment classification accuracy on the MOSI dataset

2. Effective integration of text, audio, and visual modalities

3. Demonstrated superiority over existing fusion techniques

Abstract

Multi-modal sentiment analysis plays an important role in providing better interactive experiences to users. Each modality in multi-modal data can provide a different viewpoint or reveal unique aspects of a user's emotional state. In this work, we use the text, audio, and visual modalities from the MOSI dataset and propose a novel fusion technique using a multi-head attention LSTM network. Finally, we perform a classification task and evaluate its performance.
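
The abstract does not spell out how the multi-head attention LSTM network is wired, so the following is only a minimal NumPy sketch of one plausible reading of late fusion: each modality is assumed to be already encoded into a fixed-size utterance vector, the three vectors are treated as a short sequence, multi-head self-attention mixes them, an LSTM reads the attended sequence, and a linear layer with softmax classifies the final hidden state. All names, dimensions, the random weights, and the ordering of attention before the LSTM are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def multi_head_self_attention(seq, w_q, w_k, w_v, w_o, num_heads):
    """seq: (T, d). Returns attended sequence (T, d) and weights (num_heads, T, T)."""
    T, d = seq.shape
    d_head = d // num_heads

    def split(x):
        # (T, d) -> (num_heads, T, d_head)
        return x.reshape(T, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(seq @ w_q), split(seq @ w_k), split(seq @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    heads = weights @ v
    merged = heads.transpose(1, 0, 2).reshape(T, d)  # concatenate heads
    return merged @ w_o, weights

def lstm_final_state(seq, w, u, b):
    """Run a single-layer LSTM over seq (T, d); return the final hidden state."""
    hidden = u.shape[0]
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x in seq:
        i, f, g, o = np.split(x @ w + h @ u + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

def fuse_and_classify(text_vec, audio_vec, visual_vec, p, num_heads=2):
    # Assumed fusion order: text, then audio, then visual.
    seq = np.stack([text_vec, audio_vec, visual_vec])
    attended, weights = multi_head_self_attention(
        seq, p["w_q"], p["w_k"], p["w_v"], p["w_o"], num_heads)
    h = lstm_final_state(attended, p["w"], p["u"], p["b"])
    return softmax(h @ p["w_cls"] + p["b_cls"]), weights

# Toy setup: random weights and random stand-ins for the modality vectors.
rng = np.random.default_rng(0)
d, hidden, num_classes = 8, 6, 2
p = {name: rng.normal(scale=0.1, size=(d, d)) for name in ("w_q", "w_k", "w_v", "w_o")}
p["w"] = rng.normal(scale=0.1, size=(d, 4 * hidden))
p["u"] = rng.normal(scale=0.1, size=(hidden, 4 * hidden))
p["b"] = np.zeros(4 * hidden)
p["w_cls"] = rng.normal(scale=0.1, size=(hidden, num_classes))
p["b_cls"] = np.zeros(num_classes)

text_vec, audio_vec, visual_vec = rng.normal(size=(3, d))
probs, weights = fuse_and_classify(text_vec, audio_vec, visual_vec, p)
```

Here `probs` holds the class probabilities for one utterance and `weights` holds, per head, how much each modality attends to the others. In a real system the weights would be learned and the input vectors would come from per-modality feature extractors rather than a random generator.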

Taxonomy

Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Music and Audio Processing

Methods: Softmax · Linear Layer · Tanh Activation · Sigmoid Activation · Long Short-Term Memory