Sequential Late Fusion Technique for Multi-modal Sentiment Analysis
Debapriya Banerjee, Fotios Lygerakis, Fillia Makedon

TL;DR
This paper introduces a novel multi-head attention LSTM-based fusion technique for multi-modal sentiment analysis, leveraging text, audio, and visual data to improve emotion recognition accuracy.
Contribution
The work presents a new fusion method using multi-head attention LSTM networks specifically designed for multi-modal sentiment analysis.
Findings
Improved sentiment classification accuracy on the MOSI dataset
Effective integration of text, audio, and visual modalities
Demonstrated superiority over existing fusion techniques
Abstract
Multi-modal sentiment analysis plays an important role in providing better interactive experiences to users. Each modality in multi-modal data can provide a different viewpoint or reveal unique aspects of a user's emotional state. In this work, we use the text, audio, and visual modalities from the MOSI dataset and propose a novel fusion technique using a multi-head attention LSTM network. Finally, we perform a classification task and evaluate the technique's performance.
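The abstract does not spell out the architecture, so the following is only a minimal NumPy sketch of how a late-fusion pipeline with an LSTM and multi-head attention might be wired together, not the authors' implementation. It assumes each modality has already been encoded into a fixed-size vector by its own (omitted) encoder, that the three vectors are fed to the LSTM in the order text, audio, visual, and that the attended states are mean-pooled before a linear softmax classifier. All names (`lstm_over_modalities`, `multi_head_attention`, `fuse_and_classify`), the dimensions, and the random untrained weights are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def lstm_over_modalities(seq, W, U, b):
    """Run a single-layer LSTM over a (steps, d_in) sequence.

    Returns the hidden state at every step, shape (steps, d_h).
    """
    d_h = U.shape[0]
    h = np.zeros(d_h)
    c = np.zeros(d_h)
    states = []
    for x in seq:
        z = x @ W + h @ U + b                 # all four gates at once, shape (4 * d_h,)
        i, f, o, g = np.split(z, 4)           # input, forget, output, candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        states.append(h)
    return np.stack(states)

def multi_head_attention(H, Wq, Wk, Wv, n_heads):
    """Scaled dot-product self-attention over the LSTM states, split into heads."""
    steps, d_h = H.shape
    d_k = d_h // n_heads

    def split_heads(M):
        # (steps, d_h) -> (n_heads, steps, d_k)
        return M.reshape(steps, n_heads, d_k).transpose(1, 0, 2)

    Q, K, V = split_heads(H @ Wq), split_heads(H @ Wk), split_heads(H @ Wv)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)   # (n_heads, steps, steps)
    weights = softmax(scores, axis=-1)
    out = weights @ V                                   # (n_heads, steps, d_k)
    return out.transpose(1, 0, 2).reshape(steps, d_h), weights

def fuse_and_classify(text, audio, visual, p, n_heads=2):
    # Assumption: modality vectors share one size and are fused in this fixed order.
    seq = np.stack([text, audio, visual])
    H = lstm_over_modalities(seq, p["W"], p["U"], p["b"])
    A, weights = multi_head_attention(H, p["Wq"], p["Wk"], p["Wv"], n_heads)
    pooled = A.mean(axis=0)                   # assumption: mean-pool attended states
    logits = pooled @ p["Wc"] + p["bc"]       # linear layer
    return softmax(logits), weights           # class probabilities, attention weights

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
d_in, d_h, n_classes = 8, 4, 2
params = {
    "W": rng.normal(size=(d_in, 4 * d_h)), "U": rng.normal(size=(d_h, 4 * d_h)),
    "b": np.zeros(4 * d_h),
    "Wq": rng.normal(size=(d_h, d_h)), "Wk": rng.normal(size=(d_h, d_h)),
    "Wv": rng.normal(size=(d_h, d_h)),
    "Wc": rng.normal(size=(d_h, n_classes)), "bc": np.zeros(n_classes),
}
text, audio, visual = (rng.normal(size=d_in) for _ in range(3))
probs, weights = fuse_and_classify(text, audio, visual, params)
```

Here `probs` holds one probability per sentiment class and `weights` holds, for each head, how strongly each fusion step attends to the others. In a real system the weights would be trained end to end and each modality would pass through its own encoder before fusion.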
Taxonomy
Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Music and Audio Processing
Methods: Softmax · Linear Layer · Tanh Activation · Sigmoid Activation · Long Short-Term Memory
