Rethinking Multimodal Sentiment Analysis: A High-Accuracy, Simplified Fusion Architecture
Nischal Mandal, Yang Li

TL;DR
This paper introduces a simplified, lightweight multimodal sentiment analysis model that fuses language, audio, and visual cues, achieving high accuracy at lower computational cost.
Contribution
The paper presents a streamlined fusion architecture for multimodal sentiment analysis that matches the performance of more complex models and is suited to resource-constrained settings.
Findings
Achieves 92% accuracy on the IEMOCAP dataset.
Simple concatenation fusion matches the performance of more complex models.
Model is suitable for resource-limited environments.
Abstract
Multimodal sentiment analysis, a pivotal task in affective computing, seeks to understand human emotions by integrating cues from language, audio, and visual signals. While many recent approaches leverage complex attention mechanisms and hierarchical architectures, we propose a lightweight yet effective fusion-based deep learning model tailored for utterance-level emotion classification. Using the benchmark IEMOCAP dataset, which includes aligned text, audio-derived numeric features, and visual descriptors, we design modality-specific encoders using fully connected layers followed by dropout regularization. The modality-specific representations are then fused using simple concatenation and passed through a dense fusion layer to capture cross-modal interactions. This streamlined architecture avoids computational overhead while preserving performance, achieving a classification accuracy…
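The pipeline described in the abstract (per-modality fully connected encoders with dropout, concatenation, a dense fusion layer, and a softmax classifier) can be sketched as a forward pass in NumPy. This is a minimal illustration and not the authors' implementation: the class name `ConcatFusionClassifier`, the feature dimensions (300, 74, 35), the hidden sizes, the dropout rate, the ReLU activations, and the four-class output are all assumptions chosen for the example, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(in_dim, out_dim):
    """Create (weights, bias) for one fully connected layer (He-style init)."""
    w = rng.normal(0.0, np.sqrt(2.0 / in_dim), size=(in_dim, out_dim))
    return w, np.zeros(out_dim)


def relu(x):
    return np.maximum(x, 0.0)


def dropout(x, rate, training):
    """Inverted dropout: active only in training, identity at inference."""
    if not training or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class ConcatFusionClassifier:
    """Hypothetical sketch: FC encoders -> concatenation -> dense fusion -> softmax."""

    def __init__(self, input_dims, hidden=64, fusion=128, n_classes=4, rate=0.3):
        self.rate = rate
        # One fully connected encoder per modality.
        self.encoders = {name: dense(d, hidden) for name, d in input_dims.items()}
        # The dense fusion layer operates on the concatenated encodings.
        self.fusion = dense(hidden * len(input_dims), fusion)
        self.out = dense(fusion, n_classes)

    def forward(self, inputs, training=False):
        encoded = []
        for name, (w, b) in self.encoders.items():
            h = relu(inputs[name] @ w + b)
            encoded.append(dropout(h, self.rate, training))
        # Simple concatenation fusion along the feature axis.
        fused = np.concatenate(encoded, axis=-1)
        w, b = self.fusion
        fused = dropout(relu(fused @ w + b), self.rate, training)
        w, b = self.out
        return softmax(fused @ w + b)


# Usage with made-up feature sizes for a batch of 8 utterances.
dims = {"text": 300, "audio": 74, "visual": 35}
model = ConcatFusionClassifier(dims)
batch = {name: rng.normal(size=(8, d)) for name, d in dims.items()}
probs = model.forward(batch)  # shape (8, 4); each row is a class distribution
```

The only cross-modal computation here is the single dense layer applied after concatenation, which is what keeps the parameter count and inference cost low compared with attention-based fusion.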
Taxonomy
Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Multimodal Machine Learning Applications
Methods: Softmax · Attention Is All You Need · Dropout
