Multimodal Multi-loss Fusion Network for Sentiment Analysis
Zehui Wu, Ziwei Gong, Jaywon Koo, Julia Hirschberg

TL;DR
This paper presents a multimodal neural network that fuses features from several modality-specific encoders and is trained with multiple losses, reaching state-of-the-art sentiment analysis results on three datasets.
Contribution
The paper introduces a multi-loss fusion network that combines feature encoders from multiple modalities, and it shows that integrating context significantly improves sentiment detection.
Findings
Achieved state-of-the-art results on CMU-MOSI, CMU-MOSEI, and CH-SIMS datasets.
Identified the impact of subnet performance and context on model accuracy.
Compared various fusion methods to determine optimal feature integration.
Abstract
This paper investigates the optimal selection and fusion of feature encoders across multiple modalities and combines these in one neural network to improve sentiment detection. We compare different fusion methods and examine the impact of multi-loss training within the multi-modality fusion network, identifying surprisingly important findings relating to subnet performance. We have also found that integrating context significantly enhances model performance. Our best model achieves state-of-the-art performance for three datasets (CMU-MOSI, CMU-MOSEI and CH-SIMS). These results suggest a roadmap toward an optimized feature selection and fusion approach for enhancing sentiment detection in neural networks.
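The idea of multi-loss training in a fusion network can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the fusion operator, the loss function, or the loss weights, so concatenation fusion, mean absolute error, and the names `concat_fusion`, `l1_loss`, `multi_loss`, and `subnet_weights` are all assumptions made for the example.

```python
from typing import Dict, List, Sequence


def concat_fusion(features: Dict[str, Sequence[float]],
                  order: Sequence[str]) -> List[float]:
    """Fuse per-modality feature vectors by concatenation (assumed fusion method)."""
    fused: List[float] = []
    for name in order:
        fused.extend(features[name])
    return fused


def l1_loss(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Mean absolute error between predicted and target sentiment scores."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)


def multi_loss(fused_preds: Sequence[float],
               subnet_preds: Dict[str, Sequence[float]],
               targets: Sequence[float],
               subnet_weights: Dict[str, float]) -> float:
    """Loss on the fused prediction plus a weighted loss per unimodal subnet.

    The weights are hypothetical; a subnet without an entry defaults to 1.0.
    """
    total = l1_loss(fused_preds, targets)
    for name, preds in subnet_preds.items():
        total += subnet_weights.get(name, 1.0) * l1_loss(preds, targets)
    return total


# Usage: two samples, with text and audio subnets each making their own prediction.
fused_vector = concat_fusion({"text": [1.0, 2.0], "audio": [3.0]}, ["text", "audio"])
loss = multi_loss(
    fused_preds=[1.0, 0.0],
    subnet_preds={"text": [1.0, 1.0], "audio": [0.0, 0.0]},
    targets=[1.0, 0.0],
    subnet_weights={"text": 0.5, "audio": 0.5},
)
```

Giving each subnet its own loss term means every modality encoder receives a direct training signal rather than only the gradient that flows back through the fused head, which is one way subnet performance could affect the final model as the abstract suggests.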
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Gait Recognition and Analysis · Video Surveillance and Tracking Methods
Methods: Feature Selection
