Multimodal Sentiment Analysis on CMU-MOSEI Dataset using Transformer-based Models

Jugal Gajjar; Kaustik Ranaware

arXiv:2505.06110·cs.CL·July 16, 2025

Open Access

TL;DR

This paper applies transformer-based models with early fusion to sentiment analysis across text, audio, and visual data, reporting 97.87% 7-class accuracy and a 0.9682 F1-score on the CMU-MOSEI test set.

Contribution

The paper introduces a multimodal sentiment analysis approach that uses transformer encoders with early fusion and reports superior performance over previous methods.

Findings

01. 97.87% 7-class accuracy on the CMU-MOSEI test set

02. F1-score of 0.9682 on the test set

03. Low MAE of 0.1060 for sentiment intensity prediction
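As a rough illustration of how the accuracy and MAE figures above could be computed, the sketch below assumes the common CMU-MOSEI convention of continuous sentiment scores in [-3, 3], with the 7-class label obtained by rounding to the nearest integer. The page does not confirm that the paper uses this convention; the helper names and toy values are hypothetical and not taken from the paper.

```python
import math

def to_seven_class(score):
    """Map a continuous sentiment score to an integer class in [-3, 3].

    Assumption: scores are clipped to [-3, 3] and rounded half-up.
    """
    clipped = max(-3.0, min(3.0, score))
    return math.floor(clipped + 0.5)

def seven_class_accuracy(y_true, y_pred):
    """Fraction of samples whose rounded prediction matches the rounded label."""
    hits = sum(to_seven_class(t) == to_seven_class(p)
               for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between predicted and true sentiment intensity."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy values for illustration only; these are not results from the paper.
y_true = [2.0, -1.0, 0.0, 3.0]
y_pred = [1.8, -0.4, 0.1, 2.6]

accuracy = seven_class_accuracy(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
```

The F1-score is left out here because the page does not say which averaging scheme (macro, micro, or weighted) was used.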

Abstract

This project performs multimodal sentiment analysis on the CMU-MOSEI dataset, using transformer-based models with early fusion to integrate the text, audio, and visual modalities. We employ BERT-based encoders for each modality and concatenate the extracted embeddings before classification. The model achieves 97.87% 7-class accuracy and a 0.9682 F1-score on the test set, demonstrating the effectiveness of early fusion in capturing cross-modal interactions. Training used Adam optimization (lr=1e-4), dropout (0.3), and early stopping to support generalization and robustness. The results highlight the superiority of transformer architectures in modeling multimodal sentiment, with a low MAE (0.1060) indicating precise sentiment intensity prediction. Future work may compare fusion strategies or enhance interpretability. This approach utilizes multimodal…
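The early-fusion step described in the abstract (per-modality embeddings concatenated before classification) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the embedding sizes, the random stand-in embeddings, the single linear layer, and the softmax head are all hypothetical, since the abstract gives neither dimensions nor the head architecture. Only the dropout rate of 0.3 comes from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the abstract does not state embedding dimensions.
BATCH, D_TEXT, D_AUDIO, D_VISUAL, N_CLASSES = 4, 8, 6, 5, 7
DROPOUT_P = 0.3  # dropout rate reported in the abstract

def early_fusion(text_emb, audio_emb, visual_emb):
    """Concatenate per-modality embeddings along the feature axis."""
    return np.concatenate([text_emb, audio_emb, visual_emb], axis=-1)

def dropout(x, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p, rescale the rest."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def classify(fused, weight, bias):
    """Linear head followed by a softmax over the sentiment classes (assumed)."""
    logits = fused @ weight + bias
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)

# Random stand-ins for the outputs of the three modality encoders.
text_emb = rng.normal(size=(BATCH, D_TEXT))
audio_emb = rng.normal(size=(BATCH, D_AUDIO))
visual_emb = rng.normal(size=(BATCH, D_VISUAL))

# Early fusion: one joint feature vector per sample, built before the classifier.
fused = early_fusion(text_emb, audio_emb, visual_emb)
weight = rng.normal(scale=0.1, size=(fused.shape[-1], N_CLASSES))
bias = np.zeros(N_CLASSES)

# Dropout is disabled at inference time, so training=False here.
probs = classify(dropout(fused, DROPOUT_P, rng, training=False), weight, bias)
```

Because the modalities are joined before the classifier, a single set of weights sees text, audio, and visual features together, which is what distinguishes early fusion from late fusion, where each modality would be classified separately and the predictions combined afterwards.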

Taxonomy

Topics: Sentiment Analysis and Opinion Mining

Methods: Dropout · Masked autoencoder · Early Stopping · Adam