Benchmarking Multimodal Sentiment Analysis

Erik Cambria; Devamanyu Hazarika; Soujanya Poria; Amir Hussain; R.B.V. Subramaanyam

arXiv:1707.09538·cs.MM·August 1, 2017

TL;DR

This paper proposes a benchmark framework for multimodal sentiment analysis that uses CNN-based feature extraction from the text and visual modalities and combines visual, text, and audio features, reporting a 10% performance improvement over the state of the art.

Contribution

It presents a benchmark for multimodal sentiment analysis and examines three frequently ignored issues: the role of speaker-independent models, the importance of the individual modalities, and generalizability.

Findings

01. 10% performance improvement over the state of the art

02. Highlights key issues such as modality importance and speaker independence

03. Provides a new benchmark for future research

Abstract

We propose a framework for multimodal sentiment analysis and emotion recognition using convolutional neural network-based feature extraction from text and visual modalities. We obtain a performance improvement of 10% over the state of the art by combining visual, text and audio features. We also discuss some major issues frequently ignored in multimodal sentiment analysis research: the role of speaker-independent models, importance of the modalities and generalizability. The paper thus serves as a new benchmark for further research in multimodal sentiment analysis and also demonstrates the different facets of analysis to be considered while performing such tasks.
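The two ideas named in the abstract, combining per-modality features and evaluating speaker-independent models, can be illustrated with a minimal sketch. The abstract does not say how the features are combined, so simple concatenation is used here as an assumption rather than as the paper's method. The function names, the dictionary layout, and the toy feature values are all hypothetical.

```python
import random


def fuse_features(text_vec, audio_vec, visual_vec):
    """Feature-level fusion by concatenation (an assumed, common choice)."""
    return list(text_vec) + list(audio_vec) + list(visual_vec)


def speaker_independent_split(utterances, test_fraction=0.3, seed=0):
    """Split utterances so that no speaker appears in both train and test."""
    speakers = sorted({u["speaker"] for u in utterances})
    rng = random.Random(seed)
    rng.shuffle(speakers)
    # Hold out whole speakers, not individual utterances.
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])
    train = [u for u in utterances if u["speaker"] not in test_speakers]
    test = [u for u in utterances if u["speaker"] in test_speakers]
    return train, test


# Toy data: 20 utterances from 5 hypothetical speakers, with made-up features.
utterances = [
    {
        "speaker": f"spk{i % 5}",
        "text": [0.1 * i, 0.2],
        "audio": [0.3],
        "visual": [0.4, 0.5, 0.6],
    }
    for i in range(20)
]

for u in utterances:
    u["fused"] = fuse_features(u["text"], u["audio"], u["visual"])

train, test = speaker_independent_split(utterances)
train_speakers = {u["speaker"] for u in train}
test_speakers = {u["speaker"] for u in test}
```

Holding out whole speakers means a model is scored only on voices and faces it has never seen, which is what makes the evaluation speaker-independent. A random split over utterances would let the same speaker appear on both sides.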
