Getting the subtext without the text: Scalable multimodal sentiment classification from visual and acoustic modalities

Nathaniel Blanchard; Daniel Moreira; Aparna Bharati; Walter J. Scheirer

arXiv:1807.01122 · cs.CV · July 4, 2018

TL;DR

This paper introduces a scalable multimodal sentiment classification model that uses only high-level visual and audio features. Dropping transcription features reduces human intervention and makes the model easier to deploy on real-world data at scale.

Contribution

The paper presents a novel multimodal fusion approach that relies solely on high-level video and audio features, demonstrating its effectiveness without traditional transcription features.

Findings

01

Achieved an F1 score of 0.8049 on the validation set

02

Achieved an F1 score of 0.6325 on the test set

03

Shows that high-level visual and acoustic features alone can detect sentiment effectively
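
For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it for a single positive class. This page does not say whether the reported scores are binary, macro-averaged, or weighted, so the single-class form and the function name `f1_score` are assumptions made for illustration only.

```python
def f1_score(true_labels, predicted_labels, positive=1):
    """Binary F1 for one positive class (illustrative; averaging scheme assumed)."""
    pairs = list(zip(true_labels, predicted_labels))
    # Count the three outcomes that F1 depends on.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)
```

Because the harmonic mean is pulled toward the smaller of the two values, a model cannot reach a high F1 by maximizing precision or recall alone.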

Abstract

In the last decade, video blogs (vlogs) have become an extremely popular method through which people express sentiment. The ubiquitousness of these videos has increased the importance of multimodal fusion models, which incorporate video and audio features with traditional text features for automatic sentiment detection. Multimodal fusion offers a unique opportunity to build models that learn from the full depth of expression available to human viewers. In the detection of sentiment in these videos, acoustic and video features provide clarity to otherwise ambiguous transcripts. In this paper, we present a multimodal fusion model that exclusively uses high-level video and audio features to analyze spoken sentences for sentiment. We discard traditional transcription features in order to minimize human intervention and to maximize the deployability of our model on at-scale real-world data.…
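
To make the idea of fusing modalities concrete, here is a minimal sketch of one common approach, feature-level (early) fusion, in which per-sentence visual and acoustic feature vectors are concatenated and passed to a single classifier. The truncated abstract does not describe the authors' actual fusion architecture, so this is not their method: the functions `fuse_features` and `sentiment_score`, the logistic scoring, and all values are hypothetical.

```python
import math


def fuse_features(visual, acoustic):
    """Early fusion (assumed technique): concatenate the two feature vectors."""
    return list(visual) + list(acoustic)


def sentiment_score(fused, weights, bias=0.0):
    """Hypothetical logistic classifier over the fused vector.

    Returns a value in (0, 1); values above 0.5 lean positive.
    """
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

No transcript-derived input appears anywhere in this pipeline, which is the property the abstract emphasizes: the classifier sees only what can be extracted automatically from the video and audio streams.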
