Fusing Audio, Textual and Visual Features for Sentiment Analysis of News Videos
Moisés H. R. Pereira, Flávio L. C. Pádua, Adriano C. M. Pereira, Fabrício Benevenuto, Daniel H. Dalip

TL;DR
This paper introduces a multimodal sentiment analysis method for news videos that fuses audio, visual, and textual features, reaching up to 84% accuracy in sentiment classification, with intended applications in journalistic and media analysis.
Contribution
It presents a novel fusion approach that combines facial emotion recognition, speech modulation features, and closed-caption sentiment analysis to classify the sentiment of news videos.
Findings
Achieved up to 84% accuracy in sentiment classification.
Effective fusion of audio, visual, and textual features.
Demonstrated potential for journalistic and media analysis applications.
Abstract
This paper presents a novel approach to perform sentiment analysis of news videos, based on the fusion of audio, textual and visual clues extracted from their contents. The proposed approach aims at contributing to the semiodiscoursive study regarding the construction of the ethos (identity) of this media universe, which has become a central part of the modern-day lives of millions of people. To achieve this goal, we apply state-of-the-art computational methods for (1) automatic emotion recognition from facial expressions, (2) extraction of modulations in the participants' speeches and (3) sentiment analysis from the closed captions associated with the videos of interest. More specifically, we compute features such as visual intensities of recognized emotions, field sizes of participants, voicing probability, sound loudness, speech fundamental frequencies and the sentiment scores…
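The abstract is cut off before it describes how the three modalities are combined, so the sketch below should be read only as an illustration of one common option, feature-level fusion by concatenation, and not as the authors' method. The field names mirror the features listed in the abstract; the class `SegmentFeatures`, the helper functions, the min-max scaling step, and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentFeatures:
    """Features for one video segment (hypothetical container, illustrative values)."""
    emotion_intensity: float      # visual: intensity of the recognized facial emotion
    field_size: float             # visual: relative size of the participant in the frame
    voicing_probability: float    # audio
    loudness: float               # audio
    fundamental_frequency: float  # audio: speech F0, in Hz
    caption_sentiment: float      # text: sentiment score of the closed caption


def to_vector(f: SegmentFeatures) -> List[float]:
    """Concatenate the visual, audio, and textual features into one vector."""
    return [
        f.emotion_intensity,
        f.field_size,
        f.voicing_probability,
        f.loudness,
        f.fundamental_frequency,
        f.caption_sentiment,
    ]


def min_max_scale(rows: List[List[float]]) -> List[List[float]]:
    """Rescale each column to [0, 1] so no feature dominates because of its units.

    A constant column (zero range) is mapped to 0.0.
    """
    scaled_columns = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_columns)]


def fuse(segments: List[SegmentFeatures]) -> List[List[float]]:
    """Return one scaled, fused feature vector per segment."""
    return min_max_scale([to_vector(s) for s in segments])
```

The fused vectors could then be passed to any standard classifier. Scaling before concatenation is a design choice of this sketch: fundamental frequency in Hz would otherwise outweigh a sentiment score bounded near [-1, 1].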
