Textualized and Feature-based Models for Compound Multimodal Emotion Recognition in the Wild
Nicolas Richet, Soufiane Belharbi, Haseeb Aslam, Meike Emilie Schadt, Manuela González-González, Gustave Cortal, Alessandro Lameiras Koerich, Marco Pedersoli, Alain Finkel, Simon Bacon, Eric Granger

TL;DR
This paper compares text-based and feature-based models for compound multimodal emotion recognition in videos, highlighting the potential and limitations of textualization of non-verbal cues using large language models.
Contribution
The paper introduces a textualization approach for multimodal emotion recognition (ER) that uses LLMs to encode non-verbal cues as text. It also evaluates this approach against traditional feature-based models.
Findings
Textualization performs worse than feature-based models on in-the-wild datasets with sparse transcripts.
Rich transcripts improve the accuracy of text-based models.
Feature-based models generally outperform textualization in challenging in-the-wild scenarios.
Abstract
Systems for multimodal emotion recognition (ER) are commonly trained to extract features from different modalities (e.g., visual, audio, and textual) that are combined to predict individual basic emotions. However, compound emotions often occur in real-world scenarios, and the uncertainty of recognizing such complex emotions over diverse modalities is challenging for feature-based models. As an alternative, emerging large language models (LLMs) like BERT and LLaMA can rely on explicit non-verbal cues that may be translated from different non-textual modalities (e.g., audio and visual) into text. Textualization of modalities augments data with emotional cues to help the LLM encode the interconnections between all modalities in a shared text space. In such text-based models, prior knowledge of ER tasks is leveraged to textualize relevant non-verbal cues such as audio tone from vocal…
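To make textualization more concrete, here is one way per-modality cues could be rendered as text alongside the transcript before the result is passed to a language model. This is a minimal sketch. It assumes the non-verbal cues have already been extracted by upstream visual and audio models. The `ClipCues` structure, its field names, and the sentence template are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClipCues:
    """Hypothetical per-clip cues. In practice these would come from
    upstream visual and audio models, which are not shown here."""
    transcript: str
    facial_expressions: List[str] = field(default_factory=list)
    audio_tone: str = ""


def textualize(cues: ClipCues) -> str:
    """Merge non-verbal cues and the transcript into one text string
    that a text-only model could consume (illustrative template)."""
    parts = []
    # Visual modality: describe observed facial expressions in words.
    if cues.facial_expressions:
        parts.append("Facial expression: " + ", ".join(cues.facial_expressions) + ".")
    # Audio modality: describe vocal tone in words.
    if cues.audio_tone:
        parts.append("Voice tone: " + cues.audio_tone + ".")
    # In-the-wild transcripts can be empty; keep an explicit marker instead.
    transcript = cues.transcript.strip() or "[no speech]"
    parts.append('Transcript: "' + transcript + '"')
    return " ".join(parts)


if __name__ == "__main__":
    example = ClipCues(
        transcript="I can't believe it!",
        facial_expressions=["raised eyebrows", "open mouth"],
        audio_tone="high pitch",
    )
    print(textualize(example))
```

The explicit `[no speech]` placeholder reflects the observation that transcripts are often sparse in in-the-wild videos. Whether missing speech is marked or simply omitted is a design choice, and it may affect how much the text model relies on the non-verbal descriptions.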
Taxonomy
Topics: Sentiment Analysis and Opinion Mining
Methods: Attention Is All You Need · Residual Connection · Layer Normalization · Linear Layer · Attention Dropout · Linear Warmup With Linear Decay · Adam · Dropout · LLaMA
