EALD-MLLM: Emotion Analysis in Long-sequential and De-identity videos with Multi-modal Large Language Model
Deng Li, Xin Liu, Bohao Xing, Baiqiang Xia, Yuan Zong, Bihan Wen, Heikki Kälviäinen

TL;DR
This paper introduces EALD, a new dataset for long-sequential emotion analysis with multimodal large language models (MLLMs). It emphasizes privacy-preserving cues such as non-facial body language and shows that MLLMs are effective in this setting.
Contribution
The paper constructs the EALD dataset for long-sequential emotion analysis and evaluates multimodal large language models on de-identified signals, highlighting the importance of non-facial body language (NFBL).
Findings
MLLMs outperform single-modal models in emotion analysis.
NFBL is a crucial cue for understanding emotions in long videos.
MLLMs perform well even in zero-shot scenarios.
Abstract
Emotion AI is the ability of computers to understand human emotional states. Existing works have achieved promising progress, but two limitations remain: 1) Previous studies have focused mainly on short-sequential video emotion analysis while overlooking long-sequential videos. However, the emotions in short-sequential videos reflect only instantaneous emotions, which may be deliberately guided or hidden. In contrast, long-sequential videos can reveal authentic emotions; 2) Previous studies commonly use various signals such as facial, speech, and even sensitive biological signals (e.g., electrocardiogram). However, due to the increasing demand for privacy, developing Emotion AI without relying on sensitive signals is becoming important. To address the aforementioned limitations, in this paper, we construct a dataset for Emotion Analysis in Long-sequential and…
Taxonomy
Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Speech Recognition and Synthesis
