MicroEmo: Time-Sensitive Multimodal Emotion Recognition with Micro-Expression Dynamics in Video Dialogues

Liyun Zhang

arXiv:2407.16552 · cs.CV · July 25, 2024

TL;DR

MicroEmo is a time-sensitive multimodal large language model (MLLM) for emotion recognition. It attends to local facial micro-expression dynamics and to the contextual dependencies between utterance-aware video segments, which improves open-vocabulary emotion prediction.

Contribution

It introduces a global-local attention visual encoder and an utterance-aware video Q-Former for enhanced temporal and contextual feature extraction.
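To make "global-local attention" concrete, here is a minimal sketch in plain Python. It assumes each video frame has one global feature vector and several local facial feature vectors (e.g. face or facial-region crops). The global feature queries the local features through scaled dot-product softmax attention, and the result is added back residually and bound to the frame's timestamp. The function names (`global_local_fuse`, `encode_video`) and the residual-sum fusion are hypothetical illustrations, not the paper's published encoder.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    # Scaled dot-product attention for a single query vector.
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([sum(q * k for q, k in zip(query, key)) * scale for key in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def global_local_fuse(global_feat: Vector, local_feats: List[Vector]) -> Vector:
    # Hypothetical fusion: the global frame feature attends over local facial
    # features, and the attended local summary is added residually.
    local_summary = attend(global_feat, local_feats, local_feats)
    return [g + l for g, l in zip(global_feat, local_summary)]


def encode_video(
    frames: List[Tuple[float, Vector, List[Vector]]]
) -> List[Tuple[float, Vector]]:
    # Each frame is (timestamp_seconds, global_feat, local_feats); the output
    # keeps the timestamp next to the fused feature ("timestamp-bound").
    return [(t, global_local_fuse(g, locs)) for t, g, locs in frames]
```

For example, `encode_video([(0.0, [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])])` returns one `(0.0, fused)` pair in which the fused feature leans toward the local patch that best matches the global query. A real encoder would use learned projections and high-dimensional features, which are omitted here to keep the attention step visible.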

Findings

01 · Effective for explainable multimodal emotion recognition

02 · Outperforms recent methods on open-vocabulary emotion recognition

03 · Highlights the importance of micro-expression dynamics

Abstract

Multimodal Large Language Models (MLLMs) have demonstrated remarkable multimodal emotion recognition capabilities, integrating visual, acoustic, and linguistic cues from a video to recognize human emotional states. However, existing methods neither capture the local facial features that carry the temporal dynamics of micro-expressions nor exploit the contextual dependencies among utterance-aware temporal segments of the video, which limits their effectiveness. In this work, we propose MicroEmo, a time-sensitive MLLM that directs attention to local facial micro-expression dynamics and to the contextual dependencies of utterance-aware video clips. Our model incorporates two key architectural contributions: (1) a global-local attention visual encoder that integrates global frame-level timestamp-bound image features with local…
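The idea of "utterance-aware video clips" can be sketched as follows: split timestamped frame features into segments using utterance start/end times, let a small set of query vectors cross-attend over each segment to produce a fixed number of tokens per utterance, then add one more pass over the whole video for global context. This is a simplified stand-in for a Q-Former under stated assumptions. A real Q-Former uses learned queries inside a transformer with self- and cross-attention, while this sketch keeps only the softmax cross-attention pooling. All names (`split_by_utterance`, `cross_attend`, `utterance_aware_tokens`) and the half-open interval convention are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: List[Vector], feats: List[Vector]) -> List[Vector]:
    # Each query yields one output token: a softmax-weighted sum of the features.
    dim = len(feats[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in queries:
        w = softmax([sum(a * b for a, b in zip(q, f)) * scale for f in feats])
        out.append([sum(wi * f[d] for wi, f in zip(w, feats)) for d in range(dim)])
    return out


def split_by_utterance(
    frames: List[Tuple[float, Vector]], utterances: List[Tuple[float, float]]
) -> List[List[Vector]]:
    # Assign each (timestamp, feature) frame to the first utterance
    # interval [start, end) that contains its timestamp.
    segments: List[List[Vector]] = [[] for _ in utterances]
    for t, feat in frames:
        for i, (start, end) in enumerate(utterances):
            if start <= t < end:
                segments[i].append(feat)
                break
    return segments


def utterance_aware_tokens(
    frames: List[Tuple[float, Vector]],
    utterances: List[Tuple[float, float]],
    queries: List[Vector],
) -> List[Vector]:
    # Hypothetical pipeline: per-utterance query tokens, then whole-video tokens.
    tokens: List[Vector] = []
    for segment in split_by_utterance(frames, utterances):
        if segment:  # skip utterances that contain no sampled frames
            tokens.extend(cross_attend(queries, segment))
    tokens.extend(cross_attend(queries, [f for _, f in frames]))
    return tokens
```

With four frames at 0.0, 0.5, 1.0 and 1.5 s, two utterances `(0.0, 1.0)` and `(1.0, 2.0)`, and two queries, this produces 2 × 2 per-utterance tokens plus 2 whole-video tokens. Because the token count is fixed per segment, the language model receives a compact, utterance-ordered summary no matter how many frames each utterance spans.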

Taxonomy

Methods: Softmax · Attention Is All You Need · Global-Local Attention