Anchoring Emotions in Text: Robust Multimodal Fusion for Mimicry Intensity Estimation
Lingsi Zhu, Yuefeng Zou, Yunxiang Zhang, Naixiang Zheng, Guoyuan Wang, Jun Yu, Jiaen Liang, Wei Huang, Shengping Liu, Ximin Zheng

TL;DR
This paper introduces TAEMI, a multimodal framework that uses textual transcripts as stable anchors to improve emotional mimicry intensity estimation in noisy, real-world environments, achieving state-of-the-art results.
Contribution
The paper proposes a novel Text-Anchored Dual Cross-Attention mechanism and strategies for handling missing data, enhancing robustness and accuracy in multimodal emotion estimation.
Findings
TAEMI outperforms baseline methods on the Hume-Vidmimic2 dataset.
The framework maintains high performance under noisy and incomplete data conditions.
It effectively captures fine-grained emotional variations.
Abstract
Estimating Emotional Mimicry Intensity (EMI) in naturalistic environments is a critical yet challenging task in affective computing. The primary difficulty lies in effectively modeling the complex, nonlinear temporal dynamics across highly heterogeneous modalities, especially when physical signals are corrupted or missing. To tackle this, we propose TAEMI (Text-Anchored Emotional Mimicry Intensity estimation), a novel multimodal framework designed for the 10th ABAW Competition. Motivated by the observation that continuous visual and acoustic signals are highly susceptible to transient environmental noise, we break with the traditional symmetric fusion paradigm. Instead, we use textual transcripts, which inherently encode a stable, time-independent semantic prior, as central anchors. Specifically, we introduce a Text-Anchored Dual Cross-Attention mechanism that utilizes these robust…
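The abstract names a Text-Anchored Dual Cross-Attention mechanism but is cut off before describing it, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that text token embeddings act as queries, that audio and visual frames each act as keys and values in two separate attention passes, and that a missing modality contributes zeros. The function names (`cross_attention`, `text_anchored_fusion`), the absence of learned projections, the zero fallback, and the concatenation step are all hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention without learned projections (assumption).

    queries: list of d-dim vectors (here, text token embeddings).
    keys_values: list of d-dim vectors (audio or visual frames), used as
        both keys and values.
    Returns one attended d-dim vector per query.
    """
    d = len(queries[0])
    if not keys_values:
        # Hypothetical fallback for a missing modality: contribute zeros.
        return [[0.0] * d for _ in queries]
    attended = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys_values
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(d)]
        )
    return attended


def text_anchored_fusion(text, audio, visual):
    """Run two cross-attention passes, both anchored on the text queries."""
    attended_audio = cross_attention(text, audio)
    attended_visual = cross_attention(text, visual)
    # `+` on lists concatenates: each row is [text | audio | visual].
    return [t + a + v for t, a, v in zip(text, attended_audio, attended_visual)]


# Toy inputs: two text tokens, three audio frames, and no visual frames.
text = [[1.0, 0.0], [0.0, 1.0]]
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
visual = []  # simulates a missing visual stream
fused = text_anchored_fusion(text, audio, visual)
```

In this sketch the output always has one row per text token, regardless of how many audio or visual frames are present, which is one way a text anchor could keep the fused representation stable when the other streams are noisy or absent.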
Taxonomy
Topics: Emotion and Mood Recognition · Music and Audio Processing · EEG and Brain-Computer Interfaces
