Cross-modal Prompting for Balanced Incomplete Multi-modal Emotion Recognition
Wen-Jue He, Xiaofeng Zhu, Zheng Zhang

TL;DR
This paper introduces Cross-modal Prompting (ComP), a novel approach for incomplete multi-modal emotion recognition that enhances modality-specific features and balances information across modalities, especially under missing data conditions.
Contribution
The paper proposes ComP, a method that combines progressive prompt generation, cross-modal knowledge propagation, and dynamic re-weighting to improve multi-modal emotion recognition with incomplete data.
Findings
Outperforms state-of-the-art methods on 4 datasets.
Effective under various missing data rates.
Enhances modality-specific feature discrimination.
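To make the "missing data rate" setting in the findings above concrete, here is a minimal sketch of how incomplete multi-modal inputs are often simulated: each modality of each sample is dropped independently with a given probability, while at least one modality is always kept. This is an assumed, generic protocol for illustration only; the summary does not state how the paper builds its missing-data splits, and the names `make_missing_mask` and `observed_fraction` are hypothetical.

```python
import random


def make_missing_mask(num_samples, num_modalities, missing_rate, seed=0):
    """Build a 0/1 availability mask (1 = modality observed, 0 = missing).

    Assumption for illustration: every entry is dropped independently with
    probability `missing_rate`, and a sample that loses all of its
    modalities gets one randomly chosen modality restored.
    """
    rng = random.Random(seed)
    mask = []
    for _ in range(num_samples):
        row = [0 if rng.random() < missing_rate else 1
               for _ in range(num_modalities)]
        if sum(row) == 0:
            # Keep at least one modality so the sample stays usable.
            row[rng.randrange(num_modalities)] = 1
        mask.append(row)
    return mask


def observed_fraction(mask):
    """Fraction of (sample, modality) entries that remain observed."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total
```

A mask like this would typically be applied by zeroing out or skipping the features of missing modalities before they reach the model. Because of the "keep at least one" rule, the observed fraction ends up slightly above `1 - missing_rate`.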
Abstract
Incomplete multi-modal emotion recognition (IMER) aims at understanding human intentions and sentiments by comprehensively exploring the partially observed multi-source data. Although the multi-modal data is expected to provide more abundant information, the performance gap and modality under-optimization problem hinder effective multi-modal learning in practice, and are exacerbated in the confrontation of the missing data. To address this issue, we devise a novel Cross-modal Prompting (ComP) method, which emphasizes coherent information by enhancing modality-specific features and improves the overall recognition accuracy by boosting each modality's performance. Specifically, a progressive prompt generation module with a dynamic gradient modulator is proposed to produce concise and consistent modality semantic cues. Meanwhile, cross-modal knowledge propagation selectively amplifies the…
Taxonomy
Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Face and Expression Recognition
