Tailor Versatile Multi-modal Learning for Multi-label Emotion Recognition
Yi Zhang, Mingyuan Chen, Jundong Shen, Chongjun Wang

TL;DR
This paper introduces TAILOR, a multi-modal learning framework for multi-label emotion recognition that refines modality representations and exploits the relationships between labels and modalities to improve accuracy.
Contribution
It proposes an adversarial refinement module, a cross-modal encoder, and a label-guided decoder to better capture modality diversity and label-specific features.
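The listing does not give the decoder's equations. The sketch below assumes one common way to make features label-specific: each emotion label owns a query embedding that attends over the fused multi-modal tokens and yields its own representation, and each label then gets an independent sigmoid score. The names `label_guided_attention` and `predict`, the single attention head, and the absence of learned projections are all hypothetical simplifications, not the authors' implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def softmax(scores: List[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def label_guided_attention(label_queries: List[Vector], tokens: List[Vector]) -> List[Vector]:
    """Hypothetical: each label query attends over the fused multi-modal tokens
    (scaled dot-product attention) and yields one label-specific representation."""
    d = len(tokens[0])
    reprs = []
    for q in label_queries:
        weights = softmax([dot(q, t) / math.sqrt(d) for t in tokens])
        reprs.append([sum(w * t[j] for w, t in zip(weights, tokens)) for j in range(d)])
    return reprs

def predict(label_reprs: List[Vector], classifiers: List[Vector],
            threshold: float = 0.5) -> Tuple[Vector, List[bool]]:
    """Independent sigmoid per label (multi-label): several emotions may be active at once."""
    probs = [1.0 / (1.0 + math.exp(-dot(r, w))) for r, w in zip(label_reprs, classifiers)]
    return probs, [p >= threshold for p in probs]

# Toy usage: two fused tokens (say, one from text and one from audio) and two labels.
tokens = [[1.0, 0.0], [0.0, 1.0]]
queries = [[10.0, 0.0], [0.0, 10.0]]         # hypothetical learned label embeddings
reprs = label_guided_attention(queries, tokens)
probs, active = predict(reprs, [[5.0, 0.0], [0.0, -5.0]])  # hypothetical per-label classifiers
```

Because each label attends separately, the first label draws almost entirely on the first token and the second label on the second, which illustrates how different labels can read different modalities.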
Findings
Outperforms state-of-the-art methods on the CMU-MOSEI dataset
Effective in both the aligned and unaligned settings
Demonstrates a significant improvement in multi-label emotion recognition
Abstract
Multi-modal Multi-label Emotion Recognition (MMER) aims to identify various human emotions from heterogeneous visual, audio and text modalities. Previous methods mainly focus on projecting multiple modalities into a common latent space and learning an identical representation for all labels, which neglects the diversity of each modality and fails to capture richer semantic information for each label from different perspectives. Moreover, the relationships between modalities and labels have not been fully exploited. In this paper, we propose versaTile multi-modAl learning for multI-labeL emOtion Recognition (TAILOR), aiming to refine multi-modal representations and enhance the discriminative capacity of each label. Specifically, we design an adversarial multi-modal refinement module to sufficiently explore the commonality among different modalities and strengthen the diversity of each…
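The truncated abstract says the adversarial refinement module explores commonality among modalities while strengthening each modality's diversity. A minimal sketch of losses that could express that goal, assuming a shared "common" encoder, per-modality "private" encoders, and a modality discriminator (a pattern familiar from domain-separation-style models): the discriminator minimizes `discriminator_loss`, the common encoder minimizes `confusion_loss` so its features become modality-invariant, and `difference_loss` pushes private features away from common ones. All function names and the exact loss forms are assumptions for illustration; the paper's objectives may differ.

```python
import math
from typing import List, Sequence

Vector = List[float]

def softmax(logits: Sequence[float]) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def discriminator_loss(logits: List[Vector], modality_ids: List[int]) -> float:
    """Mean cross-entropy of a hypothetical modality discriminator that tries to tell
    which modality (e.g. 0=text, 1=audio, 2=visual) a common feature came from."""
    losses = [-math.log(softmax(z)[y]) for z, y in zip(logits, modality_ids)]
    return sum(losses) / len(losses)

def confusion_loss(logits: List[Vector]) -> float:
    """Cross-entropy against a uniform target over modalities: lowest when the
    discriminator cannot tell modalities apart, i.e. common features are shared."""
    losses = []
    for z in logits:
        p = softmax(z)
        losses.append(-sum(math.log(pi) for pi in p) / len(p))
    return sum(losses) / len(losses)

def difference_loss(common: List[Vector], private: List[Vector]) -> float:
    """Squared Frobenius norm of C^T P (rows = samples). It is zero when every
    common feature dimension is orthogonal, over the batch, to every private one."""
    total = 0.0
    for j in range(len(common[0])):
        for k in range(len(private[0])):
            s = sum(c[j] * p[k] for c, p in zip(common, private))
            total += s * s
    return total
```

In a real model these terms would be weighted and combined with the classification loss, and the min-max game would typically be run with alternating updates or a gradient-reversal layer; plain Python is used here only to keep the arithmetic visible.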
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Emotion and Mood Recognition · Text and Document Classification Technologies
