TACFN: Transformer-based Adaptive Cross-modal Fusion Network for Multimodal Emotion Recognition
Feng Liu, Ziwang Fu, Yunlong Wang, Qijian Zheng

TL;DR
This paper introduces TACFN, a Transformer-based fusion network that adaptively selects features within one modality and uses them to reinforce another, improving multimodal emotion recognition performance.
Contribution
The paper proposes an adaptive cross-modal fusion method that combines intra-modal feature selection with feature reinforcement for multimodal emotion recognition.
Findings
Achieves state-of-the-art results on RAVDESS and IEMOCAP datasets.
Significant performance improvement over existing fusion methods.
Effective feature selection enhances cross-modal interaction.
Abstract
The fusion technique is the key to the multimodal emotion recognition task. Recently, cross-modal attention-based fusion methods have demonstrated high performance and strong robustness. However, cross-modal attention suffers from redundant features and does not capture complementary features well. We find that it is not necessary to use the entire information of one modality to reinforce the other during cross-modal interaction, and the features that can reinforce a modality may contain only a part of it. To this end, we design an innovative Transformer-based Adaptive Cross-modal Fusion Network (TACFN). Specifically, for the redundant features, we make one modality perform intra-modal feature selection through a self-attention mechanism, so that the selected features can adaptively and efficiently interact with another modality. To better capture the complementary information between…
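The selection-then-interaction idea described in the abstract can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the function names (`attention`, `fuse`), the absence of learned projection matrices, and the residual addition used for reinforcement are all assumptions made for the example. The abstract's description of the reinforcement step is truncated above, so that step in particular is a placeholder.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def attention(queries, keys, values):
    """Scaled dot-product attention without learned projections (simplification)."""
    dim = len(queries[0])
    scores = matmul(queries, transpose(keys))
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    return matmul(weights, values), weights


def fuse(target, source):
    """Hypothetical fusion step: select within `source`, then reinforce `target`."""
    # Step 1: intra-modal feature selection, modeled here as self-attention on source.
    selected, _ = attention(source, source, source)
    # Step 2: cross-modal interaction, target tokens query the selected source features.
    cross, weights = attention(target, selected, selected)
    # Step 3: reinforcement, assumed here to be a plain residual addition.
    reinforced = [[t + c for t, c in zip(t_row, c_row)]
                  for t_row, c_row in zip(target, cross)]
    return reinforced, weights


# Toy inputs: 6 audio tokens and 3 video tokens, each with 4 features.
rng = random.Random(0)
audio = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]
video = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
reinforced_video, cross_weights = fuse(video, audio)
print(len(reinforced_video), len(reinforced_video[0]))
```

In this sketch the video tokens never attend to the raw audio sequence, only to its self-attended version, which is one simple way to express the claim that the full content of a modality is not needed during cross-modal interaction. A real model would add learned query, key, and value projections and multiple heads.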
Taxonomy
Topics: Emotion and Mood Recognition · Human Pose and Action Recognition · Face and Expression Recognition
Methods: Softmax · Attention Is All You Need · Feature Selection
