TACFN: Transformer-based Adaptive Cross-modal Fusion Network for Multimodal Emotion Recognition

Feng Liu; Ziwang Fu; Yunlong Wang; Qijian Zheng

arXiv:2505.06536 · cs.CV · May 13, 2025

TL;DR

This paper introduces TACFN, a Transformer-based fusion network that selects features within one modality through self-attention and uses them to reinforce another modality, reaching state-of-the-art results on RAVDESS and IEMOCAP for multimodal emotion recognition.

Contribution

The paper proposes an adaptive cross-modal fusion method that combines intra-modal feature selection with cross-modal feature reinforcement for multimodal emotion recognition.

Findings

01 Achieves state-of-the-art results on RAVDESS and IEMOCAP datasets.

02 Significant performance improvement over existing fusion methods.

03 Effective feature selection enhances cross-modal interaction.

Abstract

The fusion technique is the key to the multimodal emotion recognition task. Recently, cross-modal attention-based fusion methods have demonstrated high performance and strong robustness. However, cross-modal attention suffers from redundant features and does not capture complementary features well. We find that it is not necessary to use the entire information of one modality to reinforce the other during cross-modal interaction, and the features that can reinforce a modality may contain only a part of it. To this end, we design an innovative Transformer-based Adaptive Cross-modal Fusion Network (TACFN). Specifically, for the redundant features, we make one modality perform intra-modal feature selection through a self-attention mechanism, so that the selected features can adaptively and efficiently interact with another modality. To better capture the complementary information between…
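The selection-then-interaction idea in the abstract can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it omits learned projection weights, multi-head attention, and normalization, and because the abstract is truncated before it describes the reinforcement step, the residual cross-attention used here is an assumed stand-in. The names `attention` and `adaptive_fusion` are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(a):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q, k, v):
    """Scaled dot-product attention without learned projections (simplification)."""
    d = len(q[0])
    scores = matmul(q, transpose(k))
    scores = [[s / math.sqrt(d) for s in row] for row in scores]
    return matmul(softmax_rows(scores), v)


def adaptive_fusion(target, source):
    """Hypothetical sketch: select source features, then reinforce the target.

    Step 1 (from the abstract): the source modality runs self-attention on
    itself, acting as intra-modal feature selection.
    Step 2 (assumption): the target modality queries the selected features
    and adds the result back as a residual.
    """
    selected = attention(source, source, source)
    cross = attention(target, selected, selected)
    return [[t + c for t, c in zip(t_row, c_row)] for t_row, c_row in zip(target, cross)]


# Toy inputs: 5 audio frames and 3 video frames, each with 4 features.
random.seed(0)
audio = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
video = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]

fused = adaptive_fusion(video, audio)
print(len(fused), len(fused[0]))  # 3 4
```

The fused output keeps the target modality's sequence length, so in this sketch the video stream is enriched by audio without changing its shape. Swapping the argument order would reinforce audio with selected video features instead.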

Code & Models

Repositories

shuzihuaiyu/tacfn (PyTorch, official)

Taxonomy

Topics: Emotion and Mood Recognition · Human Pose and Action Recognition · Face and Expression Recognition

Methods: Softmax · Attention Is All You Need · Feature Selection