TACOformer: Token-Channel Compounded Cross Attention for Multimodal Emotion Recognition
Xinda Li

TL;DR
This paper introduces TACOformer, a multimodal fusion method based on token-channel compounded cross attention. It models inter-modal dependencies to improve emotion recognition from physiological signals.
Contribution
The paper proposes a unified cross attention module that captures both token-level and channel-level dependencies, enhancing multimodal emotion recognition performance.
Findings
Achieves state-of-the-art results on the DEAP and DREAMER datasets.
Models inter-modal dependencies with TACO cross attention.
Improves classification accuracy in emotion recognition tasks.
Abstract
Recently, emotion recognition based on physiological signals has emerged as an area of intensive research. Because multimodal, multichannel physiological signals are complementary, using them has significantly improved the performance of emotion recognition systems. However, effectively integrating emotion-related semantic information from different modalities and capturing inter-modal dependencies remain challenging. Many existing multimodal fusion methods ignore either token-to-token or channel-to-channel correlations of multichannel signals from different modalities, which to some extent limits the classification capability of the models. In this paper, we propose a comprehensive perspective on multimodal fusion that integrates channel-level and token-level cross-modal interactions. Specifically, we introduce a unified cross attention module called…
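The abstract contrasts token-to-token and channel-to-channel correlations, but the excerpt is cut off before the TACO formulas are given. The NumPy sketch below therefore only illustrates the two kinds of cross-modal interaction under stated assumptions. It omits the learned query/key/value projections a real attention module would use. It assumes both modalities are already embedded to the same (tokens, features) shape. It also combines the two outputs with a plain sum. The function names, the shapes, and the sum are hypothetical, and this is not the paper's TACO cross attention.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def token_cross_attention(query_mod: np.ndarray, context_mod: np.ndarray) -> np.ndarray:
    """Token level: each token (row) of query_mod attends over the tokens of context_mod.

    Shapes: query_mod (N_q, D), context_mod (N_k, D) -> (N_q, D).
    """
    d = query_mod.shape[1]
    scores = query_mod @ context_mod.T / np.sqrt(d)   # (N_q, N_k) token-to-token scores
    return softmax(scores, axis=-1) @ context_mod     # weighted sum of context tokens


def channel_cross_attention(query_mod: np.ndarray, context_mod: np.ndarray) -> np.ndarray:
    """Channel level: each channel (column) of query_mod attends over the channels of context_mod.

    Both inputs must share the token count N. Shapes: (N, D_q), (N, D_k) -> (N, D_q).
    """
    n = query_mod.shape[0]
    scores = query_mod.T @ context_mod / np.sqrt(n)       # (D_q, D_k) channel-to-channel scores
    return (softmax(scores, axis=-1) @ context_mod.T).T   # weighted sum of context channels


def compounded_cross_attention(query_mod: np.ndarray, context_mod: np.ndarray) -> np.ndarray:
    """HYPOTHETICAL combination: a plain sum of the token- and channel-level outputs.

    Requires both modalities to have the same (N, D) shape so the two outputs align.
    """
    token_out = token_cross_attention(query_mod, context_mod)
    channel_out = channel_cross_attention(query_mod, context_mod)
    return token_out + channel_out


# Usage with hypothetical shapes: 8 tokens x 32 features per modality.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((8, 32))
peripheral = rng.standard_normal((8, 32))
fused = compounded_cross_attention(eeg, peripheral)
print(fused.shape)  # (8, 32)
```

The only difference between the two helpers is which axis the scaled dot-product softmax runs over. Transposing the inputs makes the scores compare channels (columns) instead of time steps (rows). That is one simple way to express the token-level versus channel-level distinction the abstract draws.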
Taxonomy
Topics: EEG and Brain-Computer Interfaces · Emotion and Mood Recognition · ECG Monitoring and Analysis
