# SAWGAN-BDCMA: A Self-Attention Wasserstein GAN and Bidirectional Cross-Modal Attention Framework for Multimodal Emotion Recognition

**Authors:** Ning Zhang, Shiwei Su, Haozhe Zhang, Hantong Yang, Runfang Hao, Kun Yang

PMC · DOI: 10.3390/s26020582 · Sensors (Basel, Switzerland) · 2026-01-15

## TL;DR

This paper introduces SAWGAN-BDCMA, a framework for emotion recognition from EEG and PPG signals that improves accuracy by pairing GAN-based data augmentation with bidirectional cross-modal attention fusion.

## Contribution

SAWGAN-BDCMA combines a self-attention Wasserstein GAN, which synthesizes EEG and PPG signals to augment the training data, with a dual-branch feature extractor and a bidirectional cross-modal attention mechanism for multimodal fusion.
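For orientation, the "Wasserstein" part of the name refers to the family of GAN objectives in which a critic $D$ scores real and generated samples rather than classifying them. The standard form from the general WGAN literature is shown below. This summary does not state the paper's exact loss, so the symbols ($G$, $D$, $p_{\text{data}}$, $p_z$) and the Lipschitz constraint are assumptions for illustration, not the authors' formulation.

$$
\min_{G}\;\max_{\lVert D \rVert_{L} \le 1}\;
\mathbb{E}_{x \sim p_{\text{data}}}\big[D(x)\big]
\;-\;
\mathbb{E}_{z \sim p_{z}}\big[D(G(z))\big]
$$

Here $x$ would be a real EEG or PPG segment and $G(z)$ a synthesized one. The critic is constrained to be 1-Lipschitz, which in practice is usually approximated by weight clipping or a gradient penalty.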

## Key findings

- SAWGAN-BDCMA achieved 94.25% accuracy for binary emotion classification on the DEAP dataset.
- The framework attained 97.49% accuracy for six-class emotion recognition on the ECSMP dataset.
- Compared with state-of-the-art multimodal approaches, accuracy improved by 0.57% to 14.01% across tasks.

## Abstract

Emotion recognition from physiological signals is pivotal for advancing human–computer interaction, yet unimodal pipelines frequently underperform due to limited information, constrained data diversity, and suboptimal cross-modal fusion. Addressing these limitations, the Self-Attention Wasserstein Generative Adversarial Network with Bidirectional Cross-Modal Attention (SAWGAN-BDCMA) framework is proposed. This framework reorganizes the learning process around three complementary components: (1) a Self-Attention Wasserstein GAN (SAWGAN) that synthesizes high-quality Electroencephalography (EEG) and Photoplethysmography (PPG) signals to expand diversity and alleviate distributional imbalance; (2) a dual-branch architecture that distills discriminative spatiotemporal representations within each modality; and (3) a Bidirectional Cross-Modal Attention (BDCMA) mechanism that enables deep two-way interaction and adaptive weighting for robust fusion. Evaluated on the DEAP and ECSMP datasets, SAWGAN-BDCMA significantly outperforms multiple contemporary methods, achieving 94.25% accuracy for binary and 87.93% for quaternary classification on DEAP. Furthermore, it attains 97.49% accuracy for six-class emotion recognition on the ECSMP dataset. Compared with state-of-the-art multimodal approaches, the proposed framework achieves an accuracy improvement ranging from 0.57% to 14.01% across various tasks. These findings offer a robust solution to the long-standing challenges of data scarcity and modal imbalance, providing a profound theoretical and technical foundation for fine-grained emotion recognition and intelligent human–computer collaboration.
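The two-way interaction described in component (3) can be illustrated with a minimal sketch of bidirectional cross-attention, in which EEG features attend over PPG features and vice versa before the two are combined. This is not the authors' implementation: it omits the learned query/key/value projections, and the function names (`cross_attention`, `bidirectional_fusion`) and the norm-based gate standing in for "adaptive weighting" are hypothetical placeholders. It also assumes both modalities have already been mapped to the same feature dimension.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = time steps, columns = feature dimensions


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def cross_attention(query_seq: Matrix, context_seq: Matrix) -> Matrix:
    """Scaled dot-product attention: each query step attends over all context steps.

    Assumption: no learned projections; the raw features act as Q, K and V.
    """
    d = len(query_seq[0])
    context_t = [list(col) for col in zip(*context_seq)]
    scores = matmul(query_seq, context_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, context_seq)


def mean_pool(seq: Matrix) -> List[float]:
    """Average a sequence over time into a single feature vector."""
    return [sum(col) / len(seq) for col in zip(*seq)]


def bidirectional_fusion(eeg: Matrix, ppg: Matrix) -> List[float]:
    """Fuse two modalities with attention in both directions."""
    eeg_ctx = cross_attention(eeg, ppg)  # EEG queries, PPG context
    ppg_ctx = cross_attention(ppg, eeg)  # PPG queries, EEG context

    # Residual connection, then pool each enriched sequence over time.
    eeg_vec = mean_pool([[a + b for a, b in zip(r, c)] for r, c in zip(eeg, eeg_ctx)])
    ppg_vec = mean_pool([[a + b for a, b in zip(r, c)] for r, c in zip(ppg, ppg_ctx)])

    # Hypothetical gate: softmax over vector norms stands in for a learned
    # adaptive weighting between the two modalities.
    gate = softmax([math.hypot(*eeg_vec), math.hypot(*ppg_vec)])
    return [gate[0] * a + gate[1] * b for a, b in zip(eeg_vec, ppg_vec)]


# Toy inputs: 3 EEG steps and 2 PPG steps, each with 2 features.
eeg_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ppg_features = [[0.5, 0.5], [1.0, -1.0]]
print(bidirectional_fusion(eeg_features, ppg_features))
```

The sequences may differ in length, since attention weights are computed per query step over however many context steps exist. A trained model would replace the fixed gate with parameters learned jointly with the classifier.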

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12846076/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12846076/full.md

## References

29 references — full list in the complete paper: https://tomesphere.com/paper/PMC12846076/full.md

---
Source: https://tomesphere.com/paper/PMC12846076