# STHMA: Decoupling Spatio-Temporal Dynamics in EEG via Hybrid State Space Modeling

**Authors:** Shuo Yang, Lintong Zhang, Youyi Cheng, Yingying Zheng, Shuai Zheng, Jiahui Guo, Lirong Zheng

PMC · DOI: 10.3390/brainsci16030267 · Brain Sciences · 2026-02-27

## TL;DR

The STHMA framework improves EEG-based emotion recognition by combining linear-complexity state space models with global attention mechanisms, outperforming Transformer-based baselines on the FACED and SEED-V datasets.

## Contribution

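Introduces a Decoupled Spatial–Temporal Scanning strategy and a Dual-Domain Physics-Aware Embedding module, which fuses learnable temporal convolutions with frequency-domain spectral features, for EEG emotion recognition.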

## Key findings

- STHMA achieves state-of-the-art emotion recognition performance on FACED and SEED-V datasets.
- Decoupled spatial-temporal scanning is critical for modeling complex EEG dynamics.
- Linear-complexity state space models enable real-time processing of high-resolution neural recordings.
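
To make the linear-complexity claim concrete, here is a minimal sketch of a scalar linear state space recurrence. It is an illustration only, not the paper's Mamba-based layer: the function name `ssm_scan` and the parameters `a`, `b`, `c` are hypothetical, and a real selective state space model makes these parameters input-dependent and vector-valued.

```python
def ssm_scan(x, a=0.9, b=1.0, c=1.0):
    """Run a scalar linear state space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)

    The loop touches each sample exactly once, so the cost grows
    linearly with the sequence length.
    """
    h = 0.0
    ys = []
    for x_t in x:
        h = a * h + b * x_t  # fold the new sample into the running state
        ys.append(c * h)     # emit an output for this time step
    return ys


# An impulse input decays geometrically through the state.
impulse_response = ssm_scan([1.0, 0.0, 0.0], a=0.5)
```

Because the state `h` summarizes the entire past, each new sample needs only one update, which is what makes streaming over long recordings plausible.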

## Abstract

What are the main findings?

The proposed STHMA framework achieves state-of-the-art emotion recognition performance on the FACED and SEED-V datasets, outperforming Transformer-based baselines by effectively combining linear-complexity State Space Models with global attention mechanisms.

Ablation studies demonstrate that the “Decoupled Spatial–Temporal Scanning” strategy—which alternates between modeling instantaneous brain connectivity and continuous temporal dynamics—is the most critical component for reconstructing the complex spatio-temporal manifold of EEG data.
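
The alternation between spatial and temporal modeling can be pictured as two different serializations of the same channels-by-time array. The sketch below is an assumption about how such a reconfiguration might look; this summary does not give the paper's tensor layout or scan order, and the names `spatial_sequences` and `temporal_sequences` are hypothetical.

```python
def spatial_sequences(x):
    """Serialize x[channel][time] into one sequence per time step.

    Each sequence runs over channels at a fixed instant, so a 1D
    sequence model scanning it sees only instantaneous cross-channel
    structure.
    """
    n_channels, n_steps = len(x), len(x[0])
    return [[x[c][t] for c in range(n_channels)] for t in range(n_steps)]


def temporal_sequences(x):
    """Serialize x[channel][time] into one sequence per channel.

    Each sequence runs over time for a fixed channel, so a 1D
    sequence model scanning it tracks temporal evolution only.
    """
    return [list(channel) for channel in x]


# Toy recording: 2 channels, 3 time steps.
eeg = [[1, 2, 3],
       [4, 5, 6]]
spatial = spatial_sequences(eeg)    # 3 sequences of length 2
temporal = temporal_sequences(eeg)  # 2 sequences of length 3
```

Keeping the two views separate avoids flattening channels and time into a single long sequence, where neighbours in the sequence may be neither spatially nor temporally adjacent.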

What are the implications of the main findings?

The results validate that modeling physiological signals as continuous dynamical systems via State Space Models offers better representational fidelity than the discrete tokenization used in Transformers, resolving theoretical mismatches in biological signal processing.

The architecture’s linear computational complexity overcomes the scalability bottlenecks of traditional attention mechanisms, enabling the development of real-time Brain–Computer Interfaces capable of processing long-duration high-resolution neural recordings.
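
As a rough illustration of the scalability argument, the following sketch counts the dominant operations for full self-attention versus a recurrent scan. It ignores constant factors and hidden dimensions, and the recording length and sampling rate are hypothetical values chosen for the example, not figures from the paper.

```python
def attention_pair_count(seq_len):
    """Score entries in full self-attention: one per (query, key) pair."""
    return seq_len * seq_len


def scan_step_count(seq_len):
    """State updates in a recurrent state space scan: one per time step."""
    return seq_len


# Hypothetical recording: 30 seconds sampled at 1000 Hz.
seconds, sampling_rate_hz = 30, 1000
n_samples = seconds * sampling_rate_hz
ratio = attention_pair_count(n_samples) // scan_step_count(n_samples)
```

Under these assumptions the gap between the two counts equals the sequence length itself, so it widens as recordings get longer or sampling rates increase.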

**Background/Objectives:** Decoding affective states from Electroencephalography (EEG) signals is fundamental to non-invasive Brain–Computer Interfaces. Despite recent advances, accurate recognition is impeded by the inherently non-stationary nature of physiological signals and the entanglement of spatio-temporal dynamics within high-dimensional recordings. While Transformers excel at global modeling, they often neglect the continuous dynamical properties of neural signals and suffer from quadratic complexity.

**Methods:** In this paper, we propose the Spatio-Temporal Hybrid Mamba-Attention (STHMA), a framework designed to explicitly disentangle and model EEG dynamics via linear-complexity State Space Models. First, to incorporate domain knowledge, we introduce a Dual-Domain Physics-Aware Embedding module. This module fuses learnable temporal convolutions with explicit frequency-domain spectral features, ensuring fidelity to neurophysiological principles. Second, we propose a novel Decoupled Spatial–Temporal Scanning strategy. By dynamically reconfiguring the serialization of the data tensor, our model strictly separates the learning of instantaneous functional connectivity from the tracking of emotional state evolution, thereby preventing the structural collapse common in 1D sequence models.

**Results:** Extensive experiments on the FACED and SEED-V datasets demonstrate that the STHMA achieves state-of-the-art performance, significantly exceeding the random chance baselines (11.11% for 9-class FACED and 20.00% for 5-class SEED-V).

**Conclusions:** The results validate that combining Physics-Aware Embeddings with decoupled state-space modeling offers a scalable and effective paradigm for EEG emotion recognition.
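
The dual-domain idea, time-domain convolution features combined with explicit spectral features, can be sketched in plain Python as follows. This is a simplified stand-in rather than the authors' module: the fixed `kernel` replaces learned convolution weights, the band edges in `BANDS` are conventional EEG ranges assumed for illustration, fusion by concatenation is an assumption, and all function names are hypothetical.

```python
import cmath
import math

# Conventional EEG frequency bands in Hz (assumed, not taken from the paper).
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}


def temporal_conv(signal, kernel):
    """Valid-mode 1D cross-correlation; `kernel` stands in for learned weights."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def band_power(signal, fs, low, high):
    """Mean DFT power over bins with low <= f < high, using a naive O(N^2) DFT."""
    n = len(signal)
    powers = []
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if low <= freq < high:
            coeff = sum(
                signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)
            )
            powers.append(abs(coeff) ** 2 / n)
    return sum(powers) / len(powers) if powers else 0.0


def dual_domain_embedding(signal, fs, kernel):
    """Concatenate time-domain conv features with per-band spectral power."""
    time_feats = temporal_conv(signal, kernel)
    freq_feats = [band_power(signal, fs, lo, hi) for lo, hi in BANDS.values()]
    return time_feats + freq_feats


# One second of a synthetic 10 Hz (alpha-band) sine sampled at 64 Hz.
fs = 64
signal = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
emb = dual_domain_embedding(signal, fs, kernel=[0.25, 0.5, 0.25])
theta_power, alpha_power, beta_power = emb[-3:]
```

For the synthetic 10 Hz input, the alpha-band entry dominates the other two band powers, which shows how explicit spectral features expose rhythm information that a short convolution kernel alone would have to learn.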

## Full-text entities

- **Diseases:** injury to (MESH:D014947), FACED (MESH:C536384)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13024159/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13024159/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/PMC13024159/full.md

---
Source: https://tomesphere.com/paper/PMC13024159