# STCCA: Spatial–Temporal Coupled Cross-Attention Through Hierarchical Network for EEG-Based Speech Recognition

**Authors:** Liang Dong, Hengyi Shao, Lin Zhang, Lei Li

PMC · DOI: 10.3390/s25216541 · 2025-10-23

## TL;DR

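This paper introduces STCCA, a hierarchical network for EEG-based speech recognition that uses coupled cross-attention to model the interactions between spatial and temporal features. It reports accuracy gains of 1.95% to 3.98% over comparison models on three datasets.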

## Contribution

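The paper proposes STCCA, a spatial–temporal coupled cross-attention mechanism built into a hierarchical network with three modules: local feature extraction (LFEM), coupled cross-attention (CCA) fusion, and global feature extraction (GFEM).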

## Key findings

- STCCA achieved accuracies of 45.45%, 25.91%, and 29.07% on three EEG-based speech datasets.
- These results correspond to improvements of 1.95%, 3.98%, and 1.98%, respectively, over the comparison models.
- The hierarchical design with the CCA fusion module captures cross-relationships between spatial and temporal features.
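As a quick check on the figures reported in the abstract, the short sketch below derives the accuracy each comparison model would have had. It assumes that every reported improvement is an absolute difference in percentage points, which the abstract does not state explicitly. The name `implied_baselines` is illustrative only and does not come from the paper.

```python
# (STCCA accuracy %, reported improvement %) in the order the abstract lists them.
reported = [(45.45, 1.95), (25.91, 3.98), (29.07, 1.98)]

# ASSUMPTION: each improvement is an absolute gain in percentage points,
# so the best comparison model's accuracy is simply accuracy - improvement.
implied_baselines = [round(acc - gain, 2) for acc, gain in reported]

for (acc, gain), base in zip(reported, implied_baselines):
    print(f"STCCA {acc:.2f}%  improvement {gain:.2f}  implied baseline {base:.2f}%")
```

If the improvements were relative rather than absolute, the implied baselines would differ, so these values should be read only as a consistency check on the summary.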

## Abstract

Speech recognition based on Electroencephalogram (EEG) has attracted considerable attention due to its potential in communication and rehabilitation. Existing methods typically process spatial and temporal features with sequential, parallel, or constrained feature fusion strategies. However, the intricate cross-relationships between spatial and temporal features remain underexplored. To address these limitations, we propose a spatial–temporal coupled cross-attention mechanism through a hierarchical network, named STCCA. The proposed STCCA consists of three key components: local feature extraction module (LFEM), coupled cross-attention (CCA) fusion module, and global feature extraction module (GFEM). The LFEM employs CNNs to extract local temporal and spatial features, while the CCA fusion module leverages a dual-directional attention mechanism to establish deep interactions between temporal and spatial features. The GFEM uses multi-head self-attention layers to model long-range dependencies and extract global features comprehensively. STCCA is validated on three EEG-based speech datasets, achieving accuracies of 45.45%, 25.91%, and 29.07%, corresponding to improvements of 1.95%, 3.98%, and 1.98% over the comparison models.
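The dual-directional attention in the CCA fusion module can be illustrated with a minimal sketch in which temporal features attend to spatial features and spatial features attend to temporal features. This is not the authors' implementation. It assumes single-head scaled dot-product attention, random projection weights, and toy token counts, and every function and parameter name is hypothetical. Plain Python lists are used so the example runs without third-party packages; a real model would use a tensor library.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(a):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(queries, context, wq, wk, wv):
    """Single-head scaled dot-product attention: `queries` attend to `context`."""
    q, k, v = matmul(queries, wq), matmul(context, wk), matmul(context, wv)
    scale = math.sqrt(len(q[0]))
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    weights = softmax_rows(scores)
    return matmul(weights, v), weights


def coupled_cross_attention(temporal, spatial, params):
    """Run attention in both directions (assumed reading of 'dual-directional')."""
    t_out, t_w = cross_attention(temporal, spatial, *params["t2s"])  # temporal -> spatial
    s_out, s_w = cross_attention(spatial, temporal, *params["s2t"])  # spatial -> temporal
    return t_out, s_out, t_w, s_w


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d = 4                                  # toy feature width (assumption)
temporal = random_matrix(6, d, rng)    # 6 temporal tokens (assumption)
spatial = random_matrix(3, d, rng)     # 3 spatial tokens (assumption)
params = {name: tuple(random_matrix(d, d, rng) for _ in range(3))
          for name in ("t2s", "s2t")}

t_out, s_out, t_w, s_w = coupled_cross_attention(temporal, spatial, params)
print(len(t_out), len(t_out[0]))  # 6 4: one fused vector per temporal token
print(len(s_out), len(s_out[0]))  # 3 4: one fused vector per spatial token
```

In this sketch each direction has its own projection weights, so the two attention maps (`t_w` of shape 6×3 and `s_w` of shape 3×6) are learned independently. How the paper combines the two outputs before the GFEM is not described in the abstract and is left out here.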

## Full-text entities

- **Diseases:** neurological damage (MESH:D020196), Language impairment (MESH:D007806), injury to (MESH:D014947)
- **Chemicals:** water (MESH:D014867)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12610093/full.md

---
Source: https://tomesphere.com/paper/PMC12610093