# Decoupled Bidirectional Spatio-Temporal Fusion Network for Hybrid EEG-fNIRS Cognitive Task Classification

**Authors:** Zirui Wang, Guanghao Huang, Zhuochao Chen, Xiaorui Liu, Yinhua Liu, Keum-Shik Hong

PMC · DOI: 10.3390/brainsci16020241 · Brain Sciences · 2026-02-21

## TL;DR

A new deep learning framework called BiSTF-Net improves the classification of cognitive tasks by effectively fusing EEG and fNIRS brain signals.

## Contribution

BiSTF-Net introduces a decoupled fusion pipeline with bidirectional spatial guidance and adaptive temporal alignment for EEG-fNIRS signal integration.
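
One way to picture bidirectional spatial guidance is as mutual channel gating, where a per-channel summary of one modality produces weights that rescale the channels of the other. The sketch below is an illustrative assumption and not the paper's Bi-CMG module, whose internals are not given in this summary. The names `channel_energy` and `cross_gate`, the mean-power summary, the fixed projection matrices, and the toy shapes are all hypothetical.

```python
import math

def channel_energy(x):
    """Mean squared amplitude per channel; x is a list of channels (lists of samples)."""
    return [sum(v * v for v in ch) / len(ch) for ch in x]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cross_gate(target, source, proj):
    """Rescale each target channel by a gate in (0, 1) derived from the source modality.

    proj[i][j] maps source channel j to target channel i. It is a hypothetical
    fixed matrix here; a trained model would learn it.
    """
    summary = channel_energy(source)
    gates = [sigmoid(sum(w * s for w, s in zip(row, summary))) for row in proj]
    gated = [[g * v for v in ch] for g, ch in zip(gates, target)]
    return gated, gates

# Toy data (assumed shapes): 3 EEG channels and 2 fNIRS channels, 4 samples each.
eeg = [[1.0, -1.0, 1.0, -1.0], [0.5, 0.5, -0.5, -0.5], [0.0, 2.0, 0.0, -2.0]]
fnirs = [[0.1, 0.2, 0.3, 0.4], [0.0, 0.0, 0.1, 0.1]]

# Hypothetical projections between the two channel spaces.
fnirs_to_eeg = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
eeg_to_fnirs = [[0.2, 0.2, 0.2], [-0.2, 0.0, 0.2]]

# Both directions read the original inputs, so neither gate depends on the other.
eeg_guided, eeg_gates = cross_gate(eeg, fnirs, fnirs_to_eeg)
fnirs_guided, fnirs_gates = cross_gate(fnirs, eeg, eeg_to_fnirs)
```

Computing both gates from the unmodified inputs keeps the two directions symmetric. A sequential variant, where one modality is gated first and then guides the other, would be an equally plausible reading of "bidirectional".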

## Key findings

- BiSTF-Net achieves average accuracies of 83.33% on mental arithmetic (MA), 82.09% on motor imagery (MI), and 84.99% on word generation (WG).
- The framework's adaptive temporal alignment module addresses subject-specific delays in fNIRS signals, improving data reliability.
- The decoupled design of BiSTF-Net offers a generalizable solution for heterogeneous signal fusion in neuroengineering.
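
The subject-specific fNIRS delay mentioned above can be illustrated with a classical stand-in: estimate the lag that maximizes the cross-correlation between a fast reference signal and a delayed one, then shift the delayed signal back. This is a minimal sketch under stated assumptions, not the paper's ATA module, which is described as data-driven. The helper names `best_lag` and `shift_left` and the toy signals are hypothetical.

```python
def best_lag(reference, delayed, max_lag):
    """Return the non-negative shift (in samples) of `delayed` that best matches `reference`.

    Uses an unnormalized cross-correlation over the overlapping samples only.
    """
    def score(lag):
        return sum(
            reference[t] * delayed[t + lag]
            for t in range(len(reference))
            if t + lag < len(delayed)
        )
    return max(range(max_lag + 1), key=score)

def shift_left(signal, lag):
    """Advance a signal by `lag` samples, padding the tail with its last value."""
    return list(signal[lag:]) + [signal[-1]] * lag

# Toy signals (assumed): the slow response is the fast one delayed by 3 samples.
fast = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
slow = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]

lag = best_lag(fast, slow, max_lag=5)   # 3 for these toy signals
aligned = shift_left(slow, lag)         # now matches `fast` sample for sample
```

A fixed lag per recording is the simplest possible choice. A learned alignment can adapt the shift per subject or per trial, which is the kind of flexibility the adaptive module is said to provide.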

## Abstract

What are the main findings?

A novel decoupled deep learning framework, BiSTF-Net, is proposed, which systematically addresses the spatio-temporal heterogeneity between EEG and fNIRS signals through bidirectional spatial guidance and adaptive temporal alignment.

The framework introduces a decoupled, multi-stage fusion pipeline, featuring a bi-directional cross-modal guidance (Bi-CMG) module for early spatial feature enhancement and a symmetric cross-attention fusion (SCAF) module for late-stage deep fusion.

What are the implications of the main findings?

The proposed adaptive temporal alignment (ATA) module provides a data-driven solution to overcome the critical bottleneck of inherent, subject-specific delays in fNIRS signals, enhancing the reliability and accuracy of multimodal data.

This work offers a powerful and interpretable paradigm for multimodal neural classification, whose decoupled design (Bi-CMG, ATA, SCAF) provides an effective and generalizable solution for tackling heterogeneous signal fusion challenges in neuroengineering.

Background/Objectives: Multimodal neuroimaging, particularly the integration of electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS), has emerged as a key methodology for investigating brain function and classifying neural activity. However, the efficient fusion of these two signals remains a formidable challenge due to their significant spatio-temporal heterogeneity. This paper presents BiSTF-Net, which integrates decoupled and bi-directional spatio-temporal fusion mechanisms to enhance the performance of cognitive task recognition. Methods: In BiSTF-Net, the spatial features of EEG and fNIRS are mutually guided and enhanced through an efficient bi-directional cross-modal guidance (Bi-CMG) module. Then, the temporal latencies of fNIRS signals are aligned in a data-driven manner using adaptive temporal alignment (ATA). Subsequently, the aligned features are deeply fused into a modality-invariant, discriminative representation via a symmetric cross-attention fusion (SCAF) module. Results: Evaluated on the mental arithmetic (MA), motor imagery (MI), and word generation (WG) tasks, BiSTF-Net achieves average accuracies of 83.33%, 82.09%, and 84.99%, respectively. Conclusions: BiSTF-Net outperforms existing methods, offers a robust and interpretable solution for multimodal EEG-fNIRS cognitive task classification, and provides a methodological foundation for future extensions to other multimodal data and broader real-world clinical applications.
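
The late-stage fusion step can be sketched as symmetric cross-attention: EEG tokens attend over fNIRS tokens, fNIRS tokens attend over EEG tokens, and the two pooled contexts are concatenated. The code below is a simplified illustration and not the paper's SCAF module. It omits learned query, key, and value projections and multiple heads, and the names `cross_attend`, `mean_pool`, and `symmetric_fuse` as well as the toy token values are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, keys_values):
    """Scaled dot-product attention without learned projections.

    Each query token is replaced by a weighted average of the other
    modality's tokens, which serve as both keys and values.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * k[j] for w, k in zip(weights, keys_values)) for j in range(d)])
    return out

def mean_pool(tokens):
    """Average a list of equal-length token vectors into one vector."""
    return [sum(t[j] for t in tokens) / len(tokens) for j in range(len(tokens[0]))]

def symmetric_fuse(eeg_tokens, fnirs_tokens):
    """Run attention in both directions and concatenate the pooled results."""
    eeg_ctx = cross_attend(eeg_tokens, fnirs_tokens)    # EEG queries, fNIRS keys/values
    fnirs_ctx = cross_attend(fnirs_tokens, eeg_tokens)  # fNIRS queries, EEG keys/values
    return mean_pool(eeg_ctx) + mean_pool(fnirs_ctx)

# Toy tokens (assumed): 3 EEG tokens and 2 fNIRS tokens with feature size 2.
eeg_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fnirs_tokens = [[0.5, 0.5], [1.0, -1.0]]

fused = symmetric_fuse(eeg_tokens, fnirs_tokens)  # vector of length 4
```

Both modalities must share a feature size for the dot products to be defined. In a real model, that is typically arranged by projecting each modality into a common embedding space before attention.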

## Full-text entities

- **Diseases:** MI (MESH:D000068079), PCC (MESH:C536353), injury to (MESH:D014947), MA (MESH:D008607)
- **Chemicals:** ATA (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12938437/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12938437/full.md

## References

46 references — full list in the complete paper: https://tomesphere.com/paper/PMC12938437/full.md

---
Source: https://tomesphere.com/paper/PMC12938437