# FetCAT: Cross-attention fusion of transformer-CNN architecture for fetal brain plane classification with explainability using motion-degraded MRI

**Authors:** Sayma Alam Suha, Rifat Shahriyar, Alessandro Bruno

PMC · DOI: 10.1371/journal.pone.0340286 · PLOS One · 2026-01-20

## TL;DR

FetCAT is a hybrid transformer-CNN model that classifies fetal brain MRI planes from motion-degraded slices, reaching 98.64% accuracy on its own dataset and 81.0% on an unseen OpenNeuro test set, with Grad-CAM visualizations for clinical interpretability.

## Contribution

Introduces FetCAT, a cross-attention hybrid model combining Swin Transformer and AdaptiveMed-CNN for fetal MRI plane classification with explainability.
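As a rough illustration of what cross-attention fusion between two branches means, the sketch below computes scaled dot-product attention in which one branch supplies the queries and the other supplies the keys and values. It is a dependency-free toy, not the authors' implementation: the token counts, the feature width of 8, the random projection matrices, and the choice that the transformer branch queries the CNN branch are all assumptions made for the example. The names `swin_tokens` and `cnn_tokens` are hypothetical stand-ins for the two branches' feature sequences.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context, w_q, w_k, w_v):
    """Scaled dot-product attention with queries and keys/values from different branches."""
    q = matmul(queries, w_q)   # (n_query_tokens, d)
    k = matmul(context, w_k)   # (n_context_tokens, d)
    v = matmul(context, w_v)   # (n_context_tokens, d)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)    # (n_query_tokens, n_context_tokens)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols):
    """Small random matrix used as placeholder features or projections."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
swin_tokens = random_matrix(4, 8)  # hypothetical transformer-branch tokens
cnn_tokens = random_matrix(6, 8)   # hypothetical CNN-branch tokens
w_q, w_k, w_v = (random_matrix(8, 8) for _ in range(3))

# Assumed direction: transformer tokens attend over CNN tokens.
fused, attn = cross_attention(swin_tokens, cnn_tokens, w_q, w_k, w_v)
print(len(fused), len(fused[0]))  # one fused vector per query token
```

Each row of `attn` is a probability distribution over the context tokens, so every fused vector is a weighted mix of the other branch's features. A practical model would use a deep learning framework with learned projections, multiple heads, and a classification head on top.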

## Key findings

- FetCAT achieved 98.64% accuracy on motion-degraded fetal MRI slices without data augmentation.
- Grad-CAM visualization showed the model focuses on clinically relevant anatomical landmarks.
- The model generalized well to an unseen dataset with 81.0% accuracy.

## Abstract

Fetal brain magnetic resonance imaging (MRI) has been recognized as a vital diagnostic tool for identifying neurological anomalies during pregnancy. Accurate classification of fetal MRI planes is essential for effective prenatal neurological assessment, yet this task remains challenging in clinical practice. Key obstacles include the reliance on manual identification by specialized neuroradiologists, resource constraints, motion-induced artifacts from fetal movement, and insufficient clinical interpretability of automated methods. This study presents FetCAT (Fetal Cross-Attention Transformer), a novel hybrid architecture that integrates a pre-trained Swin Transformer with a custom AdaptiveMed-CNN model through cross-attention fusion mechanisms for automated fetal brain MRI plane classification. The proposed hybrid architecture combines the global contextual understanding capabilities of transformers with the local feature extraction strengths of CNNs through a cross-attention mechanism. The model was trained and tested on a large-scale dataset of 52,561 motion-degraded fetal MRI slices from 741 patients, encompassing three anatomical planes and gestational ages of 19 to 39 weeks. Comprehensive comparative analyses were conducted across pre-trained CNN architectures, baseline and pre-trained transformer models, and the proposed hybrid configurations to evaluate its efficacy. Systematic ablation studies were performed to evaluate the impact of domain-specific data augmentation strategies on model performance. Robust statistical evaluation, including mean, variance, confidence intervals, and McNemar’s test, substantiated the significant performance advantage of the proposed architecture over all competing models. Additionally, Grad-CAM-based explainability analysis was implemented to provide visual interpretations of the model’s decision-making process, thereby enhancing clinical interpretability.
The proposed cross-attention-based Swin-AdaptiveMedCNN model achieved superior performance with 98.64% accuracy without data augmentation, substantially outperforming standalone CNN models as well as baseline and pre-trained transformers. Explainability analysis using Grad-CAM visualization demonstrated that the model focuses on clinically relevant anatomical landmarks. Contrary to common assumptions, ablation studies revealed that data augmentation consistently reduced model performance rather than improving it. This result can be attributed to the inherent diversity and natural variability already present in the dataset, which rendered additional synthetic variations counterproductive. Moreover, the proposed FetCAT model demonstrated strong generalization capability, maintaining superior and statistically significant performance on an unseen OpenNeuro MRI test dataset with 81.0% accuracy. Thus, this study establishes a benchmark for automated fetal brain MRI plane classification.
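The abstract names McNemar's test for comparing classifiers but does not say which variant was used. As a minimal sketch, assuming the exact binomial form, the test looks only at the slices where two models disagree: `b` counts cases the first model classified correctly and the second did not, and `c` counts the reverse. The labels and predictions below are made up for illustration and are not results from the paper.

```python
from math import comb


def discordant_counts(y_true, pred_a, pred_b):
    """Count cases where exactly one of the two models is correct."""
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts."""
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree, so there is no evidence either way
    k = min(b, c)
    # Under the null hypothesis each disagreement favours either model with probability 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy plane labels (0, 1, 2) and predictions from two hypothetical models.
y_true = [0, 1, 2, 0, 1, 2]
pred_a = [0, 1, 2, 0, 1, 1]
pred_b = [0, 2, 2, 1, 0, 1]

b, c = discordant_counts(y_true, pred_a, pred_b)
p_value = mcnemar_exact(b, c)
print(b, c, p_value)  # 3 0 0.25
```

With only three disagreements the p-value cannot fall below 0.25, which shows why the test needs a large number of discordant cases before a difference in accuracy can be called significant.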

## Full-text entities

- **Diseases:** neurological anomalies (MESH:D009421)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12818608/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12818608/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/PMC12818608/full.md

---
Source: https://tomesphere.com/paper/PMC12818608