# Calibrated Transformer Fusion for Dual-View Low-Energy CESM Classification

**Authors:** Ahmed A. H. Alkurdi, Amira Bibo Sallow

PMC · DOI: 10.3390/jimaging12010041 · Journal of Imaging · 2026-01-13

## TL;DR

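This paper introduces a dual-view framework that fuses CNN features from the CC and MLO views with a transformer to classify breast sides (Normal vs. Tumorous) in low-energy CESM images. It reports 96.88% mean accuracy across five held-out folds, together with calibrated probabilities and MC-dropout uncertainty estimates.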

## Contribution

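The main contribution is a dual-backbone CNN feature extractor with transformer-based fusion of the CC and MLO views, combined with MC-dropout inference for uncertainty estimation and post hoc logistic calibration, for breast-side classification in low-energy CESM images.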
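As a rough illustration of how MC-dropout yields an uncertainty estimate, the sketch below summarizes positive-class probabilities collected from several stochastic forward passes (dropout left active at inference). The function names, the choice of standard deviation and binary entropy as uncertainty summaries, and the toy numbers are assumptions made for illustration; the summary does not state which uncertainty measure the authors use.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def binary_entropy(p: float) -> float:
    """Entropy (in nats) of a Bernoulli distribution with parameter p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def summarize_mc_passes(passes: Sequence[Sequence[float]]) -> List[Dict[str, float]]:
    """Summarize MC-dropout outputs.

    passes[t][i] is the positive-class probability for sample i on
    stochastic pass t. Returns one summary dict per sample.
    """
    summaries = []
    for sample_probs in zip(*passes):  # regroup: one tuple per sample
        m = mean(sample_probs)
        summaries.append(
            {
                "mean": m,                      # averaged prediction
                "std": pstdev(sample_probs),    # spread across passes
                "entropy": binary_entropy(m),   # uncertainty of the mean
            }
        )
    return summaries


# Toy example (hypothetical numbers): 2 passes over 2 samples.
toy_passes = [[0.9, 0.5], [0.7, 0.5]]
toy_summary = summarize_mc_passes(toy_passes)
```

A sample whose passes disagree gets a larger `std`, while a sample whose averaged probability sits near 0.5 gets a larger `entropy`; either could serve as a flag for cases to route to a human reader.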

## Key findings

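- Model E achieved a mean accuracy of 96.88% ± 2.39% and a mean F1-score of 97.68% ± 1.66% across the five held-out test folds.
- Probability quality was supported by a mean Brier score of 0.0236 ± 0.0145 and a mean expected calibration error (ECE) of 0.0334 ± 0.0171.
- An ablation study (Models A–E) quantified the incremental contribution of dual-view input, transformer fusion, and uncertainty calibration.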
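For readers unfamiliar with the two calibration metrics quoted above, the following is a minimal Python sketch of how a Brier score and a binary ECE can be computed. The ECE variant shown (ten equal-width bins over the positive-class probability, comparing mean predicted probability with the observed positive rate) is an assumption; the summary does not specify the binning scheme used in the paper, and other ECE definitions exist.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Binary ECE with equal-width bins on the positive-class probability."""
    n = len(probs)
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_p = sum(p for p, _ in members) / len(members)
        pos_rate = sum(y for _, y in members) / len(members)
        # Weight each bin's calibration gap by its share of the samples.
        ece += (len(members) / n) * abs(mean_p - pos_rate)
    return ece
```

Both metrics are lower-is-better: a model that outputs 0.9 for a group of cases is well calibrated when roughly 90% of that group is actually positive.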

## Abstract

Contrast-enhanced spectral mammography (CESM) provides low-energy images acquired in standard craniocaudal (CC) and mediolateral oblique (MLO) views, and clinical interpretation relies on integrating both views. This study proposes a dual-view classification framework that combines deep CNN feature extraction with transformer-based fusion for breast-side classification using low-energy (DM) images from CESM acquisitions (Normal vs. Tumorous; benign and malignant merged). The evaluation was conducted using 5-fold stratified group cross-validation with patient-level grouping to prevent leakage across folds. The final configuration (Model E) integrates dual-backbone feature extraction, transformer fusion, MC-dropout inference for uncertainty estimation, and post hoc logistic calibration. Across the five held-out test folds, Model E achieved a mean accuracy of 96.88% ± 2.39% and a mean F1-score of 97.68% ± 1.66%. The mean ROC-AUC and PR-AUC were 0.9915 ± 0.0098 and 0.9968 ± 0.0029, respectively. Probability quality was supported by a mean Brier score of 0.0236 ± 0.0145 and a mean expected calibration error (ECE) of 0.0334 ± 0.0171. An ablation study (Models A–E) was also reported to quantify the incremental contribution of dual-view input, transformer fusion, and uncertainty calibration. Within the limits of this retrospective single-center setting, these results suggest that dual-view transformer fusion can provide strong discrimination while also producing calibrated probabilities and uncertainty outputs that are relevant for decision support.
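The patient-level grouping mentioned in the abstract can be illustrated with a small Python sketch: folds are assigned to patients rather than to individual samples, so no patient appears in more than one fold. This is a simplified round-robin scheme written for illustration, not the authors' procedure; the function name, the rule for deriving a patient-level label, and the toy data are all assumptions. In practice, an off-the-shelf splitter such as scikit-learn's `StratifiedGroupKFold` is one option for this.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def grouped_stratified_folds(
    patient_ids: Sequence[Hashable], labels: Sequence[int], n_folds: int = 5
) -> List[int]:
    """Return one fold index per sample, keeping each patient in a single fold."""
    # Assumption: a patient counts as positive if any of their samples is positive.
    patient_label = defaultdict(int)
    for pid, y in zip(patient_ids, labels):
        patient_label[pid] = max(patient_label[pid], y)

    # Deal patients round-robin within each class so that class balance is
    # roughly preserved across folds (fold sizes may still differ slightly).
    fold_of_patient = {}
    for cls in sorted(set(patient_label.values())):
        members = sorted(p for p, y in patient_label.items() if y == cls)
        for i, pid in enumerate(members):
            fold_of_patient[pid] = i % n_folds

    return [fold_of_patient[pid] for pid in patient_ids]


# Toy example (hypothetical data): 4 patients, 6 samples, 2 folds.
toy_patients = ["a", "a", "b", "b", "c", "d"]
toy_labels = [0, 1, 0, 0, 1, 0]
toy_folds = grouped_stratified_folds(toy_patients, toy_labels, n_folds=2)
```

Because the fold is looked up per patient, both samples of patient `a` land in the same fold, which is the property that prevents leakage between training and test data.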

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Diseases:** Tumorous (MESH:D009369)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12842785/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12842785/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/PMC12842785/full.md

---
Source: https://tomesphere.com/paper/PMC12842785