# Early-fusion hybrid CNN-transformer models for multiclass ovarian tumor ultrasound classification

**Authors:** Igor Garcia-Atutxa, José Martínez-Más, Andrés Bueno-Crespo, Francisca Villanueva-Flores

PMC · DOI: 10.3389/frai.2025.1679310 · Frontiers in Artificial Intelligence · 2025-10-15

## TL;DR

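An early-fusion hybrid of a CNN (EfficientNet-B7) and a transformer (Swin Transformer) classifies eight ovarian tumor types from 2D transvaginal ultrasound more accurately than single CNN or ViT baselines. It also provides calibrated probabilities, uncertainty estimates, and visual explanations.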

## Contribution

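A learned early-fusion (joint projection) hybrid that combines EfficientNet-B7 local descriptors with Swin Transformer hierarchical global context for eight-class ovarian tumor classification from 2D TVS. The evaluation goes beyond accuracy, covering isotonic calibration, decision-curve analysis, entropy-based uncertainty, and Grad-CAM explanations.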
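
The summary describes the fusion only as a learned joint projection of EfficientNet-B7 and Swin Transformer features. It does not give layer sizes, activations, or normalization. The following is a minimal, framework-free sketch of what such a head could look like. It assumes the two backbones' pooled feature vectors are concatenated, passed through one dense layer with GELU, and then through a linear classifier with softmax. `EarlyFusionHead` and all dimensions are hypothetical, and the randomly initialized weights stand in for trained ones.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: y[j] = sum_i weights[j][i] * x[i] + bias[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def gelu(v):
    """Tanh approximation of the GELU activation, applied elementwise."""
    c = math.sqrt(2.0 / math.pi)
    return [0.5 * z * (1.0 + math.tanh(c * (z + 0.044715 * z ** 3))) for z in v]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_matrix(rows, cols, rng):
    """Uniform init scaled by 1/sqrt(fan_in); placeholder for trained weights."""
    bound = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]


class EarlyFusionHead:
    """Hypothetical head: concatenate CNN and transformer descriptors, project jointly, classify."""

    def __init__(self, cnn_dim, vit_dim, joint_dim, n_classes, seed=0):
        rng = random.Random(seed)
        self.in_dim = cnn_dim + vit_dim
        self.w_joint = init_matrix(joint_dim, self.in_dim, rng)
        self.b_joint = [0.0] * joint_dim
        self.w_out = init_matrix(n_classes, joint_dim, rng)
        self.b_out = [0.0] * n_classes

    def __call__(self, cnn_feat, vit_feat):
        fused = list(cnn_feat) + list(vit_feat)  # early fusion: a single joint vector
        if len(fused) != self.in_dim:
            raise ValueError(f"expected {self.in_dim} features, got {len(fused)}")
        joint = gelu(linear(fused, self.w_joint, self.b_joint))  # learned joint projection
        return softmax(linear(joint, self.w_out, self.b_out))  # per-class probabilities
```

In a real model, `cnn_feat` would be the globally pooled EfficientNet-B7 output (2,560 channels) and `vit_feat` the pooled Swin output, whose width depends on the Swin variant. The weights would be learned by backpropagation in a deep learning framework rather than drawn at random. Projecting the concatenated vector, instead of averaging two separately trained classifiers (late fusion), lets the dense layer learn interactions between local texture cues and global context before any class decision is made.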

## Key findings

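- The hybrid model achieved AUC 0.9904, accuracy 92.13%, sensitivity 92.38%, and specificity 98.90% on eight-class classification of the OTU-2D dataset (patient-level, stratified 5-fold cross-validation repeated 10×), outperforming single CNN or ViT baselines.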
- A soft ensemble of the top hybrids further improved performance to AUC 0.991, accuracy 93.3%, sensitivity 93.6%, and specificity 99.0%.
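- Isotonic calibration yielded reliable probabilities, decision-curve analysis showed net clinical benefit across 5–20% risk thresholds, and entropy-based uncertainty supported confidence-based triage. Validation and test folds were never resampled, so reported metrics reflect the dataset's original class distribution; prospective clinical validation is still pending.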
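
The summary does not say how the ensemble members are combined beyond "soft", nor how the entropy cut-off for triage is chosen. The sketch below assumes soft voting means averaging per-class probabilities, and that triage sends a case to human review when its normalized Shannon entropy exceeds a threshold. The function names and the `max_entropy=0.5` default are hypothetical, not the authors' values. Whether the paper averages raw or isotonic-calibrated probabilities is also not stated here.

```python
import math


def soft_ensemble(prob_lists):
    """Average per-class probabilities across models (soft voting)."""
    n_classes = len(prob_lists[0])
    if any(len(p) != n_classes for p in prob_lists):
        raise ValueError("all models must output the same number of classes")
    return [sum(p[k] for p in prob_lists) / len(prob_lists) for k in range(n_classes)]


def normalized_entropy(probs):
    """Shannon entropy divided by log(K): 0 means certain, 1 means uniform."""
    if len(probs) < 2:
        raise ValueError("need at least two classes")
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def triage(probs, max_entropy=0.5):
    """Return (predicted class, route); route is 'auto' or 'review'.

    max_entropy is a hypothetical cut-off, not a value from the paper.
    """
    pred = max(range(len(probs)), key=probs.__getitem__)
    route = "auto" if normalized_entropy(probs) <= max_entropy else "review"
    return pred, route
```

Dividing by log K keeps the uncertainty score in [0, 1] regardless of the number of classes. In practice the cut-off would be tuned on validation data, for instance to reach a target fraction of cases referred for expert review.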

## Abstract

Ovarian cancer remains the deadliest gynecologic malignancy, and transvaginal ultrasound (TVS), the first-line test, still suffers from limited specificity and operator dependence. We introduce a learned early-fusion (joint projection) hybrid that couples EfficientNet-B7 (local descriptors) with a Swin Transformer (hierarchical global context) to classify eight ovarian tumor categories from 2D TVS. Using the public, de-identified OTU-2D dataset (n = 1,469 images across eight histopathologic classes), we conducted patient-level, stratified 5-fold cross-validation repeated 10×. To address class imbalance while preventing leakage, training used train-only oversampling, ultrasound-aware augmentations, and strong regularization; validation/test folds were never resampled. The hybrid achieved AUC 0.9904, accuracy 92.13%, sensitivity 92.38%, and specificity 98.90%, outperforming single CNN or ViT baselines. A soft ensemble of the top hybrids further improved performance to AUC 0.991, accuracy 93.3%, sensitivity 93.6%, and specificity 99.0%. Beyond discrimination, we provide deployment-oriented evaluation: isotonic calibration yielded reliable probabilities, decision-curve analysis showed net clinical benefit across 5–20% risk thresholds, entropy-based uncertainty supported confidence-based triage, and Grad-CAM highlighted clinically salient regions. All metrics are reported with 95% bootstrap confidence intervals, and the evaluation protocol preserves real-world data distributions. Taken together, this work advances ovarian ultrasound AI from accuracy-only reporting to calibrated, explainable, and uncertainty-aware decision support, offering a reproducible reference framework for multiclass ovarian ultrasound and a clear path toward clinical integration and prospective validation.
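
The abstract specifies patient-level, stratified 5-fold cross-validation repeated 10×, with oversampling applied to training folds only, but not the exact splitter or oversampler. The following is a minimal sketch of that leakage-safe protocol, assuming one histologic label per patient and plain random oversampling (with replacement) up to the majority-class count. All function names are hypothetical. A library alternative for the split itself is scikit-learn's `StratifiedGroupKFold`.

```python
import random
from collections import defaultdict


def patient_stratified_folds(patient_labels, n_folds=5, seed=0):
    """Map each patient ID to a fold so all of a patient's images stay together.

    Assumes one class label per patient; patients of each class are dealt
    round-robin across folds so class proportions are roughly preserved.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for pid, label in sorted(patient_labels.items()):
        by_class[label].append(pid)
    fold_of, offset = {}, 0
    for label in sorted(by_class):
        pids = by_class[label]
        rng.shuffle(pids)
        for i, pid in enumerate(pids):
            fold_of[pid] = (offset + i) % n_folds
        offset += len(pids)  # continue dealing where the previous class stopped
    return fold_of


def oversample(indices, labels, seed=0):
    """Random oversampling with replacement up to the majority-class count."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    target = max(len(v) for v in by_class.values())
    out = []
    for label in sorted(by_class):
        members = by_class[label]
        out.extend(members)
        out.extend(rng.choice(members) for _ in range(target - len(members)))
    rng.shuffle(out)
    return out


def split_fold(image_patient, image_label, fold_of, k, seed=0):
    """Return (oversampled train indices, untouched validation indices) for fold k."""
    train = [i for i, p in enumerate(image_patient) if fold_of[p] != k]
    val = [i for i, p in enumerate(image_patient) if fold_of[p] == k]
    # Resample the training side only; the validation fold keeps its real class mix.
    return oversample(train, image_label, seed=seed), val


def repeated_cv(image_patient, image_label, n_folds=5, n_repeats=10):
    """Yield (repeat, fold, train, val); each repeat reshuffles patients."""
    patient_labels = {p: y for p, y in zip(image_patient, image_label)}
    for r in range(n_repeats):
        fold_of = patient_stratified_folds(patient_labels, n_folds, seed=r)
        for k in range(n_folds):
            train, val = split_fold(image_patient, image_label, fold_of, k, seed=r)
            yield r, k, train, val
```

Splitting by patient before any resampling is what prevents leakage: duplicated minority-class images can only appear on the training side, and no patient contributes images to both training and validation within a fold. The ultrasound-aware augmentations mentioned in the abstract would be applied to the training indices at load time and are not modeled here.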

## Linked entities

- **Diseases:** ovarian cancer (MONDO:0005140)

## Full-text entities

- **Diseases:** Ovarian cancer (MESH:D010051), gynecologic malignancy (MESH:D005833)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12568491/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12568491/full.md

## References

49 references — full list in the complete paper: https://tomesphere.com/paper/PMC12568491/full.md

---
Source: https://tomesphere.com/paper/PMC12568491