# Feasibility of Multimodal Deep Learning for Automated Staging of Familial Exudative Vitreoretinopathy Using Color Fundus Photographs and Fluorescein Angiography

**Authors:** Mingzhen Yuan, Tianyu Wang, Zirong Liu, Jinghua Liu, Jing Ma, Guangda Deng, Liang Li, Songfeng Li, Yan Hu, Hai Lu

PMC · DOI: 10.3390/diagnostics15212752 · 2025-10-30

## TL;DR

This paper explores deep learning models that automatically stage familial exudative vitreoretinopathy (FEVR), a rare eye disease, from color fundus photographs and fluorescein angiography.

## Contribution

The study introduces a novel multimodal dataset and compares deep learning models for FEVR staging, showing that multimodal fusion outperforms single-modal approaches.

## Key findings

- Transformers outperformed CNNs in single-modal analysis of FEVR staging.
- CRD-Net achieved peak performance, with AUC up to 0.94 overall and AUC > 0.90 in severe FEVR cases.
- The AI system maintained high specificity (0.91–0.92) and scored higher than two specialists on CFP assessment (0.65 vs. 0.48 for each).
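
The class-wise AUC and specificity figures above are one-vs-rest quantities: each stage is treated in turn as the positive class and all other stages as negatives. The following is a minimal sketch of how such values can be computed, not the authors' evaluation code; the function names and the toy labels and scores are hypothetical, and AUC is computed with the rank-based (Mann-Whitney) formulation.

```python
def one_vs_rest_auc(y_true, scores, positive):
    """AUC for `positive` vs. all other classes (Mann-Whitney formulation).

    `scores` holds the model's score for the `positive` class per sample.
    """
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def one_vs_rest_specificity(y_true, y_pred, positive):
    """TN / (TN + FP), treating `positive` as the target class."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    if tn + fp == 0:
        raise ValueError("no negative samples for this class")
    return tn / (tn + fp)


# Toy data (hypothetical): three stages, scores are for stage 2 only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 2, 1, 1, 2, 2]
stage2_scores = [0.1, 0.2, 0.3, 0.6, 0.5, 0.9]

auc = one_vs_rest_auc(y_true, stage2_scores, positive=2)
spec = one_vs_rest_specificity(y_true, y_pred, positive=2)
```

The pairwise loop is quadratic in the number of samples, which is acceptable for small cohorts; a sorted-rank implementation would be preferable for large datasets.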

## Abstract

Introduction: To evaluate the feasibility of multimodal deep learning (DL) for automated staging of familial exudative vitreoretinopathy (FEVR) using color fundus photographs (CFP) and fluorescein angiography (FFA). Methods: We assembled a multimodal dataset across FEVR stages 0–5 and post-laser cases and benchmarked CNNs (Convolutional Neural Networks), Transformers, and multimodal fusion under center-region and multi-image settings. Class imbalance was mitigated via weighted sampling and focal/class-balanced losses. We report accuracy, recall, precision, macro-F1, Cohen’s κ, and class-wise ROC/AUC with 95% CIs. Results: The AI system showed balanced performance versus specialists (0.65 vs. Dr. A: 0.48/Dr. B: 0.48) in CFP assessment, maintaining high specificity (0.91–0.92). Among architectures: (1) Transformers outperformed CNNs in single-modal analysis; (2) ResNet showed moderate performance (AUC 0.70–0.85) but limited capability for intermediate grades (AUC < 0.70); (3) CRD-Net achieved peak performance (AUC up to 0.94, severe cases AUC > 0.90). While FFA improved Dr. B’s accuracy to 0.56, it remained below AI levels. Stage-specific accuracy ranged from 0.72 to 0.88 across the FEVR spectrum. Conclusions: Leveraging a novel multimodal database and high-performance AI models, systematic comparisons demonstrated the superiority of Transformer architectures over CNNs in single-modal analysis, while CRD-Net’s multimodal fusion approach achieved optimal performance across all severity grades. Multimodal DL shows feasibility as a decision-support tool for automated FEVR staging within confirmed cohorts.
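
The abstract mentions focal and class-balanced losses for handling class imbalance but does not give their exact form. As an illustration only, the sketch below combines the standard focal loss, $-w_t (1 - p_t)^{\gamma} \log p_t$, with class weights derived from the "effective number of samples" heuristic, $(1 - \beta) / (1 - \beta^{n_c})$. The choice of this weighting scheme, the values of `beta` and `gamma`, the per-stage counts, and all function names are assumptions, not details taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_balanced_weights(class_counts, beta=0.999):
    """Per-class weights from the effective-number heuristic (assumed scheme).

    Every count must be > 0. Weights are rescaled to sum to the class count.
    """
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in class_counts]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def focal_loss(logits, target, weights=None, gamma=2.0):
    """Focal loss for one sample; gamma=0 reduces to weighted cross-entropy."""
    p_t = softmax(logits)[target]
    w = 1.0 if weights is None else weights[target]
    return -w * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical per-stage sample counts: rarer stages receive larger weights.
counts = [500, 50, 5]
weights = class_balanced_weights(counts)

logits = [2.0, 0.0, 0.0]
ce = focal_loss(logits, target=0, gamma=0.0)   # plain cross-entropy
fl = focal_loss(logits, target=0, gamma=2.0)   # down-weights this easy sample
```

With `gamma > 0`, confidently classified samples contribute less to the loss, so training focuses on harder cases, while the class weights counteract the scarcity of rare stages.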

## Linked entities

- **Diseases:** familial exudative vitreoretinopathy (MONDO:0019516), FEVR (MONDO:0019516)

## Full-text entities

- **Diseases:** FEVR (MESH:D000080345)
- **Chemicals:** Fluorescein (MESH:D019793)

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12610102/full.md

---
Source: https://tomesphere.com/paper/PMC12610102