# Graph-enhanced multimodal fusion of vascular biomarkers and deep features for diabetic retinopathy detection

**Authors:** K. V. Deepsahith, Basineni Shashank, Bangipavan Kumar, Sherly Alphonse, Brindha Subburaj, Girish Subramanian

PMC · DOI: 10.3389/frai.2025.1731633 · 2026-02-02

## TL;DR

This paper introduces a diabetic retinopathy detection method that fuses MobileNetV3 deep image features with GNN-encoded vascular biomarkers through a transformer-based cross-attention model.

## Contribution

The novel contribution is a graph-enhanced multimodal fusion framework using transformer cross-attention for diabetic retinopathy detection.
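
To make the cross-attention idea concrete, here is a minimal, dependency-free sketch of a single attention head. It is an illustration under stated assumptions, not the authors' implementation: the summary does not say which modality supplies the queries, so the choice of image tokens as queries and vascular-graph tokens as keys and values is assumed here, and all names, dimensions, and random weights are hypothetical. A multi-head version would run several such heads in parallel and concatenate their outputs.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def cross_attention(queries, context, Wq, Wk, Wv):
    """Single-head scaled dot-product cross-attention.

    queries: tokens from one modality (assumed here: image features)
    context: tokens from the other modality (assumed here: graph embeddings)
    Returns the fused tokens and the attention weights per query.
    """
    Q = [matvec(Wq, q) for q in queries]
    K = [matvec(Wk, c) for c in context]
    V = [matvec(Wv, c) for c in context]
    d_k = len(K[0])
    fused, weights = [], []
    for q in Q:
        # Similarity of this query to every context token, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        attn = softmax(scores)
        # Weighted sum of the context values.
        out = [sum(a * v[j] for a, v in zip(attn, V)) for j in range(len(V[0]))]
        fused.append(out)
        weights.append(attn)
    return fused, weights


def rand_matrix(rows, cols):
    """Hypothetical random projection weights, for illustration only."""
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
image_tokens = rand_matrix(4, 8)   # 4 image tokens of dimension 8 (made up)
graph_tokens = rand_matrix(3, 6)   # 3 graph tokens of dimension 6 (made up)
Wq, Wk, Wv = rand_matrix(5, 8), rand_matrix(5, 6), rand_matrix(5, 6)

fused, weights = cross_attention(image_tokens, graph_tokens, Wq, Wk, Wv)
```

Each row of `weights` is a probability distribution over the graph tokens, so every fused image token is a convex combination of projected vascular embeddings. In a real pipeline the fused tokens would typically be pooled before a classifier.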

## Key findings

- The model achieves 93.8% accuracy and 0.96 AUC-ROC on the Messidor-2 dataset.
- It achieves above 98% accuracy on the EyePACS and APTOS 2019 datasets, comparing favorably with existing methods.

## Abstract

Diabetic retinopathy (DR) can be detected from both deep retinal representations and vascular biomarkers. This work proposes a multimodal framework that combines deep features with vascular descriptors in a transformer fusion architecture. Fundus images are preprocessed using contrast-limited adaptive histogram equalization (CLAHE), Canny edge detection, top-hat transformation, and U-Net vessel segmentation. The images are then passed through an enhanced MobileNetV3 backbone fused with a convolutional block attention module (CBAM) for deep spatial feature extraction. In parallel, the segmented vasculature is skeletonized to create a vascular graph, and descriptors are computed using fractal dimension analysis (FDA), artery-to-vein ratio (AVR), and gray level co-occurrence matrix (GLCM) texture. A graph neural network (GNN) then generates a global topology-aware embedding from this information. The modalities are integrated by a transformer-based cross-modal fusion, in which the MobileNet feature vectors and the GNN-based vascular embeddings interact through multi-head cross-attention. The fused representation is then passed to a Softmax classifier for DR prediction. The model outperforms traditional deep learning baselines, achieving an accuracy of 93.8%, a precision of 92.1%, a recall of 92.8%, and an AUC-ROC of 0.96 for DR prediction on the Messidor-2 dataset. The proposed approach also achieves above 98% accuracy for DR detection on the EyePACS and APTOS 2019 datasets. The findings indicate that the proposed system is a reliable framework that compares favorably with existing state-of-the-art methods.
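
As an illustration of one vascular descriptor mentioned in the abstract, the sketch below estimates a fractal dimension by box counting. This is an assumption: the abstract names fractal dimension analysis but does not say which estimator the authors use, and box counting is only one common choice. The function names, box sizes, and synthetic pixel sets are hypothetical; a real input would be the skeletonized vessel pixels.

```python
import math


def box_count(points, box_size):
    """Number of box_size x box_size grid cells containing at least one pixel."""
    return len({(r // box_size, c // box_size) for r, c in points})


def fractal_dimension(points, box_sizes=(1, 2, 4, 8, 16)):
    """Box-counting estimate: slope of log N(s) against log(1/s).

    points: iterable of (row, col) integer pixel coordinates.
    """
    points = list(points)
    xs = [math.log(1.0 / s) for s in box_sizes]
    ys = [math.log(box_count(points, s)) for s in box_sizes]
    # Ordinary least-squares slope of ys on xs.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Synthetic sanity checks (not retinal data): a straight line should have
# dimension close to 1 and a filled square close to 2.
line = [(0, c) for c in range(64)]
square = [(r, c) for r in range(64) for c in range(64)]

fd_line = fractal_dimension(line)
fd_square = fractal_dimension(square)
```

A branching vessel skeleton would be expected to fall between these two extremes, which is why the measure is used as a summary of vascular complexity.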

## Linked entities

- **Diseases:** Diabetic retinopathy (MONDO:0005266)

## Full-text entities

- **Diseases:** DR (MESH:D003930)

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12907448/full.md

---
Source: https://tomesphere.com/paper/PMC12907448