A Hybrid Framework Bridging CNN and ViT based on Theory of Evidence for Diabetic Retinopathy Grading

Junlai Qiu; Yunzhu Chen; Hao Zheng; Yawen Huang; Yuexiang Li

arXiv:2510.26315·cs.CV·October 31, 2025


TL;DR

This paper introduces a hybrid framework combining CNN and ViT for diabetic retinopathy grading, using evidence theory for feature fusion, leading to improved accuracy and interpretability in automated diagnosis.

Contribution

It proposes a novel evidential fusion paradigm that effectively combines CNN and ViT features, enhancing DR grading performance and interpretability.

Findings

1. Outperforms state-of-the-art DR grading methods.

2. Provides better interpretability in feature fusion.

3. Achieves higher accuracy on public datasets.

Abstract

Diabetic retinopathy (DR) is a leading cause of vision loss among middle-aged and elderly people and significantly affects their daily lives and mental health. To improve the efficiency of clinical screening and enable early detection of DR, a variety of automated DR diagnosis systems have recently been built on convolutional neural networks (CNNs) or vision Transformers (ViTs). However, because of the inherent limitations of each architecture, the performance of existing methods that use a single type of backbone has reached a bottleneck. One potential route to further improvement is to integrate different kinds of backbones, which can fully leverage their respective strengths (i.e., the local feature extraction capability of CNNs and the global feature capturing ability of ViTs). To this end, we propose a novel paradigm to effectively fuse the features extracted by different…
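The abstract above is truncated before it describes the fusion step, so the following is only a generic illustration of how evidence theory can combine the outputs of two backbones, not the authors' method. It assumes each branch emits non-negative per-class evidence, converts that evidence into a subjective-logic opinion (belief masses plus an uncertainty mass), and merges the two opinions with a reduced form of Dempster's rule of combination. The function names, the five-class setup, and the evidence values are all hypothetical.

```python
# Illustrative sketch only: evidence-theoretic fusion of two classifiers.
# Assumption: each backbone outputs non-negative evidence per class.
# Nothing here is taken from the paper's implementation.

def opinion_from_evidence(evidence):
    """Map per-class evidence to (belief masses, uncertainty mass)."""
    num_classes = len(evidence)
    # Dirichlet strength: total evidence plus one pseudo-count per class.
    strength = sum(evidence) + num_classes
    belief = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return belief, uncertainty


def combine_opinions(op_a, op_b):
    """Fuse two opinions with a reduced Dempster's rule of combination."""
    belief_a, unc_a = op_a
    belief_b, unc_b = op_b
    # Conflict: belief mass the two sources assign to different classes.
    agreement = sum(a * b for a, b in zip(belief_a, belief_b))
    conflict = sum(belief_a) * sum(belief_b) - agreement
    scale = 1.0 / (1.0 - conflict)
    fused_belief = [
        scale * (a * b + a * unc_b + b * unc_a)
        for a, b in zip(belief_a, belief_b)
    ]
    fused_uncertainty = scale * unc_a * unc_b
    return fused_belief, fused_uncertainty


# Hypothetical evidence for five DR grades from a CNN and a ViT branch.
cnn_opinion = opinion_from_evidence([0.5, 4.0, 1.0, 0.2, 0.1])
vit_opinion = opinion_from_evidence([0.3, 6.0, 0.5, 0.1, 0.1])
fused_belief, fused_uncertainty = combine_opinions(cnn_opinion, vit_opinion)
predicted_grade = max(range(len(fused_belief)), key=fused_belief.__getitem__)
```

In this formulation the fused belief masses and the fused uncertainty still sum to one, and the fused uncertainty is lower than that of either source. The explicit uncertainty mass is what makes such a fusion inspectable, which is in the spirit of the interpretability claim in the summary above.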
