MaxGlaViT: A novel lightweight vision transformer-based approach for early diagnosis of glaucoma stages from fundus images

Mustafa Yurdakul; Kubra Uyar; Sakir Tasdemir

arXiv:2502.17154 · cs.CV · February 25, 2025

TL;DR

MaxGlaViT is a lightweight model built on a restructured Multi-Axis Vision Transformer (MaxViT) that classifies glaucoma stages from fundus images, reaching 92.03% accuracy on the HDV1 dataset.

Contribution

The paper introduces MaxGlaViT, a lightweight vision transformer architecture for early glaucoma diagnosis that outperforms the compared CNN and ViT models on the HDV1 dataset.

Findings

01. MaxGlaViT achieved 92.03% accuracy in glaucoma stage classification.

02. Adding ECA and ConvNeXtV2 blocks improved model performance.

03. MaxGlaViT outperformed state-of-the-art CNN and ViT models.

Abstract

Glaucoma is a prevalent eye disease that progresses silently without symptoms. If not detected and treated early, it can cause permanent vision loss. Computer-assisted diagnosis systems play a crucial role in timely and efficient identification. This study introduces MaxGlaViT, a lightweight model based on the restructured Multi-Axis Vision Transformer (MaxViT) for early glaucoma detection. First, MaxViT was scaled to optimize block and channel numbers, resulting in a lighter architecture. Second, the stem was enhanced by adding attention mechanisms (CBAM, ECA, SE) after convolution layers to improve feature learning. Third, MBConv structures in MaxViT blocks were replaced by advanced DL blocks (ConvNeXt, ConvNeXtV2, InceptionNeXt). The model was evaluated using the HDV1 dataset, containing fundus images of different glaucoma stages. Additionally, 40 CNN and 40 ViT models were tested on…
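The second step in the abstract places a channel-attention module (CBAM, ECA, or SE) after the stem convolutions. As a rough illustration of what ECA (Efficient Channel Attention) computes, here is a minimal NumPy sketch. It is not the authors' implementation: the function name `eca`, the fixed `kernel_size`, and the uniform placeholder weights are assumptions made for the example. In a trained model the 1D kernel is learned, and the original ECA design derives the kernel size from the channel count.

```python
import numpy as np


def eca(x, kernel_size=3, weights=None):
    """Illustrative ECA-style channel attention on a (C, H, W) feature map.

    `weights` stands in for the learned 1D kernel; uniform values are a
    placeholder assumption, not trained parameters.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    if weights is None:
        weights = np.full(kernel_size, 1.0 / kernel_size)

    channels = x.shape[0]

    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = x.mean(axis=(1, 2))

    # Local cross-channel interaction: a 1D correlation over the channel
    # axis, zero-padded so the output keeps one value per channel.
    pad = kernel_size // 2
    padded = np.pad(pooled, pad)
    mixed = np.array(
        [padded[i:i + kernel_size] @ weights for i in range(channels)]
    )

    # Sigmoid gate in (0, 1), broadcast over the spatial dimensions.
    gate = 1.0 / (1.0 + np.exp(-mixed))
    return x * gate[:, None, None]


if __name__ == "__main__":
    features = np.random.default_rng(0).normal(size=(8, 16, 16))
    attended = eca(features)
    print(attended.shape)
```

The module adds only `kernel_size` parameters per insertion point, which is in keeping with the lightweight goal the abstract states. SE and CBAM follow the same squeeze-then-gate pattern but use small fully connected layers (and, for CBAM, an additional spatial gate) in place of the 1D kernel.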

Code & Models

Repositories

ymyurdakul/MaxGlaViT (tf, official)

Taxonomy

Topics: Retinal Imaging and Analysis

Methods: Attention Is All You Need · Absolute Position Encodings · Linear Layer · Layer Normalization · Byte Pair Encoding · Dense Connections · Residual Connection · Label Smoothing · Multi-Head Attention · Position-Wise Feed-Forward Layer