An Attention Infused Deep Learning System with Grad-CAM Visualization for Early Screening of Glaucoma
Ramanathan Swaminathan

TL;DR
This paper presents a deep learning system for early glaucoma detection that combines a convolutional neural network (CNN) and a Vision Transformer (ViT) through a Cross-Attention module, reporting improved accuracy over standalone CNN and ViT baselines on the ACRIMA and Drishti datasets.
Contribution
The paper introduces a fused CNN-ViT model with a Cross-Attention module, aimed at improving both interpretability and performance in glaucoma screening.
Findings
Improved detection accuracy over standalone CNN and ViT baseline models
Effective visualization of clinically relevant regions
Successful integration of CNN and ViT for medical imaging
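The title names Grad-CAM as the visualization method behind the second finding. The sketch below shows the standard Grad-CAM computation for one convolutional layer, assuming you already have that layer's activations and the gradients of the target class score with respect to them (in a real model these come from the framework's autograd). The function name `grad_cam`, the array shapes, and the random toy inputs are illustrative assumptions; which layer of the fused network the paper visualizes is not stated here.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap for one convolutional layer (illustrative sketch).

    activations: feature maps A of shape (K, H, W) from the chosen layer.
    gradients:   d(score of target class) / dA, same shape as `activations`.
    Returns an (H, W) map scaled to [0, 1].
    """
    if activations.shape != gradients.shape:
        raise ValueError("activations and gradients must have the same shape")
    # Channel weights: global-average-pool the gradients over the spatial axes.
    alpha = gradients.mean(axis=(1, 2))  # shape (K,)
    # Weighted sum of the K feature maps; ReLU keeps only positive evidence.
    cam = np.maximum(np.tensordot(alpha, activations, axes=1), 0.0)  # shape (H, W)
    # Normalise for display; a map with no positive evidence stays all zero.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy usage: random arrays stand in for a real layer's activations and gradients.
rng = np.random.default_rng(0)
heatmap = grad_cam(rng.random((8, 7, 7)), rng.standard_normal((8, 7, 7)))
print(heatmap.shape)  # (7, 7)
```

In practice the low-resolution heatmap would be upsampled to the fundus image size and overlaid on it; that step is omitted here to keep the sketch dependency-free.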
Abstract
This work demonstrates the benefits of combining a deep custom convolutional neural network (CNN) with a Vision Transformer (ViT), with the two fused through a Cross-Attention module. The experiments use two fundus image datasets for glaucoma detection, ACRIMA and Drishti. The Cross-Attention mechanism helps the model learn clinically relevant regions of the fundus through bidirectional feature exchange between the CNN and ViT streams. Experiments show improved performance compared with standalone baseline CNN and ViT models.
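The abstract describes bidirectional feature exchange between the CNN and ViT streams but does not give the module's exact design. The sketch below is one plausible reading under stated assumptions: both streams have already been projected to a common token width `d`, each stream uses single-head scaled dot-product attention to query the other, a residual connection is added, and the mean-pooled outputs are concatenated for a classifier head. The names `cross_attention` and `bidirectional_fusion`, the token counts, and the random weights are hypothetical, not the author's implementation.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max along `axis` for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: `queries` attend to `context`."""
    q = queries @ w_q                         # (Nq, d)
    k = context @ w_k                         # (Nk, d)
    v = context @ w_v                         # (Nk, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (Nq, Nk)
    return softmax(scores, axis=-1) @ v       # (Nq, d)


def bidirectional_fusion(cnn_tokens, vit_tokens, params):
    """Hypothetical fusion: each stream attends to the other, plus a residual."""
    cnn_out = cnn_tokens + cross_attention(cnn_tokens, vit_tokens, *params["cnn_from_vit"])
    vit_out = vit_tokens + cross_attention(vit_tokens, cnn_tokens, *params["vit_from_cnn"])
    # Mean-pool each stream and concatenate into one vector for a classifier head.
    return np.concatenate([cnn_out.mean(axis=0), vit_out.mean(axis=0)])


# Toy usage with random tokens and weights (all sizes are assumptions).
rng = np.random.default_rng(0)
d = 16
cnn_tokens = rng.standard_normal((49, d))   # e.g. a 7x7 CNN feature map, flattened
vit_tokens = rng.standard_normal((197, d))  # e.g. 196 patch tokens + 1 class token
params = {
    name: [rng.standard_normal((d, d)) * 0.1 for _ in range(3)]
    for name in ("cnn_from_vit", "vit_from_cnn")
}
fused = bidirectional_fusion(cnn_tokens, vit_tokens, params)
print(fused.shape)  # (32,)
```

Running attention in both directions makes each stream's output depend on the other stream's tokens, which is the sense in which the exchange is bidirectional. A trained model would learn these projections end to end and would typically use multi-head attention.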
