Leveraging Complementary Attention maps in vision transformers for OCT image analysis

Haz Sameen Shahgir; Tanjeem Azwad Zaman; Khondker Salman Sayeed; Md. Asif Haider; Sheikh Saifur Rahman Jony; M. Sohel Rahman

arXiv:2310.14005 · eess.IV · June 3, 2025 · 1 citation

Open Access

TL;DR

This paper presents a novel pipeline combining hybrid and pure attention vision transformers for OCT biomarker detection, achieving state-of-the-art results and efficient single-model performance through knowledge distillation.
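The sketch below shows one way distillation into a single model can work for multi-label biomarker detection. It assumes the teacher (for example, an ensemble of the two transformers) emits one sigmoid probability per biomarker. It also assumes the student is trained with binary cross-entropy against those soft targets, blended with the ground-truth labels. The function names and the `alpha` blending weight are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def soft_bce(student_logits: Sequence[float], targets: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between student sigmoid outputs and (soft or hard) targets."""
    total = 0.0
    for z, t in zip(student_logits, targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(student_logits)


def distillation_loss(student_logits, teacher_probs, hard_labels, alpha=0.5):
    """Hypothetical blend: alpha weights the teacher (soft) term, 1 - alpha the ground-truth term."""
    soft = soft_bce(student_logits, teacher_probs)
    hard = soft_bce(student_logits, [float(y) for y in hard_labels])
    return alpha * soft + (1.0 - alpha) * hard


# One scan, three biomarkers: assumed teacher probabilities and ground-truth labels.
teacher = [0.9, 0.2, 0.6]
labels = [1, 0, 1]
student_logits = [1.5, -1.0, 0.3]
print(distillation_loss(student_logits, teacher, labels))
```

With `alpha=1.0` the student is pulled toward the teacher's probabilities alone. Lower values keep some weight on the original labels.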

Contribution

It introduces a systematic evaluation of convolution and attention mechanisms in vision transformers for OCT analysis and demonstrates the effectiveness of ensembling and distillation techniques.

Findings

01 MaxViT excels at local feature detection
02 EVA-02 captures global features effectively
03 Ensembling improves biomarker detection accuracy
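Probability averaging is a simple way to combine two models' outputs. The sketch below assumes each model returns one probability per biomarker for a scan and applies a hypothetical 0.5 decision threshold. It illustrates the general idea rather than the paper's exact ensembling scheme, and the model outputs are made-up values.

```python
from typing import List, Optional, Sequence


def ensemble_probs(model_probs: Sequence[Sequence[float]],
                   weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted mean of per-biomarker probabilities across models (equal weights by default)."""
    if weights is None:
        weights = [1.0] * len(model_probs)
    total = sum(weights)
    n_labels = len(model_probs[0])
    return [
        sum(w * probs[j] for w, probs in zip(weights, model_probs)) / total
        for j in range(n_labels)
    ]


def predict(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Binarise each biomarker probability with a fixed (assumed) threshold."""
    return [int(p >= threshold) for p in probs]


# Hypothetical outputs for one scan and three biomarkers.
maxvit = [0.8, 0.3, 0.45]
eva02 = [0.6, 0.1, 0.65]
avg = ensemble_probs([maxvit, eva02])
print(avg, predict(avg))
```

In this made-up case the third biomarker falls below the threshold for MaxViT alone (0.45). The averaged probability of 0.55 crosses it, which shows how a second model can change a decision.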

Abstract

An Optical Coherence Tomography (OCT) scan yields all possible cross-section images of a retina for detecting biomarkers linked to optical defects. Due to the high volume of data generated, an automated and reliable biomarker detection pipeline is necessary as a primary screening stage. We outline our new state-of-the-art pipeline for identifying biomarkers from OCT scans. In collaboration with trained ophthalmologists, we identify local and global structures in biomarkers. Through a comprehensive and systematic review of existing vision architectures, we evaluate different convolution and attention mechanisms for biomarker detection. We find that MaxViT, a hybrid vision transformer combining convolution layers with strided attention, is better suited for local feature detection, while EVA-02, a standard vision transformer leveraging pure attention and large-scale knowledge distillation,…
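The abstract reports that MaxViT is better on local structures and EVA-02 on global ones. One hypothetical way to exploit that complementarity is to pick, for each biomarker, whichever model scores the higher F1 on a validation set. The sketch below assumes binary per-biomarker predictions. The helper names, the model keys and the selection rule are illustrative assumptions, not the authors' method.

```python
from typing import Dict, List, Sequence


def f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 for a single biomarker column."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def pick_model_per_biomarker(val_labels: List[List[int]],
                             val_preds: Dict[str, List[List[int]]]) -> List[str]:
    """For each biomarker, return the name of the model with the best validation F1."""
    n_labels = len(val_labels[0])
    choice = []
    for j in range(n_labels):
        y_true = [row[j] for row in val_labels]
        best = max(val_preds, key=lambda m: f1(y_true, [row[j] for row in val_preds[m]]))
        choice.append(best)
    return choice


def route(test_preds: Dict[str, List[int]], choice: List[str]) -> List[int]:
    """Combine one scan's per-model predictions using the per-biomarker choice."""
    return [test_preds[m][j] for j, m in enumerate(choice)]


# Hypothetical validation set: 4 scans, 2 biomarkers.
val_labels = [[1, 0], [1, 1], [0, 1], [0, 0]]
val_preds = {
    "maxvit": [[1, 0], [1, 0], [0, 0], [0, 0]],
    "eva02": [[0, 0], [1, 1], [1, 1], [0, 0]],
}
choice = pick_model_per_biomarker(val_labels, val_preds)
print(choice)  # ['maxvit', 'eva02']
print(route({"maxvit": [1, 0], "eva02": [0, 1]}, choice))  # [1, 1]
```

A hard per-biomarker switch is the simplest form of this idea. Per-biomarker weights in a soft average would be a smoother alternative.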

Taxonomy

Topics: Retinal Imaging and Analysis · Optical Coherence Tomography Applications · AI in cancer detection

Methods: Semi-Pseudo-Label · Knowledge Distillation · Convolution