ViT-2SPN: Vision Transformer-based Dual-Stream Self-Supervised Pretraining Networks for Retinal OCT Classification

Mohammadreza Saraei; Igor Kozak; Eung-Joo Lee

arXiv:2501.17260 · cs.CV · January 30, 2025

TL;DR

This paper introduces ViT-2SPN, a self-supervised pretraining framework using Vision Transformers for improved retinal OCT classification, addressing data scarcity and privacy issues in medical imaging.

Contribution

The paper presents a novel dual-stream self-supervised pretraining approach with a Vision Transformer backbone, enhancing feature extraction for OCT diagnosis.

Findings

01 Achieved a mean AUC of 0.93 on OCT classification
02 Outperformed existing self-supervised methods in accuracy and F1 score
03 Demonstrated effectiveness with limited labeled data

Abstract

Optical Coherence Tomography (OCT) is a non-invasive imaging modality essential for diagnosing various eye diseases. Despite its clinical significance, developing OCT-based diagnostic tools faces challenges, such as limited public datasets, sparse annotations, and privacy concerns. Although deep learning has made progress in automating OCT analysis, these challenges remain unresolved. To address these limitations, we introduce the Vision Transformer-based Dual-Stream Self-Supervised Pretraining Network (ViT-2SPN), a novel framework designed to enhance feature extraction and improve diagnostic accuracy. ViT-2SPN employs a three-stage workflow: Supervised Pretraining, Self-Supervised Pretraining (SSP), and Supervised Fine-Tuning. The pretraining phase leverages the OCTMNIST dataset (97,477 unlabeled images across four disease classes) with data augmentation to create dual-augmented views.…
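The dual-augmented views mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which augmentations ViT-2SPN uses, so the flip, crop, and brightness operations below, the helper names `augment` and `dual_views`, and the 28×28 random stand-in images are all assumptions chosen for illustration, not the authors' pipeline.

```python
import numpy as np

def augment(image, rng):
    """Apply one random augmentation pipeline to a 2-D grayscale image in [0, 1].

    The specific operations are illustrative assumptions, not the paper's.
    """
    out = image.copy()
    # Random horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        out = out[:, ::-1]
    # Random crop: reflect-pad by 2 pixels, then crop back to the original size.
    pad = 2
    h, w = out.shape
    padded = np.pad(out, pad, mode="reflect")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    out = padded[top:top + h, left:left + w]
    # Random brightness scaling, clipped back to [0, 1].
    return np.clip(out * rng.uniform(0.8, 1.2), 0.0, 1.0)

def dual_views(batch, rng):
    """Return two independently augmented views of every image in the batch."""
    view_a = np.stack([augment(img, rng) for img in batch])
    view_b = np.stack([augment(img, rng) for img in batch])
    return view_a, view_b

rng = np.random.default_rng(0)
batch = rng.random((4, 28, 28))  # hypothetical stand-in for unlabeled OCT images
view_a, view_b = dual_views(batch, rng)
```

In a dual-stream setup, each view would typically be fed to its own encoder stream, and the self-supervised objective would encourage the two representations of the same image to agree; no labels are needed at this stage.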

Code & Models

Repositories

mrsaraei/vit-2spn
PyTorch · Official

Taxonomy

Topics: Retinal Imaging and Analysis · Brain Tumor Detection and Classification

Methods: Attention Is All You Need · Softmax · Adam · Residual Connection · Dropout · Absolute Position Encodings · Byte Pair Encoding · Linear Layer · Vision Transformer · Multi-Head Attention