Cross-Architecture Knowledge Distillation (KD) for Retinal Fundus Image Anomaly Detection on NVIDIA Jetson Nano

Berk Yilmaz; Aniruddh Aiyengar

arXiv:2506.18220·cs.CV·June 24, 2025


TL;DR

This paper presents a novel cross-architecture knowledge distillation framework for building a lightweight, accurate retinal disease classifier that can be deployed on resource-constrained devices such as the NVIDIA Jetson Nano, aiding diagnosis in low-resource settings.

Contribution

It introduces a new framework that combines a Partitioned Cross-Attention (PCA) projector and a Group-Wise Linear (GL) projector with multi-view training to compress a ViT teacher into a CNN student while retaining high diagnostic accuracy.

Findings

01. The student CNN achieves 89% classification accuracy.

02. The model retains approximately 93% of the teacher's diagnostic performance.

03. The approach enables scalable AI triage in low-resource environments.

Abstract

Early and accurate identification of retinal ailments is crucial for averting ocular decline; however, dependable diagnostic devices are often unavailable in low-resource settings. This project addresses that gap by developing a lightweight, edge-deployable disease classifier using cross-architecture knowledge distillation. We first train a high-capacity vision transformer (ViT) teacher model, pre-trained using I-JEPA self-supervised learning, to classify fundus images into four classes: Normal, Diabetic Retinopathy, Glaucoma, and Cataract. With an Internet of Things (IoT) focus, we then compress the teacher into a CNN-based student model for deployment in resource-limited conditions, such as on the NVIDIA Jetson Nano. This was accomplished using a novel framework that includes a Partitioned Cross-Attention (PCA) projector, a Group-Wise Linear (GL) projector, and a multi-view…
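
To make the teacher-to-student idea concrete, the following is a minimal sketch of a classic logit-based distillation loss over the four classes named in the abstract. It is not the paper's method: the PCA and GL projectors described above are not reproduced here, and the logits, `temperature`, `alpha`, and all function names are hypothetical values chosen only for illustration.

```python
import math

# The four classes named in the abstract.
CLASSES = ["Normal", "Diabetic Retinopathy", "Glaucoma", "Cataract"]


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label_index,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    Assumption: this is generic logit distillation, used here as a
    stand-in; temperature and alpha are illustrative defaults.
    """
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # The temperature**2 factor keeps the soft term's gradient scale
    # comparable to the hard-label term as the temperature changes.
    soft_term = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    # Standard cross-entropy against the ground-truth label.
    hard_term = -math.log(softmax(student_logits)[label_index])
    return alpha * soft_term + (1 - alpha) * hard_term


# Hypothetical logits for one fundus image labeled Diabetic Retinopathy.
label = CLASSES.index("Diabetic Retinopathy")
teacher_logits = [0.2, 3.1, 0.4, -0.5]
good_student = [0.1, 2.8, 0.5, -0.3]   # agrees with the teacher
poor_student = [2.5, 0.3, 0.1, 0.2]    # favors the wrong class

loss_good = distillation_loss(good_student, teacher_logits, label)
loss_poor = distillation_loss(poor_student, teacher_logits, label)
print(f"loss (student close to teacher): {loss_good:.4f}")
print(f"loss (student far from teacher): {loss_poor:.4f}")
```

A student whose logits track the teacher's receives the lower loss, which is the signal that pushes a small CNN toward the behavior of a larger ViT. In a cross-architecture setting, a term like this would typically be combined with feature-level alignment, which is the role the abstract assigns to the projectors.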
