Cross-Architecture Knowledge Distillation (KD) for Retinal Fundus Image Anomaly Detection on NVIDIA Jetson Nano
Berk Yilmaz, Aniruddh Aiyengar

TL;DR
This paper presents a novel cross-architecture knowledge distillation framework for building a lightweight, accurate retinal disease classifier that can be deployed on resource-constrained devices such as the NVIDIA Jetson Nano, supporting diagnosis in low-resource settings.
Contribution
It introduces a framework that combines Partitioned Cross-Attention (PCA) and Group-Wise Linear (GL) projectors with multi-view training to compress a Vision Transformer (ViT) teacher into a CNN student while retaining most of the teacher's diagnostic accuracy.
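The summary does not specify how the GL projector is built. To illustrate the general idea of a group-wise linear projection from CNN features into a transformer's token space, here is a minimal sketch under stated assumptions: the student feature map is flattened into spatial positions, positions are split into contiguous groups, and each group shares one linear map. The class `GroupWiseLinearProjector`, the helper `feature_mse`, and all sizes are hypothetical and are not the authors' implementation.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Return weight @ x + bias for a single feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


class GroupWiseLinearProjector:
    """Hypothetical group-wise linear projector (illustration only).

    The student's CNN feature map is flattened into `num_positions` spatial
    positions, each a `c_student`-dim vector. Positions are split into
    `num_groups` contiguous groups; all positions in a group share one
    linear map into the teacher's `c_teacher`-dim token space.
    """

    def __init__(self, num_positions: int, c_student: int, c_teacher: int,
                 num_groups: int, seed: int = 0) -> None:
        if num_positions % num_groups != 0:
            raise ValueError("num_positions must be divisible by num_groups")
        rng = random.Random(seed)
        self.num_positions = num_positions
        self.group_size = num_positions // num_groups
        # One (weight, bias) pair per group rather than one per position.
        self.weights = [
            [[rng.gauss(0.0, 0.02) for _ in range(c_student)]
             for _ in range(c_teacher)]
            for _ in range(num_groups)
        ]
        self.biases = [[0.0] * c_teacher for _ in range(num_groups)]

    def __call__(self, features: List[Vector]) -> List[Vector]:
        if len(features) != self.num_positions:
            raise ValueError("unexpected number of spatial positions")
        projected = []
        for pos, vec in enumerate(features):
            g = pos // self.group_size  # group that owns this position
            projected.append(linear(vec, self.weights[g], self.biases[g]))
        return projected


def feature_mse(student_tokens: List[Vector], teacher_tokens: List[Vector]) -> float:
    """Mean squared error between projected student tokens and teacher tokens."""
    diffs = [(s - t) ** 2
             for s_vec, t_vec in zip(student_tokens, teacher_tokens)
             for s, t in zip(s_vec, t_vec)]
    return sum(diffs) / len(diffs)


# Usage: a 4x4 student map (16 positions, 8 channels) -> 12-dim teacher tokens.
proj = GroupWiseLinearProjector(num_positions=16, c_student=8, c_teacher=12, num_groups=4)
student_features = [[1.0] * 8 for _ in range(16)]
tokens = proj(student_features)
print(len(tokens), len(tokens[0]))  # 16 12
```

Sharing one map per group cuts the projector's parameters by a factor of `num_positions / num_groups` compared with a separate map per position, while still letting different spatial regions project differently; `feature_mse` is one possible alignment loss against teacher tokens.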
Findings
The student CNN achieves 89% classification accuracy.
The model retains approximately 93% of the teacher's diagnostic performance.
The approach enables scalable AI triage in low-resource environments.
Abstract
Early and accurate identification of retinal diseases is crucial for preventing deterioration of vision; however, dependable diagnostic devices are often unavailable in low-resource settings. This project addresses that gap by developing a lightweight disease classifier, deployable on edge devices, using cross-architecture knowledge distillation. We first train a high-capacity vision transformer (ViT) teacher model, pre-trained with I-JEPA self-supervised learning, to classify fundus images into four classes: Normal, Diabetic Retinopathy, Glaucoma, and Cataract. With an Internet of Things (IoT) focus, we then compress it into a CNN-based student model for deployment in resource-limited conditions, such as on the NVIDIA Jetson Nano. This was accomplished using a novel framework that includes a Partitioned Cross-Attention (PCA) projector, a Group-Wise Linear (GL) projector, and a multi-view…
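The abstract is truncated before it states the training objective. As a point of reference only, the sketch below shows the standard soft-target distillation loss of Hinton et al. (2015) applied to the four fundus classes. It is an assumption for illustration, not the authors' loss, which additionally involves the PCA and GL projectors and multi-view training; the `temperature` and `alpha` values and the logits are hypothetical.

```python
import math
from typing import List

CLASSES = ["Normal", "Diabetic Retinopathy", "Glaucoma", "Cataract"]


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: List[float], teacher_logits: List[float], label: int,
            temperature: float = 4.0, alpha: float = 0.5) -> float:
    """Soft-target distillation loss for a single example.

    alpha weights KL(teacher || student) on temperature-softened outputs;
    (1 - alpha) weights ordinary cross-entropy on the hard label. The KL
    term is multiplied by temperature**2 to keep its gradient scale roughly
    independent of the temperature.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce


# Usage with made-up logits (not results from the paper).
teacher_logits = [2.0, 0.5, -1.0, -0.5]
student_logits = [1.5, 0.7, -0.8, -0.2]
loss = kd_loss(student_logits, teacher_logits, label=CLASSES.index("Normal"))
print(f"distillation loss: {loss:.4f}")
```

A higher temperature spreads the teacher's probability mass over the non-target classes, which is what lets the student learn how the teacher relates, for example, Glaucoma to Normal, rather than only the argmax label.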
