Reverse Knowledge Distillation: Training a Large Model using a Small One for Retinal Image Matching on Limited Data
Sahar Almahfouz Nasser, Nihar Gupte, and Amit Sethi

TL;DR
This paper introduces reverse knowledge distillation, training a large vision transformer model using a smaller CNN model, to improve retinal image matching with limited data and prevent overfitting.
Contribution
It proposes a novel reverse knowledge distillation approach, introduces architectural improvements to SuperRetina, and provides a new annotated dataset for retinal keypoint detection.
Findings
Reverse knowledge distillation enhances model generalization.
High-dimensional representation fitting helps prevent overfitting.
The approach outperforms traditional training methods on retinal matching tasks.
Abstract
Retinal image matching plays a crucial role in monitoring disease progression and treatment response. However, datasets with matched keypoints between temporally separated pairs of images are not available in abundance to train transformer-based models. We propose a novel approach based on reverse knowledge distillation to train large models with limited data while preventing overfitting. Firstly, we propose architectural modifications to a CNN-based semi-supervised method called SuperRetina that help us improve its results on a publicly available dataset. Then, we train a computationally heavier model based on a vision transformer encoder using the lighter CNN-based model, which is counter-intuitive in the field of knowledge-distillation research, where training lighter models based on heavier ones is the norm. Surprisingly, such reverse knowledge distillation improves generalization even…
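The core idea in the abstract, fitting a larger model to the outputs of a smaller frozen one, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the teacher is a small random linear map rather than SuperRetina, the student is a wider two-layer network rather than a vision transformer, and the mean-squared-error objective is a generic choice, since the abstract does not state the loss the authors use.

```python
import numpy as np

def reverse_distill(steps=300, lr=0.05, seed=0):
    """Toy reverse distillation: a wider student regresses a small frozen teacher's outputs.

    All shapes, names, and the MSE objective are hypothetical, for illustration only.
    Returns the list of per-step losses.
    """
    rng = np.random.default_rng(seed)

    # Small frozen "teacher": 16-d input -> 8-d descriptor (stand-in for a light CNN).
    w_teacher = rng.normal(size=(16, 8))

    # Larger trainable "student": 16 -> 64 -> 8 (stand-in for a heavier encoder).
    w1 = rng.normal(size=(16, 64)) * 0.1
    w2 = rng.normal(size=(64, 8)) * 0.1

    # Limited data: only 32 unlabeled samples; targets come from the teacher.
    x = rng.normal(size=(32, 16))
    target = x @ w_teacher  # teacher is never updated

    losses = []
    for _ in range(steps):
        # Student forward pass with a ReLU hidden layer.
        h = np.maximum(x @ w1, 0.0)
        pred = h @ w2
        losses.append(float(np.mean((pred - target) ** 2)))

        # Manual backward pass for the MSE loss.
        grad_pred = 2.0 * (pred - target) / pred.size
        grad_w2 = h.T @ grad_pred
        grad_h = grad_pred @ w2.T
        grad_h[h <= 0.0] = 0.0  # ReLU passes gradient only where it was active
        grad_w1 = x.T @ grad_h

        # Plain gradient-descent update of the student only.
        w1 -= lr * grad_w1
        w2 -= lr * grad_w2

    return losses

if __name__ == "__main__":
    history = reverse_distill()
    print(f"first loss: {history[0]:.4f}, last loss: {history[-1]:.4f}")
```

The point of the sketch is the direction of supervision: the training targets are dense teacher outputs rather than sparse ground-truth labels, so the larger student has a high-dimensional signal to fit for every sample even when annotated data is scarce.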
Taxonomy
Topics: Retinal Imaging and Analysis · Brain Tumor Detection and Classification · Retinal Diseases and Treatments
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Residual Connection · Layer Normalization · Linear Layer · Dense Connections · Knowledge Distillation · Vision Transformer
