PiPViT: Patch-based Visual Interpretable Prototypes for Retinal Image Analysis
Marzieh Oghbaie, Teresa Araújo, Hrvoje Bogunović

TL;DR
PiPViT is an interpretable vision transformer model that learns human-understandable prototypes for retinal image analysis, improving interpretability and localization of biomarkers while maintaining competitive accuracy.
Contribution
Introduces PiPViT, a novel patch-based, interpretable prototype model using ViT for retinal imaging, addressing granularity and visualization issues of prior methods.
Findings
Achieved competitive accuracy on retinal OCT datasets.
Produced semantically meaningful and clinically relevant prototypes.
Enhanced biomarker localization across multiple scales.
Abstract
Background and Objective: Prototype-based methods improve interpretability by learning fine-grained part-prototypes; however, their visualization in the input pixel space is not always consistent with human-understandable biomarkers. In addition, well-known prototype-based approaches typically learn extremely granular prototypes that are less interpretable in medical imaging, where both the presence and extent of biomarkers and lesions are critical. Methods: To address these challenges, we propose PiPViT (Patch-based Visual Interpretable Prototypes), an inherently interpretable prototypical model for image recognition. Leveraging a vision transformer (ViT), PiPViT captures long-range dependencies among patches to learn robust, human-interpretable prototypes that approximate lesion extent only using image-level labels. Additionally, PiPViT benefits from contrastive learning and…
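To make the idea of patch-level prototypes concrete, here is a minimal sketch of how a generic patch-based prototype classifier can turn per-patch features into image-level scores. It is not taken from the paper: the softmax-over-prototypes, max-pooling, and linear scoring steps are assumptions based on the common recipe for this model family, and every name (`prototype_presence`, `class_scores`, `best_patch`) is hypothetical. The ViT backbone and the contrastive training are omitted entirely.

```python
import math

def softmax(row):
    """Numerically stable softmax over one patch's prototype activations."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def prototype_presence(patch_features):
    """Assumed pooling step: softmax each patch over prototypes, then take
    the maximum over patches to get one presence score per prototype."""
    per_patch = [softmax(p) for p in patch_features]
    num_prototypes = len(per_patch[0])
    return [max(p[k] for p in per_patch) for k in range(num_prototypes)]

def class_scores(presence, weights):
    """Assumed scoring step: a linear layer mapping presence scores to
    one score per class (weights[c][k] links prototype k to class c)."""
    return [sum(w * s for w, s in zip(row, presence)) for row in weights]

def best_patch(patch_features, k):
    """Index of the patch where prototype k is most active, which is what
    a patch-level visualization would highlight."""
    per_patch = [softmax(p) for p in patch_features]
    return max(range(len(per_patch)), key=lambda i: per_patch[i][k])

# Toy input: 3 patches, 3 prototypes (values are illustrative only).
patches = [[5.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
presence = prototype_presence(patches)
scores = class_scores(presence, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
```

Because each presence score is traced back to a single patch via `best_patch`, a decision in this kind of model can be explained by pointing at image regions, which is the property the abstract describes as localizing biomarkers from image-level labels alone.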
Taxonomy
Topics: Retinal Imaging and Analysis · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning
Methods: Softmax · Linear Layer · Dense Connections · Attention Is All You Need · Multi-Head Attention · Contrastive Learning · Layer Normalization · Vision Transformer · Sparse Evolutionary Training
