PiPViT: Patch-based Visual Interpretable Prototypes for Retinal Image Analysis

Marzieh Oghbaie; Teresa Araújo; Hrvoje Bogunović

arXiv:2506.10669 · cs.CV · June 16, 2025

TL;DR

PiPViT is an interpretable vision transformer model that learns human-understandable prototypes for retinal image analysis, improving interpretability and localization of biomarkers while maintaining competitive accuracy.

Contribution

Introduces PiPViT, a novel patch-based, interpretable prototype model using ViT for retinal imaging, addressing granularity and visualization issues of prior methods.

Findings

1. Achieved competitive accuracy on retinal OCT datasets.
2. Produced semantically meaningful and clinically relevant prototypes.
3. Enhanced biomarker localization across multiple scales.

Abstract

Background and Objective: Prototype-based methods improve interpretability by learning fine-grained part-prototypes; however, their visualization in the input pixel space is not always consistent with human-understandable biomarkers. In addition, well-known prototype-based approaches typically learn extremely granular prototypes that are less interpretable in medical imaging, where both the presence and extent of biomarkers and lesions are critical.

Methods: To address these challenges, we propose PiPViT (Patch-based Visual Interpretable Prototypes), an inherently interpretable prototypical model for image recognition. Leveraging a vision transformer (ViT), PiPViT captures long-range dependencies among patches to learn robust, human-interpretable prototypes that approximate lesion extent using only image-level labels. Additionally, PiPViT benefits from contrastive learning and…
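The abstract does not say how patch features are turned into prototype evidence. As a rough illustration only, the sketch below assumes a head of the kind used in earlier patch-prototype models: each patch gets a softmax over prototypes, each prototype's presence is its maximum score over all patches, and a non-negative linear layer maps presences to class scores. The function names, the toy logits, and the weights are hypothetical and are not taken from the PiPViT repository.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def prototype_presence(patch_logits):
    """Max-pool per-patch prototype scores over all patches.

    patch_logits: one row per patch, one logit per prototype
    (assumed here to come from a ViT's patch tokens).
    Returns (presence, best_patch): for each prototype, its highest
    softmax score and the index of the patch where it occurs.
    """
    per_patch = [softmax(row) for row in patch_logits]
    n_prototypes = len(per_patch[0])
    presence, best_patch = [], []
    for p in range(n_prototypes):
        scores = [row[p] for row in per_patch]
        idx = max(range(len(scores)), key=scores.__getitem__)
        presence.append(scores[idx])
        best_patch.append(idx)
    return presence, best_patch


def class_scores(presence, weights):
    """Non-negative linear layer: weights[c][p] links prototype p to class c."""
    return [sum(w * s for w, s in zip(row, presence)) for row in weights]


# Toy input: 4 patches (a 2x2 grid) and 3 prototypes; values are made up.
patch_logits = [
    [4.0, 0.0, 0.0],  # patch 0 strongly matches prototype 0
    [0.0, 0.0, 0.0],
    [0.0, 3.0, 0.0],  # patch 2 strongly matches prototype 1
    [0.0, 0.0, 0.0],
]
weights = [
    [1.0, 0.0, 0.0],  # hypothetical class 0 relies on prototype 0
    [0.0, 1.0, 0.5],  # hypothetical class 1 relies on prototypes 1 and 2
]

presence, best_patch = prototype_presence(patch_logits)
scores = class_scores(presence, weights)
print(best_patch, [round(s, 3) for s in scores])
```

Under these assumptions, `best_patch` gives a coarse localization for each prototype, and thresholding the per-patch scores instead of taking only the maximum would give a rough estimate of how far a prototype extends across the image.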

Code & Models

Repositories

marziehoghbaie/pipvit (PyTorch, official)

Taxonomy

Topics: Retinal Imaging and Analysis · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning

Methods: Softmax · Linear Layer · Dense Connections · Attention Is All You Need · Multi-Head Attention · Contrastive Learning · Layer Normalization · Vision Transformer · Sparse Evolutionary Training