ProtoPFormer: Concentrating on Prototypical Parts in Vision Transformers for Interpretable Image Recognition

Mengqi Xue; Qihan Huang; Haofei Zhang; Jingwen Hu; Jie Song; Mingli Song; Canghong Jin

arXiv:2208.10431 · cs.CV · November 27, 2025 · 22 cites

TL;DR

ProtoPFormer enhances vision transformer-based image recognition by integrating global and local prototypes, improving interpretability and accuracy through focused attention on foreground features and prototypical parts.

Contribution

This paper introduces ProtoPFormer, a method that applies prototype-based interpretability to ViTs, using global and local prototypes so that prototypes focus on the foreground and its prototypical parts rather than the background.

Findings

1. Outperforms state-of-the-art prototype-based methods in accuracy.
2. Provides more faithful and transparent decision explanations.
3. Achieves superior visualization of prototypical parts.

Abstract

Prototypical part network (ProtoPNet) has drawn wide attention and spurred many follow-up studies due to its self-explanatory property for explainable artificial intelligence (XAI). However, when ProtoPNet is applied directly to vision transformer (ViT) backbones, the learned prototypes have a "distraction" problem: they have a relatively high probability of being activated by the background and pay less attention to the foreground. The transformer's powerful capability for modeling long-term dependencies makes it hard for a transformer-based ProtoPNet to focus on prototypical parts, severely impairing its inherent interpretability. This paper proposes the prototypical part transformer (ProtoPFormer) for appropriately and effectively applying the prototype-based method with ViTs for interpretable image recognition. The proposed method introduces global and local prototypes for capturing and highlighting the…
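As a rough illustration of the "distraction" problem the abstract describes, the sketch below scores each patch token against a single prototype and checks whether the most strongly activated patch lies on the background. It is not the authors' implementation: the use of cosine similarity, the function names (`activation_map`, `is_distracted`), and the toy 2x2 patch grid with its foreground mask are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def activation_map(patch_tokens, prototype):
    """Similarity of every patch token to one prototype (assumed: cosine similarity)."""
    return [cosine(token, prototype) for token in patch_tokens]

def is_distracted(patch_tokens, prototype, foreground_mask):
    """True if the prototype's strongest activation falls on a background patch."""
    acts = activation_map(patch_tokens, prototype)
    best = max(range(len(acts)), key=acts.__getitem__)
    return not foreground_mask[best]

# Hypothetical toy data: a 2x2 grid of 2-D patch tokens, flattened row by row.
# The top row is background, the bottom row is foreground.
patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
mask = [False, False, True, True]

background_like = [1.0, 0.0]  # prototype resembling the background tokens
foreground_like = [0.0, 1.0]  # prototype resembling the foreground tokens

print(is_distracted(patches, background_like, mask))  # True
print(is_distracted(patches, foreground_like, mask))  # False
```

In this toy setting a prototype counts as "distracted" when its peak activation sits outside the foreground mask; a real evaluation would use learned ViT patch embeddings and annotated object regions instead of hand-written vectors.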

Code & Models

Repositories

zju-vipa/protopformer (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Explainable Artificial Intelligence (XAI) · Visual Attention and Saliency Detection

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Softmax · Layer Normalization · Dense Connections · Vision Transformer