ComFe: An Interpretable Head for Vision Transformers

Evelyn J. Mannix; Liam Hodgkinson; Howard Bondell

arXiv:2403.04125 · cs.CV · November 18, 2025 · 1 citation

TL;DR

ComFe introduces a scalable, interpretable classification head for Vision Transformers that maintains competitive accuracy, enhances robustness, and identifies meaningful component features without additional annotations or extensive hyperparameter tuning.

Contribution

The paper presents ComFe, to the authors' knowledge the first interpretable head for large-scale Vision Transformers. It provides interpretability and improved robustness without fine-tuning the backbone or requiring extra annotations.

Findings

01 Achieves competitive performance on ImageNet-1K

02 Provides improved robustness over previous methods

03 Identifies consistent component features within images

Abstract

Interpretable computer vision models explain their classifications through comparing the distances between the local embeddings of an image and a set of prototypes that represent the training data. However, these approaches introduce additional hyper-parameters that need to be tuned to apply to new datasets, scale poorly, and are more computationally intensive to train in comparison to black-box approaches. In this work, we introduce Component Features (ComFe), a highly scalable interpretable-by-design image classification head for pretrained Vision Transformers (ViTs) that can obtain competitive performance in comparison to comparable non-interpretable methods. To our knowledge, ComFe is the first interpretable head and unlike other interpretable approaches can be readily applied to large-scale datasets such as ImageNet-1K. Additionally, ComFe provides improved robustness and…
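
The prototype comparison described in the first sentence of the abstract can be illustrated with a small sketch. This is a generic, simplified illustration of distance-to-prototype classification and not ComFe's actual head: the function names, the use of cosine similarity, the max-over-patches scoring rule, and the toy 2-D embeddings are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def classify_by_prototypes(patch_embeddings, prototypes, prototype_labels):
    """Score each class by its best patch-to-prototype similarity.

    Hypothetical helper: returns the predicted label, the per-class scores,
    and the (patch index, prototype index) pair that produced each score,
    which serves as the explanation for the prediction.
    """
    scores, evidence = {}, {}
    for p_idx, (proto, label) in enumerate(zip(prototypes, prototype_labels)):
        sims = [cosine(patch, proto) for patch in patch_embeddings]
        best_patch = max(range(len(sims)), key=sims.__getitem__)
        if label not in scores or sims[best_patch] > scores[label]:
            scores[label] = sims[best_patch]
            evidence[label] = (best_patch, p_idx)
    predicted = max(scores, key=scores.get)
    return predicted, scores, evidence

# Toy data (assumed): three local patch embeddings and one prototype per class.
patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
prototypes = [[1.0, 0.0], [-1.0, 0.0]]
labels = ["bird", "dog"]

predicted, scores, evidence = classify_by_prototypes(patches, prototypes, labels)
print(predicted, evidence[predicted])
```

In this sketch the returned evidence pair points at the image patch and prototype responsible for the winning score, which is the sense in which such models explain their output. A real head would operate on ViT patch tokens and learned prototypes rather than hand-written vectors.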

Code & Models

Repositories

emannix/automating-the-assessment-of-biofouling
pytorch

Taxonomy

Topics: Neural Networks and Applications · Image and Signal Denoising Methods