A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis
Dipanjyoti Paul, Arpita Chowdhury, Xinqi Xiong, Feng-Ju Chang, David Carlyn, Samuel Stevens, Kaiya L. Provost, Anuj Karpatne, Bryan Carstens, Daniel Rubenstein, Charles Stewart, Tanya Berger-Wolf, Yu Su, Wei-Lun Chao

TL;DR
This paper introduces INTR, an interpretable Transformer-based model that proactively searches for class-specific patterns in images, making fine-grained classification interpretable and enabling the identification of class attributes.
Contribution
The paper proposes INTR, a novel Transformer architecture inspired by DETR that uses class-specific queries for interpretable, fine-grained image classification and analysis.
Findings
INTR provides faithful interpretation through cross-attention weights.
It effectively identifies class attributes via multi-head attention.
It demonstrates superior interpretability on eight datasets.
Abstract
We present a novel usage of Transformers to make image classification interpretable. Unlike mainstream classifiers that wait until the last fully connected layer to incorporate class information to make predictions, we investigate a proactive approach, asking each class to search for itself in an image. We realize this idea via a Transformer encoder-decoder inspired by DEtection TRansformer (DETR). We learn "class-specific" queries (one for each class) as input to the decoder, enabling each class to localize its patterns in an image via cross-attention. We name our approach INterpretable TRansformer (INTR), which is fairly easy to implement and exhibits several compelling properties. We show that INTR intrinsically encourages each class to attend distinctively; the cross-attention weights thus provide a faithful interpretation of the prediction. Interestingly, via "multi-head"…
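The abstract's core mechanism is that each class gets its own learnable query, and each query cross-attends to the image features to "search for itself". The NumPy snippet below is a minimal sketch of that idea under simplifying assumptions, not the authors' implementation: a single multi-head cross-attention layer with no encoder, decoder self-attention, output projection, or feed-forward block; random rather than trained weights; and a shared, class-agnostic scoring vector `w_score` (an assumption here, since the abstract does not describe the output head) that turns each class's decoder output into a logit. All function and parameter names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def intr_like_forward(features, class_queries, Wq, Wk, Wv, w_score, num_heads):
    """Hypothetical, simplified INTR-style decoder step.

    features:      (N, D) image tokens, assumed to come from an encoder
    class_queries: (C, D) one learnable query per class
    Wq, Wk, Wv:    (D, D) projection matrices (hypothetical names)
    w_score:       (D,) shared, class-agnostic scoring vector (assumption)
    Returns class logits (C,) and cross-attention maps (num_heads, C, N).
    """
    N, D = features.shape
    C = class_queries.shape[0]
    dh = D // num_heads
    # Project, then split the feature dimension into heads: (heads, tokens, dh).
    q = (class_queries @ Wq).reshape(C, num_heads, dh).transpose(1, 0, 2)
    k = (features @ Wk).reshape(N, num_heads, dh).transpose(1, 0, 2)
    v = (features @ Wv).reshape(N, num_heads, dh).transpose(1, 0, 2)
    # Each class query attends over all image tokens, separately in each head.
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh), axis=-1)  # (H, C, N)
    out = attn @ v                                                   # (H, C, dh)
    # Concatenate the heads back into one embedding per class: (C, D).
    z = out.transpose(1, 0, 2).reshape(C, D)
    # The same vector scores every class embedding, giving one logit per class.
    logits = z @ w_score
    return logits, attn

# Usage with random weights: a 7x7 feature map, 10 classes, 8 heads.
rng = np.random.default_rng(0)
N, D, C, H = 49, 64, 10, 8
features = rng.standard_normal((N, D))
queries = rng.standard_normal((C, D))
Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
w_score = rng.standard_normal(D)

logits, attn = intr_like_forward(features, queries, Wq, Wk, Wv, w_score, H)
pred = int(np.argmax(logits))
# One spatial attention map per head for the predicted class.
heatmaps = attn[:, pred, :].reshape(H, 7, 7)
print(logits.shape, attn.shape, heatmaps.shape)  # (10,) (8, 10, 49) (8, 7, 7)
```

Because every class has its own query, `attn[:, c, :]` yields a separate spatial map per head for class `c`. In a trained model, these per-class, per-head cross-attention maps are the kind of signal that INTR reads as an interpretation of its prediction.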
Taxonomy
Topics: Cell Image Analysis Techniques · Digital Imaging for Blood Diseases · AI in cancer detection
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Label Smoothing · Byte Pair Encoding · Dropout · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Softmax · Dense Connections
