A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis

Dipanjyoti Paul; Arpita Chowdhury; Xinqi Xiong; Feng-Ju Chang; David Carlyn; Samuel Stevens; Kaiya L. Provost; Anuj Karpatne; Bryan Carstens; Daniel Rubenstein; Charles Stewart; Tanya Berger-Wolf; Yu Su; Wei-Lun Chao

arXiv:2311.04157 · cs.CV · June 17, 2024 · 5 citations

TL;DR

This paper introduces INTR, an interpretable Transformer-based model that proactively searches for class-specific patterns in images, enhancing fine-grained classification interpretability and attribute identification.

Contribution

Inspired by DETR, the paper proposes INTR, a novel Transformer architecture that uses class-specific queries for interpretable fine-grained image classification and analysis.
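The class-specific query idea can be sketched in a few lines. The following is a minimal, dependency-free illustration under simplifying assumptions, not the authors' implementation (the official code lives in imageomics/intr): a single cross-attention step with one head, no learned query/key/value projections, no decoder self-attention or feed-forward layers, and a shared scoring vector `w` (an assumption here) that turns each class's output embedding into its logit. The function name `class_query_decoder` is hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def class_query_decoder(patch_feats, class_queries, w):
    """One simplified cross-attention step in which every class query searches the image.

    patch_feats:   N x D encoder outputs, one vector per image patch
    class_queries: C x D learned queries, one per class
    w:             D-dim shared scoring vector (assumed simplification)
    Returns (logits, attn), where attn[c] is class c's attention over the N patches.
    """
    d = len(class_queries[0])
    scale = 1.0 / math.sqrt(d)
    logits, attn = [], []
    for q in class_queries:
        # Scaled dot-product scores of this class query against every patch,
        # normalized over patches so each class distributes attention across the image.
        weights = softmax([dot(q, k) * scale for k in patch_feats])
        # The class's output embedding is the attention-weighted sum of patch features
        # (patch features double as values because projections are omitted).
        z = [sum(a * v[j] for a, v in zip(weights, patch_feats)) for j in range(d)]
        logits.append(dot(w, z))
        attn.append(weights)
    return logits, attn


if __name__ == "__main__":
    random.seed(0)
    num_patches, num_classes, dim = 6, 3, 4
    patches = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_patches)]
    queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_classes)]
    w = [random.gauss(0, 1) for _ in range(dim)]
    logits, attn = class_query_decoder(patches, queries, w)
    pred = max(range(num_classes), key=lambda c: logits[c])
    print("class probabilities:", [round(p, 3) for p in softmax(logits)])
    # The predicted class's attention row doubles as its heatmap over patches.
    print("predicted class", pred, "attends:", [round(a, 3) for a in attn[pred]])
```

Reading `attn[pred]` back onto the patch grid gives a per-class heatmap of the kind used for interpretation. A DETR-style decoder instead stacks several full decoder layers (self-attention, cross-attention, feed-forward), so this sketch only conveys the data flow.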

Findings

1. INTR provides faithful interpretations through its cross-attention weights.
2. It effectively identifies class attributes via multi-head attention.
3. It demonstrates superior interpretability on eight datasets.
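Finding 2 rests on reading each attention head separately. Below is a rough sketch of that readout, assuming the standard multi-head split of the embedding into contiguous per-head slices, omitting learned projections, and assuming patches are laid out row-major on a grid of width `grid_w`. The helper names are hypothetical. For one class query it returns each head's attention map and the grid cell that head attends to most, which is where one would look for a candidate attribute.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def per_head_attention(query, patch_feats, num_heads):
    """Attend with one class query, separately in each head.

    The D-dim query and patch features are split into num_heads contiguous slices;
    learned Q/K projections are omitted (simplifying assumption).
    Returns num_heads attention distributions, each over the N patches.
    """
    dim = len(query)
    if dim % num_heads != 0:
        raise ValueError("embedding dim must be divisible by num_heads")
    head_dim = dim // num_heads
    maps = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        q_h = query[lo:hi]
        scores = [
            sum(a * b for a, b in zip(q_h, k[lo:hi])) / math.sqrt(head_dim)
            for k in patch_feats
        ]
        maps.append(softmax(scores))
    return maps


def top_cell_per_head(maps, grid_w):
    """(row, col) of each head's most-attended patch on a row-major grid of width grid_w."""
    cells = []
    for m in maps:
        idx = max(range(len(m)), key=lambda i: m[i])
        cells.append(divmod(idx, grid_w))
    return cells


if __name__ == "__main__":
    # 2x2 grid of patches: patch 0 carries a feature only head 0 sees,
    # patch 3 carries a feature only head 1 sees.
    patches = [[5.0, 0.0, 0.0, 0.0], [0.0] * 4, [0.0] * 4, [0.0, 0.0, 5.0, 0.0]]
    query = [1.0, 0.0, 1.0, 0.0]
    maps = per_head_attention(query, patches, num_heads=2)
    print(top_cell_per_head(maps, grid_w=2))  # [(0, 0), (1, 1)]
```

When different heads of the same class query peak at different cells, those regions are natural candidates for distinct attributes; the official repository remains the reference for the paper's actual procedure.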

Abstract

We present a novel usage of Transformers to make image classification interpretable. Unlike mainstream classifiers that wait until the last fully connected layer to incorporate class information to make predictions, we investigate a proactive approach, asking each class to search for itself in an image. We realize this idea via a Transformer encoder-decoder inspired by DEtection TRansformer (DETR). We learn "class-specific" queries (one for each class) as input to the decoder, enabling each class to localize its patterns in an image via cross-attention. We name our approach INterpretable TRansformer (INTR), which is fairly easy to implement and exhibits several compelling properties. We show that INTR intrinsically encourages each class to attend distinctively; the cross-attention weights thus provide a faithful interpretation of the prediction. Interestingly, via "multi-head"…
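The claim that INTR encourages each class to attend distinctively can be connected to the standard softmax cross-entropy gradient. The derivation below is a generic sketch, not necessarily the paper's own analysis. It assumes each class logit is $s_c = \mathbf{w}^\top \mathbf{z}_c$, where $\mathbf{z}_c = \sum_n \alpha_{c,n}\mathbf{v}_n$ is class $c$'s decoder output, $\alpha_{c,n}$ its cross-attention weight on patch $n$, $\mathbf{v}_n$ the value vector of patch $n$, $\mathbf{w}$ a shared scoring vector, and $y$ the ground-truth class. The attention weights are treated as direct variables, ignoring the softmax that produces them.

```latex
\begin{aligned}
p_c &= \frac{\exp(s_c)}{\sum_{c'} \exp(s_{c'})}, \qquad \mathcal{L} = -\log p_y,\\
\frac{\partial \mathcal{L}}{\partial s_c} &= p_c - \mathbb{1}[c = y],\\
\frac{\partial \mathcal{L}}{\partial \alpha_{c,n}} &= \bigl(p_c - \mathbb{1}[c = y]\bigr)\, \mathbf{w}^\top \mathbf{v}_n .
\end{aligned}
```

For the true class the factor $p_y - 1$ is negative, so a gradient-descent step raises its attention on patches with $\mathbf{w}^\top \mathbf{v}_n > 0$. For every other class the factor $p_c$ is positive, so the same step lowers their attention on those patches. This is one way to see why classes are pushed toward attending to different evidence.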

Code & Models

Repositories

imageomics/intr (PyTorch, official)

Models

🤗 imageomics/INTR (model)

Videos

A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis · SlidesLive

Taxonomy

Topics: Cell Image Analysis Techniques · Digital Imaging for Blood Diseases · AI in cancer detection

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Label Smoothing · Byte Pair Encoding · Dropout · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Softmax · Dense Connections