Decompose the model: Mechanistic interpretability in image models with Generalized Integrated Gradients (GIG)

Yearim Kim; Sangyu Han; Sangbum Han; Nojun Kwak

arXiv:2409.01610·cs.CV·September 4, 2024

TL;DR

This paper introduces a method for mechanistic interpretability in image models that uses concept vectors and Generalized Integrated Gradients (GIG) to analyze model operations across an entire dataset rather than for individual classes.

Contribution

The paper proposes a systematic approach that combines Pointwise Feature Vectors (PFVs), Effective Receptive Fields (ERFs), and Generalized Integrated Gradients (GIG) for dataset-wide interpretability of image models.

Findings

01. Effective concept extraction and attribution demonstrated

02. Provides a holistic view of model mechanics

03. Enables dataset-wide analysis of model behavior

Abstract

In the field of eXplainable AI (XAI) in language models, the progression from local explanations of individual decisions to global explanations with high-level concepts has laid the groundwork for mechanistic interpretability, which aims to decode the exact operations. However, this paradigm has not been adequately explored in image models, where existing methods have primarily focused on class-specific interpretations. This paper introduces a novel approach to systematically trace the entire pathway from input through all intermediate layers to the final output within the whole dataset. We utilize Pointwise Feature Vectors (PFVs) and Effective Receptive Fields (ERFs) to decompose model embeddings into interpretable Concept Vectors. Then, we calculate the relevance between concept vectors with our Generalized Integrated Gradients (GIG), enabling a comprehensive, dataset-wide analysis of…
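
For readers unfamiliar with the attribution family that GIG builds on, the following is a minimal sketch of standard Integrated Gradients (Sundararajan et al., 2017), not the paper's generalized variant, whose definition is not given on this page. The toy model, the zero baseline, the finite-difference gradient, and all function names are assumptions chosen for illustration only.

```python
from typing import Callable, List

Vector = List[float]


def numerical_gradient(f: Callable[[Vector], float], x: Vector, eps: float = 1e-6) -> Vector:
    """Central-difference gradient of a scalar function f at point x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def integrated_gradients(
    f: Callable[[Vector], float], x: Vector, baseline: Vector, steps: int = 200
) -> Vector:
    """Standard Integrated Gradients along the straight path baseline -> x.

    The path integral is approximated with a midpoint Riemann sum.
    """
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = numerical_gradient(f, point)
        total = [t + g for t, g in zip(total, grad)]
    # Scale the averaged gradient by the input-minus-baseline difference.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


def toy_model(v: Vector) -> float:
    """Hypothetical scalar 'model' standing in for a network output."""
    return v[0] * v[1] + v[2] ** 2


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]  # assumed all-zero baseline
attributions = integrated_gradients(toy_model, x, baseline)

# Completeness: attributions should sum to f(x) - f(baseline).
completeness_gap = abs(sum(attributions) - (toy_model(x) - toy_model(baseline)))
```

The completeness check at the end reflects the property that makes this family attractive for tracing relevance: the attributions sum to the difference between the output at the input and at the baseline. How GIG extends this to relevance between concept vectors in intermediate layers is described in the paper itself, not in this sketch.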

Taxonomy

Topics: Medical Image Segmentation Techniques