From Weights to Concepts: Data-Free Interpretability of CLIP via Singular Vector Decomposition
Francesco Gentile, Nicola Dall'Asen, Francesco Tonini, Massimiliano Mancini, Lorenzo Vaquero, Elisa Ricci

TL;DR
This paper introduces SITH, a data-free framework for interpreting CLIP's vision transformer by decomposing attention heads into semantically meaningful concepts, enabling faithful explanations and model edits without retraining.
Contribution
SITH is the first fully data-free, training-free method to interpret CLIP's weights via singular vector decomposition and concept-based explanations, facilitating precise model editing.
Findings
SITH provides coherent, faithful intra-head explanations.
Model fine-tuning reweights existing semantic bases rather than learning new features.
SITH enables concept-based model edits that improve downstream performance.
Abstract
As vision-language models are deployed at scale, understanding their internal mechanisms becomes increasingly critical. Existing interpretability methods predominantly rely on activations, making them dataset-dependent, vulnerable to data bias, and often restricted to coarse head-level explanations. We introduce SITH (Semantic Inspection of Transformer Heads), a fully data-free, training-free framework that directly analyzes CLIP's vision transformer in weight space. For each attention head, we decompose its value-output matrix into singular vectors and interpret each one via COMP (Coherent Orthogonal Matching Pursuit), a new algorithm that explains them as sparse, semantically coherent combinations of human-interpretable concepts. We show that SITH yields coherent, faithful intra-head explanations, validated through reconstruction fidelity and interpretability experiments. This allows…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsExplainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
