Interpreting the Second-Order Effects of Neurons in CLIP

Yossi Gandelsman; Alexei A. Efros; Jacob Steinhardt

arXiv:2406.04341·cs.CV·February 14, 2025


Open Access·3 Reviews

TL;DR

This paper introduces the "second-order lens", which interprets CLIP neurons by analyzing the effect that flows from a neuron through the later attention heads to the output. The analysis shows that neurons are highly selective and polysemantic, and the paper applies this understanding to adversarial example generation and zero-shot segmentation.

Contribution

The paper presents a novel second-order analysis method for interpreting CLIP neurons, uncovering their selectivity and polysemy, and demonstrates practical applications in adversarial attacks and segmentation.

Findings

01. Each neuron's second-order effect is highly selective: it is significant for less than 2% of images.

02. Each neuron's effect can be approximated by a single direction in CLIP's joint text-image space.

03. Neurons are polysemantic: each corresponds to multiple, often unrelated, concepts (e.g. ships and cars).
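
The first two findings can be illustrated on synthetic data. The sketch below is not the paper's code and does not touch a real CLIP model: the `effects` matrix, the helper names `rank_one_approximation` and `fraction_significant`, and the 10% relative threshold are all assumptions made for illustration. It builds per-image effect vectors that are nonzero for 1.5% of images along one hidden direction, then checks how well a single direction (the top singular vector) explains them.

```python
import numpy as np

def rank_one_approximation(effects):
    """Approximate an (n_images, d) matrix of per-image effect vectors by
    one scalar coefficient per image times a single unit direction."""
    # Uncentered SVD: the first right singular vector is the best single direction.
    _, s, vt = np.linalg.svd(effects, full_matrices=False)
    direction = vt[0]                         # shape (d,), unit norm
    coeffs = effects @ direction              # shape (n_images,)
    approx = np.outer(coeffs, direction)      # rank-1 reconstruction
    explained = s[0] ** 2 / np.sum(s ** 2)    # share of squared norm captured
    return direction, coeffs, approx, explained

def fraction_significant(effects, rel_threshold=0.1):
    """Fraction of images whose effect norm exceeds rel_threshold * max norm.
    The threshold is an illustrative choice, not a value from the paper."""
    norms = np.linalg.norm(effects, axis=1)
    return float(np.mean(norms > rel_threshold * norms.max()))

# Synthetic stand-in for one neuron's effects (hypothetical data).
rng = np.random.default_rng(0)
n_images, d = 1000, 64
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)

activations = np.zeros(n_images)
active = rng.choice(n_images, size=15, replace=False)   # 1.5% of images
activations[active] = rng.uniform(1.0, 2.0, size=15)

# Effects lie along one direction, plus small noise on every image.
effects = np.outer(activations, true_dir) + 1e-3 * rng.normal(size=(n_images, d))

direction, coeffs, approx, explained = rank_one_approximation(effects)
sparsity = fraction_significant(effects)
print(f"significant for {sparsity:.1%} of images, "
      f"rank-1 explains {explained:.1%} of the squared norm")
```

On real models the effect vectors would come from the second-order computation described in the abstract; the synthetic construction here only shows what "selective" and "well approximated by one direction" mean operationally.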

Abstract

We interpret the function of individual neurons in CLIP by automatically describing them using text. Analyzing the direct effects (i.e. the flow from a neuron through the residual stream to the output) or the indirect effects (overall contribution) fails to capture the neurons' function in CLIP. Therefore, we present the "second-order lens", analyzing the effect flowing from a neuron through the later attention heads, directly to the output. We find that these effects are highly selective: for each neuron, the effect is significant for <2% of the images. Moreover, each effect can be approximated by a single direction in the text-image space of CLIP. We describe neurons by decomposing these directions into sparse sets of text representations. The sets reveal polysemantic behavior - each neuron corresponds to multiple, often unrelated, concepts (e.g. ships and cars). Exploiting this…
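
The step of describing a neuron by a sparse set of text representations can be sketched as a sparse approximation problem. The example below uses greedy orthogonal matching pursuit as a stand-in; the abstract does not say which sparse decomposition algorithm the authors use, so this choice is an assumption. The text embeddings are random unit vectors rather than CLIP outputs, and the names `omp`, `texts`, and `target` are hypothetical.

```python
import numpy as np

def omp(dictionary, target, n_nonzero):
    """Greedy orthogonal matching pursuit (illustrative stand-in).

    dictionary: (n_texts, d) array, rows are unit-norm text embeddings.
    target:     (d,) direction to describe.
    Returns the chosen row indices and their least-squares weights.
    """
    residual = target.copy()
    chosen = []
    weights = np.zeros(0)
    for _ in range(n_nonzero):
        scores = np.abs(dictionary @ residual)
        if chosen:
            scores[chosen] = -np.inf          # never pick the same text twice
        chosen.append(int(np.argmax(scores)))
        sub = dictionary[chosen].T            # (d, k) matrix of selected texts
        # Refit all selected weights jointly, then update the residual.
        weights, *_ = np.linalg.lstsq(sub, target, rcond=None)
        residual = target - sub @ weights
    return chosen, weights

# Synthetic "text embeddings" and a direction built from three of them.
rng = np.random.default_rng(0)
n_texts, d = 200, 128
texts = rng.normal(size=(n_texts, d))
texts /= np.linalg.norm(texts, axis=1, keepdims=True)

true_idx = [3, 57, 120]                       # e.g. two unrelated concepts and a third
target = np.array([1.0, 0.8, 0.6]) @ texts[true_idx]

chosen, weights = omp(texts, target, n_nonzero=3)
rel_err = np.linalg.norm(target - weights @ texts[chosen]) / np.linalg.norm(target)
print("selected text indices:", sorted(chosen))
print("relative reconstruction error:", rel_err)
```

With a real vocabulary, the selected indices would map back to words or phrases, and a neuron whose selected texts span unrelated concepts would be a candidate polysemantic neuron.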

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01·Rating 5·Confidence 4

Strengths

This paper draws inspiration from recent approaches that aim to examine and evaluate the functionality of each neuron in a given architecture. Automated interpretability constitutes an important challenge for modern architectures and this work aims to approach this in a different way via the contribution of neurons to the output representation and the information flow through the MSA blocks.

Weaknesses

The connection of the proposed approach to highly relevant work is a bit lacking. Can the authors provide a discussion on [1], highlighting the differences in the decomposition and analysis of the direct effects of the neurons? I find the focus on a single dataset, i.e., ImageNet, to be a bit restrictive in terms of analysing the behavior of the proposed approach. Indeed, most approaches in this line of work considered additional datasets, e.g., Waterbirds, CUB and Places365. The same applies f

Reviewer 02·Rating 6·Confidence 3

Strengths

1. The technical contributions are sound and interesting.
2. The paper is well written.
3. The paper included thorough evaluations.

Weaknesses

Generally good paper so please see questions.

Reviewer 03·Rating 8·Confidence 4

Strengths

- Extensive empirical validation of second order effects (e.g. second order effect neuron sparseness)
- Intuitive and interesting applications of second order effect control in the semantic adversarial example generation
- Increased understanding of internal attention model mechanism through semantic adversarial examples
- Improved segmentation results over TextSpan

Weaknesses

- Sparse coding to find textual descriptions of neurons may be very computationally expensive
- Not considering nonlinearities in second order effects (Eqn 5)


Taxonomy

Topics·Neuroscience and Neuropharmacology Research

Methods·Contrastive Language-Image Pre-training