How and where does CLIP process negation?

Vincent Quantmeyer; Pablo Mosteiro; Albert Gatt

arXiv:2407.10488·cs.CL·July 16, 2024

Open Access

TL;DR

This paper investigates how CLIP, a vision-language model, processes negation by analyzing the internal mechanisms of its text encoder. It localises the components involved in negation and exposes limitations of current benchmarks for linguistic understanding.

Contribution

It applies methods from the model interpretability literature to CLIP's text encoder to analyze how negation is processed, providing insight into the model's internal workings and exposing limitations of existing benchmarks.

Findings

01. Identifies specific parts of CLIP's text encoder involved in negation processing

02. Shows how attention heads contribute to understanding negation in CLIP

03. Highlights limitations of the VALSE dataset for testing linguistic understanding

Abstract

Various benchmarks have been proposed to test linguistic understanding in pre-trained vision & language (VL) models. Here we build on the existence task from the VALSE benchmark (Parcalabescu et al., 2022) which we use to test models' understanding of negation, a particularly interesting issue for multimodal models. However, while such VL benchmarks are useful for measuring model performance, they do not reveal anything about the internal processes through which these models arrive at their outputs in such visio-linguistic tasks. We take inspiration from the growing literature on model interpretability to explain the behaviour of VL models on the understanding of negation. Specifically, we approach these questions through an in-depth analysis of the text encoder in CLIP (Radford et al., 2021), a highly influential VL model. We localise parts of the encoder that process negation and…

Taxonomy

Topics: Linguistic Studies and Language Acquisition

Methods: Softmax · Attention Is All You Need · Contrastive Language-Image Pre-training