Decomposing Query-Key Feature Interactions Using Contrastive Covariances

Andrew Lee; Yonatan Belinkov; Fernanda Viégas; Martin Wattenberg

arXiv:2602.04752·cs.LG·February 5, 2026

Open Access

TL;DR

This paper introduces a contrastive covariance method that decomposes the query-key (QK) space of Transformer attention heads into low-rank, human-interpretable components, showing which feature interactions between queries and keys produce high attention scores.

Contribution

The paper proposes a novel contrastive covariance approach to analyze and interpret query-key interactions in large language models, revealing low-rank, human-interpretable components.

Findings

01 Identified interpretable query-key subspaces for categorical semantic features and binding features

02 Demonstrated attribution of attention scores to the identified features

03 Validated the method analytically and empirically in a simplified setting, then applied it to large language models

Abstract

Despite the central role of attention heads in Transformers, we lack tools to understand why a model attends to a particular token. To address this, we study the query-key (QK) space, the bilinear joint embedding space between queries and keys. We present a contrastive covariance method to decompose the QK space into low-rank, human-interpretable components. It is when features in keys and queries align in these low-rank subspaces that high attention scores are produced. We first study our method both analytically and empirically in a simplified setting. We then apply our method to large language models to identify human-interpretable QK subspaces for categorical semantic features and binding features. Finally, we demonstrate how attention scores can be attributed to our identified features.
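
To make the "bilinear QK space" and the idea of attributing an attention score to low-rank components concrete, here is a minimal sketch in NumPy. It is not the paper's contrastive covariance method, whose details are not given on this page. Instead it uses a plain SVD of the product W_Q^T W_K as a stand-in decomposition. The function name, the random weights, and the dimensions are all illustrative assumptions.

```python
import numpy as np

def qk_rank_one_contributions(x_q, x_k, W_Q, W_K):
    """Split one pre-softmax attention logit into a term per singular direction.

    Assumption: a plain SVD of W_Q^T W_K stands in for a learned decomposition;
    this is NOT the contrastive covariance method described in the abstract.
    """
    d_head = W_Q.shape[0]
    # Bilinear form: logit = x_q^T (W_Q^T W_K) x_k / sqrt(d_head)
    W_QK = W_Q.T @ W_K
    U, S, Vt = np.linalg.svd(W_QK)
    # Term i is sigma_i * (u_i . x_q) * (v_i . x_k), scaled like the logit.
    return S * (U.T @ x_q) * (Vt @ x_k) / np.sqrt(d_head)

# Hypothetical toy dimensions and random weights, for illustration only.
rng = np.random.default_rng(0)
d_model, d_head = 16, 4
W_Q = rng.normal(size=(d_head, d_model))
W_K = rng.normal(size=(d_head, d_model))
x_q = rng.normal(size=d_model)  # residual-stream vector at the query position
x_k = rng.normal(size=d_model)  # residual-stream vector at the key position

contributions = qk_rank_one_contributions(x_q, x_k, W_Q, W_K)
logit = (W_Q @ x_q) @ (W_K @ x_k) / np.sqrt(d_head)
```

The per-direction terms sum to the full logit, and only the first `d_head` terms can be non-zero because W_Q^T W_K has rank at most `d_head`. A decomposition into interpretable subspaces, such as the one the abstract describes, would allow the same kind of additive attribution over feature-aligned directions instead of raw singular vectors.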

Taxonomy

Topics: Information Retrieval and Search Behavior · Advanced Graph Neural Networks · Topic Modeling