Paying Attention to Facts: Quantifying the Knowledge Capacity of Attention Layers

Liang Ze Wong

arXiv:2502.05076 · cs.LG · February 10, 2025

Open Access

TL;DR

This paper analyzes how well attention layers (single-layer attention-only transformers) can memorize the facts in a database. It uses tensor rank as the measure of size, gives bounds on that rank, and reports empirical results on how design choices affect factual recall.

Contribution

It introduces a tensor-based framework to quantify the knowledge capacity of attention layers and explores how their design influences memorization ability.

Findings

01

The rank of a database's 3-tensor is proposed as a measure of database size, and it is empirically related to the rank of the 3-tensor defined for an attention layer.

02

The value-output and query-key weights, and the choice of argmax versus softmax, affect rank and capacity.

03

Insights suggest ways to increase layer capacity without adding parameters.

Abstract

In this paper, we investigate the ability of single-layer attention-only transformers (i.e., attention layers) to memorize facts contained in databases from a linear-algebraic perspective. We associate with each database a 3-tensor, propose the rank of this tensor as a measure of the size of the database, and provide bounds on the rank in terms of properties of the database. We also define a 3-tensor corresponding to an attention layer, and empirically demonstrate the relationship between its rank and database rank on a dataset of toy models and random databases. By highlighting the roles played by the value-output and query-key weights, and the effects of argmax and softmax on rank, our results shed light on the 'additive motif' of factual recall in transformers, while also suggesting a way of increasing layer capacity without increasing the number of parameters.
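
To make the idea of a database 3-tensor concrete, the following is a minimal sketch, not the paper's construction. It assumes a database is a set of (subject, relation, object) index triples and encodes it as a 0/1 indicator tensor; this listing does not specify the paper's exact encoding, so that choice is an assumption. Exact tensor rank is hard to compute in general, so the sketch only reports standard linear-algebra bounds: the rank is at least the matrix rank of every unfolding, and at most both the number of nonzero entries and the smallest product of two dimensions. These are generic bounds, not the database-specific bounds derived in the paper. The function names `database_tensor`, `unfolding_ranks`, and `rank_bounds` are hypothetical.

```python
import numpy as np

def database_tensor(facts, n_subjects, n_relations, n_objects):
    """Indicator 3-tensor: T[s, r, o] = 1 if fact (s, r, o) is in the database.

    Assumption: this 0/1 encoding is illustrative, not taken from the paper.
    """
    T = np.zeros((n_subjects, n_relations, n_objects))
    for s, r, o in facts:
        T[s, r, o] = 1.0
    return T

def unfolding_ranks(T):
    """Matrix rank of each mode-k unfolding (flattening) of the tensor."""
    ranks = []
    for mode in range(T.ndim):
        # Bring the chosen mode to the front, then flatten the other two.
        M = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
        ranks.append(int(np.linalg.matrix_rank(M)))
    return ranks

def rank_bounds(T):
    """Generic lower and upper bounds on the tensor rank of a 3-tensor."""
    # Tensor rank is at least the rank of any unfolding.
    lower = max(unfolding_ranks(T))
    n1, n2, n3 = T.shape
    # Each nonzero entry is a rank-1 term; so is each fiber along one mode.
    upper = min(n1 * n2, n1 * n3, n2 * n3, int(np.count_nonzero(T)))
    return lower, upper

# Toy database: 3 subjects, 2 relations, 3 objects, 5 facts.
facts = [(0, 0, 0), (1, 0, 1), (2, 0, 0), (0, 1, 2), (1, 1, 2)]
T = database_tensor(facts, n_subjects=3, n_relations=2, n_objects=3)
lower, upper = rank_bounds(T)
print(f"unfolding ranks: {unfolding_ranks(T)}, rank bounds: [{lower}, {upper}]")
```

In this toy example the bounds leave a gap, which illustrates why database-specific bounds of the kind the abstract describes are useful.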

Taxonomy

Topics: Big Data and Business Intelligence

Methods: Attention Is All You Need · Softmax