On Identifiability in Transformers
Gino Brunner, Yang Liu, Damián Pascual, Oliver Richter, Massimiliano Ciaramita, Roger Wattenhofer

TL;DR
This paper investigates the identifiability of attention weights and token embeddings in Transformers, revealing limitations in interpretability and proposing methods to better understand how information is encoded and mixed within the model.
Contribution
It provides new insights into the non-identifiability of attention weights and the encoding of token identity, and it introduces tools for analyzing how contextual embeddings mix input information in Transformers.
Findings
Attention weights are not identifiable for sequences longer than the attention head dimension.
Token identities are largely preserved across layers; evidence suggests they are encoded mainly in the angle of the embeddings.
Contextual embeddings show strong mixing of input information, quantified through gradient attribution.
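The gradient-attribution idea in the last finding can be illustrated on a toy model. This is a minimal sketch, not the authors' implementation. It assumes a single-head self-attention layer with no residual connection, LayerNorm, or feed-forward block. It estimates Jacobians with central finite differences instead of automatic differentiation. It scores the contribution of input token i to output token j as the Frobenius norm of d out_j / d x_i, normalised so that each output token's contributions sum to one. The helper names `self_attention` and `contribution_matrix` are hypothetical, and the paper's exact normalisation may differ.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Toy single-head scaled dot-product self-attention (no residual, LayerNorm or FFN)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Wq.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)             # row-wise softmax
    return A @ V

def contribution_matrix(f, X, eps=1e-5):
    """C[j, i]: norm of d f(X)[j] / d X[i], normalised so each row sums to 1.

    Jacobian blocks are estimated with central finite differences
    (a stand-in for the automatic differentiation a real model would use).
    """
    n, d = X.shape
    out_dim = f(X).shape[1]
    norms = np.zeros((n, n))
    for i in range(n):
        J = np.zeros((n, out_dim, d))             # J[j, :, k] = d out[j] / d X[i, k]
        for k in range(d):
            Xp, Xm = X.copy(), X.copy()
            Xp[i, k] += eps
            Xm[i, k] -= eps
            J[:, :, k] = (f(Xp) - f(Xm)) / (2 * eps)
        norms[:, i] = np.linalg.norm(J, axis=(1, 2))  # Frobenius norm for each output token j
    return norms / norms.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, d, dh = 5, 8, 4                 # tokens, model width, head width (toy sizes)
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, dh)) for _ in range(3))
C = contribution_matrix(lambda Z: self_attention(Z, Wq, Wk, Wv), X)
print(C.round(2))                  # row j: share of output token j attributed to each input
print(np.diag(C).round(2))         # share each output token keeps from its own input

# Sanity check: a position-wise linear map cannot mix tokens, so C is the identity.
W = rng.normal(size=(d, dh))
C_lin = contribution_matrix(lambda Z: Z @ W, X)
print(np.allclose(C_lin, np.eye(n)))   # True
```

A diagonal entry of C well below 1 means an output token draws most of its gradient mass from other positions, which is the kind of mixing the finding describes. The linear check is a quick way to validate the estimator before applying it to a real model.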
Abstract
In this paper we delve deep into the Transformer architecture by investigating two of its core components: self-attention and contextual embeddings. In particular, we study the identifiability of attention weights and token embeddings, and the aggregation of context into hidden tokens. We show that, for sequences longer than the attention head dimension, attention weights are not identifiable. We propose effective attention as a complementary tool for improving explanatory interpretations based on attention. Furthermore, we show that input tokens retain their identity to a large degree across the model. We also find evidence suggesting that identity information is mainly encoded in the angle of the embeddings and gradually decreases with depth. Finally, we demonstrate strong mixing of input information in the generation of contextual embeddings by means of a novel quantification method…
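The non-identifiability argument and the idea behind effective attention can be checked numerically. This is a minimal sketch under stated assumptions. One head's output is written as A @ T, where A is the ds × ds attention matrix and T stands in for the value-projected token matrix (here the hypothetical product E @ Wv @ Wo, whose rank is at most the head dimension dv). When ds > dv, T has a non-trivial left null space N, so adding any matrix whose rows lie in N leaves A @ T unchanged. Effective attention is computed here as A with that null-space component removed from each row. The perturbed matrix A_alt need not be a valid softmax output (non-negative rows summing to one), so the sketch only illustrates the linear-algebra argument. The function names `left_null_space` and `effective_attention` are hypothetical.

```python
import numpy as np

def left_null_space(T, tol=1e-10):
    """Orthonormal basis (as columns) of {x : x^T T = 0}, via the SVD of T."""
    U, s, _ = np.linalg.svd(T, full_matrices=True)
    rank = int((s > tol * s.max()).sum())
    return U[:, rank:]

def effective_attention(A, T, tol=1e-10):
    """Subtract from each row of A its projection onto the left null space of T."""
    N = left_null_space(T, tol)
    return A - (A @ N) @ N.T

rng = np.random.default_rng(0)
ds, d, dv = 10, 16, 4            # sequence length ds exceeds head dimension dv
E = rng.normal(size=(ds, d))     # token embeddings (random stand-ins)
Wv = rng.normal(size=(d, dv))    # value projection of one head
Wo = rng.normal(size=(dv, d))    # that head's slice of the output projection
T = E @ Wv @ Wo                  # ds x d, but rank <= dv

logits = rng.normal(size=(ds, ds))
A = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # row-stochastic

N = left_null_space(T)
print(N.shape)                                   # (10, 6): dimension ds - dv
A_alt = A + rng.normal(size=(ds, N.shape[1])) @ N.T
print(np.allclose(A @ T, A_alt @ T))             # True: different weights, same output

A_eff = effective_attention(A, T)
print(np.allclose(A_eff @ T, A @ T))             # True: output preserved
print(np.allclose(effective_attention(A_alt, T), A_eff))  # True: null-space part removed
```

Because A and A_alt collapse to the same effective attention matrix, interpretations built on A_eff are not affected by the part of the weights that has no influence on the output.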
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
