On Identifiability in Transformers

Gino Brunner; Yang Liu; Damián Pascual; Oliver Richter; Massimiliano Ciaramita; Roger Wattenhofer

arXiv:1908.04211 · cs.CL · February 10, 2020 · 76 cites

TL;DR

This paper investigates the identifiability of attention weights and token embeddings in Transformers, revealing limitations in interpretability and proposing methods to better understand how information is encoded and mixed within the model.

Contribution

It provides new insights into the non-identifiability of attention weights, the encoding of token identity, and introduces tools for analyzing contextual embedding mixing in Transformers.

Findings

01

Attention weights are not identifiable for sequences longer than the attention head dimension.

02

Token identities are largely preserved across layers, mainly encoded in embedding angles.

03

Contextual embeddings show strong mixing of input information, quantified through gradient attribution.

Abstract

In this paper we delve deep into the Transformer architecture by investigating two of its core components: self-attention and contextual embeddings. In particular, we study the identifiability of attention weights and token embeddings, and the aggregation of context into hidden tokens. We show that, for sequences longer than the attention head dimension, attention weights are not identifiable. We propose effective attention as a complementary tool for improving explanatory interpretations based on attention. Furthermore, we show that input tokens retain their identity to a large degree across the model. We also find evidence suggesting that identity information is mainly encoded in the angle of the embeddings and gradually decreases with depth. Finally, we demonstrate strong mixing of input information in the generation of contextual embeddings by means of a novel quantification method…
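The non-identifiability claim can be illustrated with a small NumPy sketch. It is not the authors' code: the toy sizes `d_s` and `d_v`, the random matrices, and all variable names are assumptions made for illustration. The idea is that when the sequence length exceeds the head dimension, the value-projected inputs `T` have a non-trivial left null space, so different attention matrices can yield the same head output. The last step removes the null-space component of each attention row, which is one way to read the "effective attention" the abstract mentions. For simplicity the sketch ignores the requirement that perturbed rows remain valid probability distributions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy sizes: sequence length d_s larger than head dimension d_v.
d_s, d_v = 8, 4

# T stands in for the value-projected inputs of one head, shape (d_s, d_v).
T = rng.normal(size=(d_s, d_v))

# A is a row-stochastic attention matrix, shape (d_s, d_s).
logits = rng.normal(size=(d_s, d_s))
A = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Orthonormal basis N of the left null space of T (N.T @ T == 0),
# taken from the trailing left singular vectors of T.
U, S, _ = np.linalg.svd(T, full_matrices=True)
rank = int(np.sum(S > 1e-10))
N = U[:, rank:]  # shape (d_s, d_s - rank)

# Add an arbitrary null-space component to every row of A.
coeffs = rng.normal(size=(d_s, N.shape[1]))
A_perturbed = A + coeffs @ N.T

out_original = A @ T
out_perturbed = A_perturbed @ T

# Different attention weights, same head output.
print(np.allclose(A, A_perturbed))
print(np.allclose(out_original, out_perturbed))

# Remove the null-space component of each row of A; the output is unchanged.
A_eff = A - A @ N @ N.T
print(np.allclose(A_eff @ T, out_original))
```

With `d_s <= d_v` and a full-rank `T`, `N` would have zero columns and no such perturbation would exist, which matches the condition stated in the abstract.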

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax