Explaining How Transformers Use Context to Build Predictions

Javier Ferrando; Gerard I. Gállego; Ioannis Tsiamas; Marta R. Costa-jussà

arXiv:2305.12535 · cs.CL · May 23, 2023 · 1 citation

Open Access 1 Repo

TL;DR

This paper introduces a novel explainability method for Transformer-based language models, revealing how context influences predictions across layers, and demonstrates its effectiveness in linguistic and translation tasks.

Contribution

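It presents a new procedure for analyzing Transformer language generation models. Using contrastive examples, it shows that the resulting explanations align better with linguistic evidence than existing baselines, and it examines the role of MLPs and the source-target alignments of translation models.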

Findings

01

The proposed method aligns with evidence of linguistic phenomena more consistently than gradient-based and perturbation-based baselines.

02

MLPs learn features that support grammatical predictions.

03

Neural Machine Translation models exhibit human-like source-target alignments.

Abstract

Language Generation Models produce words based on the previous context. Although existing methods offer input attributions as explanations for a model's prediction, it is still unclear how prior words affect the model's decision throughout the layers. In this work, we leverage recent advances in explainability of the Transformer and present a procedure to analyze models for language generation. Using contrastive examples, we compare the alignment of our explanations with evidence of the linguistic phenomena, and show that our method consistently aligns better than gradient-based and perturbation-based baselines. Then, we investigate the role of MLPs inside the Transformer and show that they learn features that help the model predict words that are grammatically acceptable. Lastly, we apply our method to Neural Machine Translation models, and demonstrate that they generate human-like…
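
The contrastive setup mentioned in the abstract can be illustrated with a toy sketch. It is not the paper's procedure: it assumes, purely for illustration, that the final residual state is an exact sum of per-token contributions and that logits come from a plain linear output projection with no layer normalization. The function name `contrastive_attributions` and all array shapes are hypothetical.

```python
import numpy as np

def contrastive_attributions(contributions, unembed, target_id, foil_id):
    """Split the logit difference (target minus foil) across context tokens.

    contributions: (num_tokens, d_model) array; row j is the assumed additive
        share of the final residual state attributed to context token j.
    unembed: (d_model, vocab_size) output projection matrix.
    Returns one score per context token; positive scores favor the target.
    """
    # Direction in model space that separates the target from the foil.
    direction = unembed[:, target_id] - unembed[:, foil_id]
    return contributions @ direction

# Random toy data standing in for a real model (assumption, not real weights).
rng = np.random.default_rng(0)
contributions = rng.normal(size=(4, 8))   # 4 context tokens, d_model = 8
unembed = rng.normal(size=(8, 10))        # vocabulary of 10 tokens
scores = contrastive_attributions(contributions, unembed, target_id=3, foil_id=7)

# Under the linearity assumption, the scores sum to the full logit difference.
final_state = contributions.sum(axis=0)
logits = final_state @ unembed
logit_diff = logits[3] - logits[7]
```

Because everything here is linear, the per-token scores add up exactly to the difference between the target and foil logits, which is what makes them readable as shares of the model's preference for one word over the other.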

Code & Models

Repositories

mt-upc/logit-explanations
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Layer Normalization · Byte Pair Encoding · Dropout · Linear Layer · Label Smoothing · Position-Wise Feed-Forward Layer · Adam