Explaining How Transformers Use Context to Build Predictions
Javier Ferrando, Gerard I. Gállego, Ioannis Tsiamas, Marta R. Costa-jussà

TL;DR
This paper introduces a novel explainability method for Transformer-based language models, revealing how context influences predictions across layers, and demonstrates its effectiveness in linguistic and translation tasks.
Contribution
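The paper presents a procedure for analyzing Transformer language generation models. It uses contrastive examples to measure how well the resulting explanations align with linguistic evidence, and it examines both the role of MLPs and the source-target alignments of translation models.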
Findings
The method aligns with linguistic evidence more closely than gradient-based and perturbation-based baselines.
MLPs learn features that support grammatical predictions.
Neural Machine Translation models exhibit human-like source-target alignments.
Abstract
Language Generation Models produce words based on the previous context. Although existing methods offer input attributions as explanations for a model's prediction, it is still unclear how prior words affect the model's decision throughout the layers. In this work, we leverage recent advances in explainability of the Transformer and present a procedure to analyze models for language generation. Using contrastive examples, we compare the alignment of our explanations with evidence of the linguistic phenomena, and show that our method consistently aligns better than gradient-based and perturbation-based baselines. Then, we investigate the role of MLPs inside the Transformer and show that they learn features that help the model predict words that are grammatically acceptable. Lastly, we apply our method to Neural Machine Translation models, and demonstrate that they generate human-like…
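The contrastive evaluation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names (`contrastive_score`, `evidence_mass`, `reciprocal_rank`), the example sentence, and all numbers are hypothetical and chosen only for illustration. The idea is that a contrastive example pairs the correct next word with a foil, and an explanation aligns with the linguistic evidence if it gives high attribution to the context token that determines the correct choice.

```python
def contrastive_score(logits, target, foil):
    """Logit difference between the correct word and the contrastive foil."""
    return logits[target] - logits[foil]


def evidence_mass(attributions, evidence_mask):
    """Fraction of total absolute attribution that falls on evidence tokens."""
    total = sum(abs(a) for a in attributions)
    if total == 0:
        return 0.0
    on_evidence = sum(abs(a) for a, m in zip(attributions, evidence_mask) if m)
    return on_evidence / total


def reciprocal_rank(attributions, evidence_mask):
    """1 / rank of the highest-attributed evidence token (0.0 if none)."""
    order = sorted(range(len(attributions)),
                   key=lambda i: attributions[i], reverse=True)
    for rank, idx in enumerate(order, start=1):
        if evidence_mask[idx]:
            return 1.0 / rank
    return 0.0


# Toy subject-verb agreement example (all values are made up).
context = ["The", "keys", "to", "the", "cabinet"]
logits = {"are": 3.2, "is": 1.1}            # hypothetical next-word logits
attributions = [0.1, 0.6, 0.05, 0.05, 0.2]  # hypothetical per-token scores
evidence = [0, 1, 0, 0, 0]                  # "keys" decides "are" vs "is"

score = contrastive_score(logits, target="are", foil="is")
mass = evidence_mass(attributions, evidence)
rr = reciprocal_rank(attributions, evidence)
```

Here a positive `score` means the model prefers the grammatical word over the foil, while `mass` and `rr` summarize how strongly the explanation points at the evidence token. Averaging such scores over many contrastive pairs is one plausible way to compare explanation methods.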
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Layer Normalization · Byte Pair Encoding · Dropout · Linear Layer · Label Smoothing · Position-Wise Feed-Forward Layer · Adam
