Towards Opening the Black Box of Neural Machine Translation: Source and Target Interpretations of the Transformer
Javier Ferrando, Gerard I. Gállego, Belen Alastruey, Carlos Escolano, Marta R. Costa-jussà

TL;DR
This paper introduces a new interpretability method for neural machine translation models that attributes influence to both source and target tokens, enhancing understanding of model decisions in Transformer-based NMT systems.
Contribution
The paper presents a novel attribution method applicable to encoder-decoder Transformers, enabling comprehensive analysis of source and target token influences in NMT.
Findings
Insights into how source tokens influence translation decisions
Understanding of target prefix effects on model predictions
Applicability to bilingual and multilingual Transformers
Abstract
In Neural Machine Translation (NMT), each token prediction is conditioned on the source sentence and the target prefix (what has been previously translated at a decoding step). However, previous work on interpretability in NMT has focused mainly on source sentence tokens' attributions. Therefore, we lack a full understanding of the influences of every input token (source sentence and target prefix) in the model predictions. In this work, we propose an interpretability method that tracks input tokens' attributions for both contexts. Our method, which can be extended to any encoder-decoder Transformer-based model, allows us to better comprehend the inner workings of current NMT models. We apply the proposed method to both bilingual and multilingual Transformers and present insights into their behaviour.
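To make "attributions for both contexts" concrete, the sketch below shows one way per-token scores for a single prediction could be put on a common scale across the source sentence and the target prefix. It is an illustration only, not the paper's method: the function names, the example tokens, and the raw scores are all hypothetical, and it assumes some attribution method has already produced one non-negative score per input token.

```python
from typing import List, Tuple


def normalize_attributions(
    source_scores: List[float], prefix_scores: List[float]
) -> Tuple[List[float], List[float]]:
    """Scale raw scores so all input tokens (source + prefix) sum to 1.

    Assumption: scores are non-negative and come from some attribution
    method; this helper is hypothetical and not taken from the paper.
    """
    if any(s < 0 for s in source_scores + prefix_scores):
        raise ValueError("scores must be non-negative")
    total = sum(source_scores) + sum(prefix_scores)
    if total <= 0:
        raise ValueError("at least one score must be positive")
    return (
        [s / total for s in source_scores],
        [p / total for p in prefix_scores],
    )


def source_share(source_attr: List[float], prefix_attr: List[float]) -> float:
    """Fraction of the total attribution assigned to source tokens."""
    return sum(source_attr) / (sum(source_attr) + sum(prefix_attr))


# Made-up example: predicting the next target token after the prefix "<s> The".
source_tokens = ["Das", "Haus", "</s>"]
prefix_tokens = ["<s>", "The"]
raw_source = [1.0, 6.0, 1.0]  # invented numbers, for illustration only
raw_prefix = [0.5, 1.5]       # invented numbers, for illustration only

src_attr, tgt_attr = normalize_attributions(raw_source, raw_prefix)
for token, score in zip(source_tokens + prefix_tokens, src_attr + tgt_attr):
    print(f"{token:>6}: {score:.2f}")
print(f"source share: {source_share(src_attr, tgt_attr):.2f}")
```

Putting both contexts on one scale makes it possible to ask how much a given prediction relied on the source sentence versus the already generated prefix, a question that source-only attribution cannot answer.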
Videos
Day 15: Open NLLB - ALTI+, detecting hallucinations (Pt 2) · YouTube
Day 15: Open NLLB - sharding, paper reading (hallucinations) (Pt 1) · YouTube
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
