Evaluating Explainable AI Attribution Methods in Neural Machine Translation via Attention-Guided Knowledge Distillation

Aria Nourbakhsh; Salima Lamsiyah; Adelaide Danilov; Christoph Schommer

arXiv:2603.11342 · cs.CL · March 13, 2026

Open Access

TL;DR

This paper evaluates various explainability methods in neural machine translation models by using attribution maps to guide a student model, revealing which methods best capture source-target alignments and improve translation quality.

Contribution

Introduces a systematic evaluation framework for attribution methods in seq2seq models using knowledge distillation and proposes an Attributor transformer to reconstruct attribution maps.

Findings

01. Attention, Value Zeroing, and Layer Gradient × Activation outperform other methods in BLEU gains.

02. Attribution methods capturing alignment signals lead to better translation improvements.

03. The Attributor model's accuracy correlates with the usefulness of attribution injection.

Abstract

The study of the attribution of input features to the output of neural network models is an active area of research. While numerous Explainable AI (XAI) techniques have been proposed to interpret these models, the systematic and automated evaluation of these methods in sequence-to-sequence (seq2seq) models is less explored. This paper introduces a new approach for evaluating explainability methods in transformer-based seq2seq models. We use teacher-derived attribution maps as a structured side signal to guide a student model, and quantify the utility of different attribution methods through the student's ability to simulate targets. Using the Inseq library, we extract attribution scores over source-target sequence pairs and inject these scores into the attention mechanism of a student transformer model under four composition operators (addition, multiplication, averaging, and…
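
As a rough illustration of the injection step described in the abstract, the sketch below combines one row of attention logits with one row of attribution scores under the three operators the abstract names in full (addition, multiplication, averaging). The fourth operator is cut off in the text above and is deliberately left out. Everything here is an assumption made for illustration, not the paper's implementation: the function names `softmax` and `inject_attribution` are hypothetical, and applying the operator to the pre-softmax logits (rather than to the normalized attention weights) is a guess about where the injection could happen.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def inject_attribution(logits, attribution, op):
    """Combine attention logits with attribution scores, then normalize.

    logits and attribution are equal-length lists: one entry per source
    token, for a single target position. Hypothetical helper; combining
    before the softmax is an assumption, not the paper's stated method.
    """
    if len(logits) != len(attribution):
        raise ValueError("logits and attribution must have the same length")
    if op == "add":
        mixed = [l + a for l, a in zip(logits, attribution)]
    elif op == "multiply":
        mixed = [l * a for l, a in zip(logits, attribution)]
    elif op == "average":
        mixed = [(l + a) / 2.0 for l, a in zip(logits, attribution)]
    else:
        # The fourth operator is truncated in the source, so it is not guessed here.
        raise ValueError(f"unsupported operator: {op}")
    return softmax(mixed)


# Toy example: the attribution map strongly favors the third source token.
logits = [2.0, 1.0, 0.0]
attribution = [0.0, 0.0, 3.0]
for op in ("add", "multiply", "average"):
    weights = inject_attribution(logits, attribution, op)
    print(op, [round(w, 3) for w in weights])
```

In this toy case, addition and averaging both shift most of the attention mass onto the third source token, while multiplication zeroes every logit and so yields a uniform distribution. That difference hints at why the choice of composition operator could matter when comparing attribution methods.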

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Multimodal Machine Learning Applications