Disentangling meaning from language in LLM-based machine translation
Théo Lasnier, Armel Zebaze, Djamé Seddah, Rachel Bawden, Benoît Sagot

TL;DR
This paper investigates how large language models internally handle machine translation at the sentence level by analyzing attention mechanisms, revealing specialized head functions and enabling targeted manipulation for improved translation control.
Contribution
It introduces a mechanistic interpretability approach to dissect sentence-level translation in LLMs, identifying specialized attention heads and demonstrating effective subtask steering with minimal modifications.
Findings
Distinct attention head sets specialize in translation subtasks
Modifying 1% of relevant heads achieves competitive instruction-free translation
Ablating specific heads disrupts their associated translation functions
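Head ablation, as in the finding above, can be sketched on pre-extracted per-head outputs. This is an illustration under assumptions, not the authors' code. The page does not say whether zero- or mean-ablation is used, so the hypothetical helper `ablate_heads` supports both. In a real model, it would run inside a forward hook on each attention layer, which is not shown here.

```python
from __future__ import annotations

from typing import Optional

import numpy as np


def ablate_heads(head_out: np.ndarray, heads: list[tuple[int, int]],
                 mean_out: Optional[np.ndarray] = None) -> np.ndarray:
    """Knock out selected heads in a (n_layers, n_heads, d_head) array of head outputs.

    Zero-ablation by default; mean-ablation when `mean_out` (same shape, averaged
    over a reference prompt set) is supplied. The input array is not modified.
    """
    out = head_out.copy()
    for layer, head in heads:
        out[layer, head] = 0.0 if mean_out is None else mean_out[layer, head]
    return out


# Toy usage on synthetic data: 4 layers x 8 heads, d_head = 16.
rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 8, 16))
mean_ref = rng.normal(size=(4, 8, 16))

zeroed = ablate_heads(acts, [(1, 2)])            # head (1, 2) set to zero
meaned = ablate_heads(acts, [(1, 2)], mean_ref)  # head (1, 2) replaced by its reference mean
```

Mean-ablation is often preferred over zero-ablation because it keeps the head's output near its typical range, so downstream layers see less of an out-of-distribution input.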
Abstract
Mechanistic Interpretability (MI) seeks to explain how neural networks implement their capabilities, but the scale of Large Language Models (LLMs) has limited prior MI work in Machine Translation (MT) to word-level analyses. We study sentence-level MT from a mechanistic perspective by analyzing attention heads to understand how LLMs internally encode and distribute translation functions. We decompose MT into two subtasks: producing text in the target language (i.e. target language identification) and preserving the input sentence's meaning (i.e. sentence equivalence). Across three families of open-source models and 20 translation directions, we find that distinct, sparse sets of attention heads specialize in each subtask. Based on this insight, we construct subtask-specific steering vectors and show that modifying just 1% of the relevant heads enables instruction-free MT performance…
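The abstract does not specify how the steering vectors are built or how the relevant heads are ranked. A common recipe, assumed here rather than taken from the paper, computes a mean-difference vector per head between two contrastive prompt sets (for instance, instructed vs. instruction-free translation prompts) and ranks heads by that vector's norm. The sketch below works on pre-extracted per-head outputs in NumPy. `head_steering_vectors`, `select_top_heads`, and `steer` are hypothetical names, and "1%" is read as 1% of all heads in the model, which is also an assumption.

```python
from __future__ import annotations

import numpy as np


def head_steering_vectors(acts_pos: np.ndarray, acts_neg: np.ndarray) -> np.ndarray:
    """Per-head mean-difference steering vectors (assumed recipe, not the paper's).

    acts_pos / acts_neg: (n_examples, n_layers, n_heads, d_head) head outputs,
    e.g. at the last prompt token, for two contrastive prompt sets.
    Returns an array of shape (n_layers, n_heads, d_head).
    """
    return acts_pos.mean(axis=0) - acts_neg.mean(axis=0)


def select_top_heads(scores: np.ndarray, fraction: float = 0.01) -> list[tuple[int, int]]:
    """Return (layer, head) pairs for the top `fraction` of heads by score (at least one)."""
    k = max(1, int(round(fraction * scores.size)))
    order = np.argsort(scores, axis=None)[::-1][:k]  # flat indices, highest score first
    return [tuple(int(i) for i in np.unravel_index(idx, scores.shape)) for idx in order]


def steer(head_out: np.ndarray, vectors: np.ndarray,
          heads: list[tuple[int, int]], alpha: float = 1.0) -> np.ndarray:
    """Add alpha * steering vector to the selected heads only.

    head_out: (n_layers, n_heads, d_head) head outputs at one position.
    """
    out = head_out.copy()
    for layer, head in heads:
        out[layer, head] += alpha * vectors[layer, head]
    return out


# Toy usage on synthetic activations: 32 layers x 32 heads = 1024 heads.
rng = np.random.default_rng(0)
n_layers, n_heads, d_head = 32, 32, 8
pos = rng.normal(size=(50, n_layers, n_heads, d_head))
neg = rng.normal(size=(50, n_layers, n_heads, d_head))
pos[:, 3, 7] += 2.0  # plant a strong contrast in head (3, 7)

vectors = head_steering_vectors(pos, neg)
scores = np.linalg.norm(vectors, axis=-1)  # one score per (layer, head)
top_heads = select_top_heads(scores, fraction=0.01)
print(len(top_heads), top_heads[0])  # 10 (3, 7)

steered = steer(np.zeros((n_layers, n_heads, d_head)), vectors, top_heads)
```

Ranking by vector norm is only one possible head-selection criterion, so treat `scores` as a placeholder for whatever relevance measure the paper actually uses. In a real model, `steer` would be applied inside a forward hook on each attention layer's per-head output, which is not shown here.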
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Topic Modeling
