Disentangling meaning from language in LLM-based machine translation

Théo Lasnier; Armel Zebaze; Djamé Seddah; Rachel Bawden; Benoît Sagot

arXiv:2602.04613 · cs.CL · February 5, 2026

Open Access

TL;DR

This paper investigates how large language models internally handle sentence-level machine translation by analyzing attention heads. It finds sparse sets of heads that specialize in distinct translation subtasks and shows that steering those heads enables translation without an explicit instruction.

Contribution

It introduces a mechanistic interpretability approach to dissect sentence-level translation in LLMs, identifying specialized attention heads and demonstrating effective subtask steering by modifying just 1% of the relevant heads.

Findings

01

Distinct attention head sets specialize in translation subtasks

02

Modifying 1% of relevant heads achieves competitive instruction-free translation

03

Ablating specific heads disrupts their associated translation functions
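The head-selection and ablation findings can be illustrated with a minimal sketch. It assumes each attention head, indexed by (layer, head), already has an importance score for one subtask, for example from activation patching; how the paper actually scores heads is not stated on this page, and the helpers `top_fraction_heads` and `ablate_heads` are hypothetical. The sketch keeps roughly the top 1% of heads (about 10 heads for a hypothetical model with 32 layers of 32 heads) and zero-ablates their outputs.

```python
from typing import Dict, List, Tuple

Head = Tuple[int, int]  # hypothetical (layer, head) index


def top_fraction_heads(scores: Dict[Head, float], fraction: float = 0.01) -> List[Head]:
    """Return the highest-scoring heads, keeping about `fraction` of all heads (at least one)."""
    # round() avoids float artifacts such as 0.07 * 100 == 7.000000000000001
    k = max(1, round(fraction * len(scores)))
    return sorted(scores, key=scores.get, reverse=True)[:k]


def ablate_heads(head_outputs: Dict[Head, List[float]], heads: List[Head]) -> Dict[Head, List[float]]:
    """Zero-ablate the selected heads; the input dict is left unchanged."""
    ablated = dict(head_outputs)
    for h in heads:
        ablated[h] = [0.0] * len(head_outputs[h])
    return ablated


# Toy usage: 4 layers x 25 heads = 100 heads, with made-up scores.
scores = {(l, h): float(l * 25 + h) for l in range(4) for h in range(25)}
print(top_fraction_heads(scores, 0.01))  # → [(3, 24)]
```

Zero ablation is only one option; mean ablation, which replaces a head's output with its average over a reference set of inputs, is a common alternative that keeps activations closer to their usual range.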

Abstract

Mechanistic Interpretability (MI) seeks to explain how neural networks implement their capabilities, but the scale of Large Language Models (LLMs) has limited prior MI work in Machine Translation (MT) to word-level analyses. We study sentence-level MT from a mechanistic perspective by analyzing attention heads to understand how LLMs internally encode and distribute translation functions. We decompose MT into two subtasks: producing text in the target language (i.e. target language identification) and preserving the input sentence's meaning (i.e. sentence equivalence). Across three families of open-source models and 20 translation directions, we find that distinct, sparse sets of attention heads specialize in each subtask. Based on this insight, we construct subtask-specific steering vectors and show that modifying just 1% of the relevant heads enables instruction-free MT performance…
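The abstract does not specify how the subtask-specific steering vectors are built. A common recipe, assumed here and not confirmed as the paper's method, is contrastive: average each selected head's output over prompts that include an explicit translation instruction, subtract the average over the same inputs without it, and add the scaled difference to that head's output at inference time. This sketch uses plain Python lists as stand-ins for per-head activation vectors; `mean_vector`, `build_steering_vectors`, `apply_steering`, and the scale `alpha` are hypothetical names. Separate vector sets could be built for the target-language-identification heads and the sentence-equivalence heads.

```python
from typing import Dict, List, Tuple

Head = Tuple[int, int]  # hypothetical (layer, head) index
Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def build_steering_vectors(
    with_instr: Dict[Head, List[Vector]],
    without_instr: Dict[Head, List[Vector]],
    heads: List[Head],
) -> Dict[Head, Vector]:
    """Assumed contrastive recipe: mean output with instruction minus mean output without it."""
    steering = {}
    for h in heads:
        pos = mean_vector(with_instr[h])
        neg = mean_vector(without_instr[h])
        steering[h] = [p - q for p, q in zip(pos, neg)]
    return steering


def apply_steering(
    head_outputs: Dict[Head, Vector], steering: Dict[Head, Vector], alpha: float = 1.0
) -> Dict[Head, Vector]:
    """Add alpha * steering vector to each steered head; all other heads pass through unchanged."""
    steered = dict(head_outputs)
    for h, v in steering.items():
        steered[h] = [x + alpha * d for x, d in zip(head_outputs[h], v)]
    return steered


# Toy usage with a single steered head (0, 1).
sv = build_steering_vectors(
    {(0, 1): [[1.0, 2.0], [3.0, 4.0]]},
    {(0, 1): [[0.0, 0.0], [2.0, 2.0]]},
    [(0, 1)],
)
print(apply_steering({(0, 1): [0.5, 0.5], (0, 0): [9.0, 9.0]}, sv))
# → {(0, 1): [1.5, 2.5], (0, 0): [9.0, 9.0]}
```

In a real model, per-head activations would be read and modified with hooks on the attention module, at a point before the output projection mixes the heads together; the exact hook point depends on the model implementation.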

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Topic Modeling