Tracing Multilingual Representations in LLMs with Cross-Layer Transcoders
Abir Harrasse, Florent Draye, Punya Syon Pandey, Zhijing Jin, Bernhard Schölkopf

TL;DR
This paper investigates how multilingual LLMs internally represent multiple languages, revealing shared features and language-specific decoding mechanisms, and explains performance gaps through detailed mechanistic analysis.
Contribution
It applies Cross-Layer Transcoders (CLTs) and attribution graphs to models trained on different multilingual mixtures, uncovering features shared across languages and language-specific decoding in later layers.
Findings
The model forms shared multilingual representations, employing highly similar features across languages.
Language-specific decoding emerges in later layers of the model.
Underperformance in non-English languages is linked to weak late-layer features and tokenizer bias.
Abstract
Multilingual Large Language Models (LLMs) can process many languages, yet how they internally represent this diversity remains unclear. Do they form shared multilingual representations with language-specific decoding, and if so, why does performance favor the dominant training language? To address this, we train models on different multilingual mixtures and analyze their internal mechanisms using Cross-Layer Transcoders (CLTs) and Attribution Graphs. Our results reveal multilingual shared representations: the model employs highly similar features across languages, while language-specific decoding emerges in later layers. Training models without English shows identical multilingual shared space structures. Decoding relies partly on a small set of high-frequency features in the final layers, which linearly encode language identity from early layers. Intervening on these features allows…
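To make the phrase "linearly encode language identity" concrete, the following is a minimal sketch of a linear probe. It is not the paper's method or data: the activations are synthetic, and every name here (`make_synthetic_activations`, `fit_linear_probe`, `probe_accuracy`, the dimensions, the three "languages") is a hypothetical choice made for illustration. The idea is only that, if a linear map from activations to language labels classifies held-out examples well, language identity is linearly decodable from those activations.

```python
import numpy as np


def make_synthetic_activations(n_per_lang, dim, n_langs, seed=0):
    """Build toy 'activations': one random direction per language plus unit noise.

    This stands in for real model activations, which are an assumption here.
    """
    rng = np.random.default_rng(seed)
    directions = rng.normal(size=(n_langs, dim)) * 3.0
    labels = np.repeat(np.arange(n_langs), n_per_lang)
    acts = directions[labels] + rng.normal(size=(labels.size, dim))
    return acts, labels


def fit_linear_probe(acts, labels, n_classes):
    """Fit a least-squares linear map from activations to one-hot language labels."""
    X = np.hstack([acts, np.ones((acts.shape[0], 1))])  # append a bias column
    Y = np.eye(n_classes)[labels]                       # one-hot targets
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W


def probe_accuracy(W, acts, labels):
    """Classify by the largest probe output and compare against the true labels."""
    X = np.hstack([acts, np.ones((acts.shape[0], 1))])
    preds = (X @ W).argmax(axis=1)
    return float((preds == labels).mean())


# Toy setup: 3 hypothetical languages, 200 examples each, 32-dim activations.
acts, labels = make_synthetic_activations(n_per_lang=200, dim=32, n_langs=3)

# Hold out a quarter of the examples to check that the probe generalizes.
perm = np.random.default_rng(1).permutation(labels.size)
train_idx, test_idx = perm[:450], perm[450:]

W = fit_linear_probe(acts[train_idx], labels[train_idx], n_classes=3)
acc = probe_accuracy(W, acts[test_idx], labels[test_idx])
print(f"held-out probe accuracy: {acc:.2f}")
```

On real models, the same probe would be fit separately on activations from each layer, so that accuracy can be compared across depth; how the authors actually measure this is not specified in the text above.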
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods
