When Meanings Meet: Investigating the Emergence and Quality of Shared Concept Spaces during Multilingual Language Model Training
Felicia Körner, Max Müller-Eberstein, Anna Korhonen, Barbara Plank

TL;DR
This paper investigates how shared, language-agnostic concept spaces develop during multilingual language model training, revealing early emergence, ongoing refinement, and the nuanced relationship between alignment and translation quality.
Contribution
It introduces a causal interpretability approach to analyze the emergence of cross-lingual concept spaces during training, providing new insights into training dynamics and model behavior.
Findings
Shared concept spaces emerge early and refine over time
Alignment with these spaces is language-dependent
Translation improvements often reflect behavioral shifts, not true translation ability
Abstract
Training Large Language Models (LLMs) with high multilingual coverage is becoming increasingly important -- especially when monolingual resources are scarce. Recent studies have found that LLMs process multilingual inputs in shared concept spaces, thought to support generalization and cross-lingual transfer. However, these prior studies often do not use causal methods, lack deeper error analysis, or focus on the final model only, leaving open how these spaces emerge during training. We investigate the development of language-agnostic concept spaces during pretraining of EuroLLM through the causal interpretability method of activation patching. We isolate cross-lingual concept representations, then inject them into a translation prompt to investigate how consistently translations can be altered, independently of the language. We find that shared concept spaces emerge early and continue…
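The general idea of activation patching described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation and does not use EuroLLM: the three "layers" are hypothetical stand-ins for transformer layers, and averaging hidden activations over several source inputs is an assumed, simplified way of isolating a shared representation. The sketch caches an intermediate activation from source runs, then overwrites the same position in a target run and compares outputs.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_toy_layers() -> List[Layer]:
    """Hypothetical stand-ins for model layers (not a real language model)."""
    def layer0(v: Vector) -> Vector:
        return [2.0 * x for x in v]

    def layer1(v: Vector) -> Vector:
        return [x + 1.0 for x in v]

    def layer2(v: Vector) -> Vector:
        return [sum(v)]

    return [layer0, layer1, layer2]


def run(
    layers: Sequence[Layer],
    x: Vector,
    patch: Optional[Tuple[int, Vector]] = None,
    cache: Optional[Dict[int, Vector]] = None,
) -> Vector:
    """Run the toy model; optionally overwrite one layer's output (the patch)
    and optionally record every layer's output in `cache`."""
    h = list(x)
    for i, layer in enumerate(layers):
        h = layer(h)
        if patch is not None and patch[0] == i:
            h = list(patch[1])  # replace the activation with the injected one
        if cache is not None:
            cache[i] = list(h)
    return h


def mean_activation(
    layers: Sequence[Layer], inputs: Sequence[Vector], layer_idx: int
) -> Vector:
    """Average the activation at `layer_idx` over several source inputs
    (assumed here as a simple proxy for a language-independent representation)."""
    acts = []
    for x in inputs:
        cache: Dict[int, Vector] = {}
        run(layers, x, cache=cache)
        acts.append(cache[layer_idx])
    return [sum(col) / len(acts) for col in zip(*acts)]


layers = make_toy_layers()
sources = [[1.0, 2.0], [3.0, 4.0]]   # e.g. the same concept in two languages
target = [0.0, 0.0]                  # e.g. a prompt about a different concept

concept = mean_activation(layers, sources, layer_idx=1)
clean_output = run(layers, target)
patched_output = run(layers, target, patch=(1, concept))
```

If the patched output follows the injected representation rather than the target input, the patched layer causally carries the information of interest; in a real model, the same comparison would be made on generated translations rather than on a single number.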
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Multimodal Machine Learning Applications
