Sparse Autoencoders Can Capture Language-Specific Concepts Across Diverse Languages
Lyzander Marciano Andrylie, Inaya Rahmanisa, Mahardika Krisna Ihsani, Alfan Farizki Wicaksono, Haryo Akbarianto Wibowo, Alham Fikri Aji

TL;DR
This paper demonstrates that sparse autoencoders can identify language-specific features in large language models, revealing their influence on multilingual performance and enabling interpretable language identification.
Contribution
Introduces SAE-LAPE, a novel method to detect language-specific features in LLMs using feature activation probability, enhancing interpretability and understanding of multilingual mechanisms.
Findings
Many language-specific features appear in middle to final layers.
Features are interpretable and influence multilingual performance.
Language identification accuracy is comparable to fastText.
Abstract
Understanding the multilingual mechanisms of large language models (LLMs) provides insight into how they process different languages, yet this remains challenging. Existing studies often focus on individual neurons, but their polysemantic nature makes it difficult to isolate language-specific units from cross-lingual representations. To address this, we explore sparse autoencoders (SAEs) for their ability to learn monosemantic features that represent concrete and abstract concepts across languages in LLMs. While some of these features are language-independent, the presence of language-specific features remains underexplored. In this work, we introduce SAE-LAPE, a method based on feature activation probability, to identify language-specific features within the feed-forward network. We find that many such features predominantly appear in the middle to final layers of the model and are…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques
