Sparse Autoencoders Can Capture Language-Specific Concepts Across Diverse Languages

Lyzander Marciano Andrylie; Inaya Rahmanisa; Mahardika Krisna Ihsani; Alfan Farizki Wicaksono; Haryo Akbarianto Wibowo; Alham Fikri Aji

arXiv:2507.11230 · cs.CL · February 24, 2026

Open Access

TL;DR

This paper demonstrates that sparse autoencoders can identify language-specific features in large language models, revealing their influence on multilingual performance and enabling interpretable language identification.

Contribution

Introduces SAE-LAPE, a method that uses feature activation probability to detect language-specific features in the feed-forward networks of LLMs, making the models' multilingual mechanisms easier to interpret.

Findings

1. Many language-specific features appear in the middle to final layers.
2. These features are interpretable and influence multilingual performance.
3. Language identification accuracy is comparable to fastText.

Abstract

Understanding the multilingual mechanisms of large language models (LLMs) provides insight into how they process different languages, yet this remains challenging. Existing studies often focus on individual neurons, but their polysemantic nature makes it difficult to isolate language-specific units from cross-lingual representations. To address this, we explore sparse autoencoders (SAEs) for their ability to learn monosemantic features that represent concrete and abstract concepts across languages in LLMs. While some of these features are language-independent, the presence of language-specific features remains underexplored. In this work, we introduce SAE-LAPE, a method based on feature activation probability, to identify language-specific features within the feed-forward network. We find that many such features predominantly appear in the middle to final layers of the model and are…
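The abstract describes SAE-LAPE only as "a method based on feature activation probability", so the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a feature counts as active on a token when its SAE activation is positive, that activation probabilities are normalized across languages, and that low entropy of the resulting distribution marks a feature as language-specific. The function names and the thresholds `max_entropy` and `min_prob` are hypothetical.

```python
import numpy as np


def activation_probability(acts_by_lang):
    """Per-language activation probability of each SAE feature.

    acts_by_lang maps a language code to an array of shape
    (n_tokens, n_features) holding SAE feature activations.
    Assumption: a feature is "active" on a token when its activation is > 0.
    Returns the sorted language codes and an (n_features, n_langs) array.
    """
    langs = sorted(acts_by_lang)
    probs = np.stack(
        [(acts_by_lang[lang] > 0).mean(axis=0) for lang in langs], axis=1
    )
    return langs, probs


def language_entropy(probs, eps=1e-12):
    """Entropy of each feature's activation distribution over languages."""
    total = probs.sum(axis=1, keepdims=True)
    dist = probs / np.maximum(total, eps)  # normalize across languages
    return -(dist * np.log(dist + eps)).sum(axis=1)


def select_language_specific(acts_by_lang, max_entropy=0.1, min_prob=0.05):
    """Map feature index -> language for low-entropy, sufficiently active features.

    Both thresholds are hypothetical and chosen only for illustration.
    """
    langs, probs = activation_probability(acts_by_lang)
    entropy = language_entropy(probs)
    keep = (entropy <= max_entropy) & (probs.max(axis=1) >= min_prob)
    return {int(f): langs[int(probs[f].argmax())] for f in np.where(keep)[0]}
```

Under these assumptions, a feature that fires on every English token and on no Indonesian token has near-zero entropy and is assigned to English, while a feature that fires equally often in both languages has entropy close to log 2 and is filtered out.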

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques