Unveiling Language-Specific Features in Large Language Models via Sparse Autoencoders

Boyi Deng, Yu Wan, Yidan Zhang, Baosong Yang, Fuli Feng

arXiv:2505.05111 · cs.CL · May 28, 2025

TL;DR

This paper uses Sparse Autoencoders to analyze and identify language-specific features in Large Language Models, revealing their role in multilingual capabilities and enabling improved language control.

Contribution

The paper introduces a novel SAE-based method and a monolinguality metric for identifying language-specific features. It then shows how these features affect multilingual abilities and how they can be used for steering control in LLMs.
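A minimal sketch of how a monolinguality score could be computed for a single SAE feature. It assumes the score is the gap between the feature's mean activation on a target language and the average of its mean activations on the other languages. The function name `monolinguality_scores` and the input layout (a dict from language code to that feature's activation values on text in that language) are hypothetical, and the paper's exact definition may differ.

```python
from statistics import mean


def monolinguality_scores(acts_by_lang):
    """Score one SAE feature's preference for each language.

    acts_by_lang maps a language code to this feature's activation values
    on text in that language (hypothetical input layout; needs >= 2 languages).
    Assumed mean-difference form: score(L) = mean of activations on L
    minus the average of the per-language means over all other languages.
    """
    lang_means = {lang: mean(vals) for lang, vals in acts_by_lang.items()}
    scores = {}
    for lang, mu in lang_means.items():
        others = [m for other, m in lang_means.items() if other != lang]
        scores[lang] = mu - mean(others)
    return scores


# Toy activations: this feature fires mostly on French text.
acts = {
    "en": [0.1, 0.0, 0.2],
    "fr": [2.0, 1.5, 2.5],
    "de": [0.0, 0.1, 0.2],
}
scores = monolinguality_scores(acts)
print(max(scores, key=scores.get))  # prints fr
```

A feature that scores high for one language and negative for the others is a candidate language-specific feature. Ranking every SAE feature by its score for a given language would give a shortlist to inspect or ablate.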

Findings

01. Some SAE features are strongly language-specific.

02. Ablating a language's features mainly degrades that language and leaves the others nearly unaffected.

03. Combining features enhances language control.
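The ablation behind findings 02 and 03 can be illustrated with a short sketch. It assumes one common form of SAE feature ablation: subtracting each chosen feature's contribution f_s · d_s (its activation times its decoder direction) from the model activation. This equals zeroing that feature in the SAE reconstruction while keeping the reconstruction error. The helper `ablate_features` and its arguments are hypothetical and are not the paper's code.

```python
def ablate_features(x, feature_acts, decoder_dirs, features):
    """Remove selected SAE features from one activation vector.

    x: model activation (list of floats).
    feature_acts: feature id -> SAE activation f_s for this token.
    decoder_dirs: feature id -> decoder direction d_s (same length as x).
    features: ids to ablate, either one at a time or together.
    Returns x minus the sum of f_s * d_s over the selected features.
    """
    out = list(x)
    for s in features:
        f_s = feature_acts.get(s, 0.0)  # an inactive feature changes nothing
        out = [xi - f_s * di for xi, di in zip(out, decoder_dirs[s])]
    return out


# Toy example: two hypothetical features tied to the same language.
x = [1.0, 2.0, 3.0]
decoder_dirs = {0: [1.0, 0.0, 0.0], 1: [0.0, 1.0, 0.0]}
feature_acts = {0: 0.5, 1: 2.0}

print(ablate_features(x, feature_acts, decoder_dirs, [0]))     # [0.5, 2.0, 3.0]
print(ablate_features(x, feature_acts, decoder_dirs, [0, 1]))  # [0.5, 0.0, 3.0]
```

In an actual experiment, the edited vector would replace the activation at the SAE's layer (for instance through a PyTorch forward hook). Per-language loss would then be compared with and without the ablation, for single features and for groups of features.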

Abstract

The mechanisms behind multilingual capabilities in Large Language Models (LLMs) have been examined using neuron-based or internal-activation-based methods. However, these methods often face challenges such as superposition and layer-wise activation variance, which limit their reliability. Sparse Autoencoders (SAEs) offer a more nuanced analysis by decomposing the activations of LLMs into a sparse linear combination of SAE features. We introduce a novel metric to assess the monolinguality of features obtained from SAEs, discovering that some features are strongly related to specific languages. Additionally, we show that ablating these SAE features significantly reduces an LLM's abilities in only one language, leaving the others almost unaffected. Interestingly, we find that some languages have multiple synergistic SAE features, and ablating them together yields greater improvement than ablating…
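To make "a sparse linear combination of SAE features" concrete, here is a minimal sketch of a standard ReLU sparse autoencoder in plain Python. The encoder maps an activation x to nonnegative feature activations f = ReLU(W_enc (x - b_dec) + b_enc). The decoder rebuilds x_hat = b_dec + sum over s of f_s · d_s from the few active features. The ReLU form, the function names, and the toy weights are assumptions for illustration. Public SAE suites also use variants such as JumpReLU or TopK, which change only the activation step.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def sae_encode(x, W_enc, b_enc, b_dec):
    """f = ReLU(W_enc (x - b_dec) + b_enc); W_enc holds one row per feature."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [sum(w * c for w, c in zip(row, centered)) + b
           for row, b in zip(W_enc, b_enc)]
    return relu(pre)


def sae_decode(f, W_dec, b_dec):
    """x_hat = b_dec + sum_s f_s * d_s, where W_dec[s] is feature s's direction."""
    x_hat = list(b_dec)
    for f_s, d_s in zip(f, W_dec):
        if f_s != 0.0:  # inactive features add nothing to the sum
            x_hat = [xh + f_s * di for xh, di in zip(x_hat, d_s)]
    return x_hat


# Toy SAE: 2-dimensional activations, 3 features (hypothetical weights).
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b_enc = [0.0, 0.0, 0.0]
W_dec = [[1.0, 0.0], [0.0, 1.0], [-0.5, -0.5]]
b_dec = [0.0, 0.0]

x = [2.0, 0.0]
f = sae_encode(x, W_enc, b_enc, b_dec)
active = [s for s, f_s in enumerate(f) if f_s > 0.0]
print(f, active)                    # [2.0, 0.0, 0.0] [0]
print(sae_decode(f, W_dec, b_dec))  # [2.0, 0.0]
```

Only feature 0 is active here, so the reconstruction uses a single decoder direction. With a real SAE, most features are zero for any given token. That sparsity is what lets individual features, such as language-specific ones, be inspected in isolation.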
