LangFIR: Discovering Sparse Language-Specific Features from Monolingual Data for Language Steering

Sing Hieng Wong; Hassan Sajjad; A.B. Siddique

arXiv:2604.03532 · cs.CL · April 7, 2026

TL;DR

LangFIR identifies sparse, language-specific features in multilingual language models using only monolingual data and random-token filtering, which enables control over the language of model outputs.

Contribution

LangFIR discovers language-specific features without requiring multilingual or parallel data, and it outperforms existing methods in language steering accuracy.

Findings

01

LangFIR finds highly sparse, language-specific features in residual streams.

02

Directional ablation of these features increases cross-entropy loss for the target language.

03

LangFIR achieves higher BLEU scores than existing methods across multiple models, datasets, and languages.
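
To make the directional ablation in finding 02 concrete, here is a minimal sketch in plain Python. It assumes the common formulation in which the component of an activation along a feature direction is projected out; the function name and the toy vectors are hypothetical, and the paper's exact procedure is not described on this page.

```python
import math


def directional_ablation(x, d):
    """Remove the component of activation vector x along direction d.

    Assumption: ablation is the projection x - (x . d_hat) * d_hat,
    where d_hat is d scaled to unit length.
    """
    norm = math.sqrt(sum(v * v for v in d))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    unit = [v / norm for v in d]
    # Scalar projection of x onto the unit direction.
    proj = sum(a * u for a, u in zip(x, unit))
    return [a - proj * u for a, u in zip(x, unit)]


# Toy example: the direction is the second axis, so that coordinate is zeroed.
ablated = directional_ablation([3.0, 4.0], [0.0, 2.0])
print(ablated)  # [3.0, 0.0]
```

After ablation the activation has no component along the chosen direction. If that direction carries language-specific information, one would expect the loss on text in that language to rise, which is the effect finding 02 reports.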

Abstract

Large language models (LLMs) show strong multilingual capabilities, yet reliably controlling the language of their outputs remains difficult. Representation-level steering addresses this by adding language-specific vectors to model activations at inference time, but identifying language-specific directions in the residual stream often relies on multilingual or parallel data that can be expensive to obtain. Sparse autoencoders (SAEs) decompose residual activations into interpretable, sparse feature directions and offer a natural basis for this search, yet existing SAE-based approaches face the same data constraint. We introduce LangFIR (Language Feature Identification via Random-token Filtering), a method that discovers language-specific SAE features using only a small amount of monolingual data and random-token sequences. Many SAE features consistently activated by target-language…
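
The idea of random-token filtering followed by steering can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the selection rule (keep features that fire on most target-language tokens but rarely on random-token sequences), the thresholds `min_lang_rate` and `max_random_rate`, the function names, and the toy activations are all hypothetical, since the abstract is truncated on this page before it gives the actual criterion.

```python
def activation_rate(feature_acts, threshold=0.0):
    """Fraction of tokens on which each SAE feature fires.

    feature_acts: one list of feature activations per token.
    """
    n_tokens = len(feature_acts)
    n_features = len(feature_acts[0])
    return [
        sum(1 for tok in feature_acts if tok[j] > threshold) / n_tokens
        for j in range(n_features)
    ]


def filter_language_features(lang_acts, random_acts,
                             min_lang_rate=0.9, max_random_rate=0.1):
    """Keep features active on target-language tokens but not on random tokens.

    Both thresholds are hypothetical values chosen for illustration.
    """
    lang_rate = activation_rate(lang_acts)
    rand_rate = activation_rate(random_acts)
    return [
        j for j, (l, r) in enumerate(zip(lang_rate, rand_rate))
        if l >= min_lang_rate and r <= max_random_rate
    ]


def steer(residual, decoder, feature_ids, alpha=1.0):
    """Add alpha times each selected feature's decoder direction to a residual vector."""
    out = list(residual)
    for j in feature_ids:
        for k, w in enumerate(decoder[j]):
            out[k] += alpha * w
    return out


# Toy data: 3 tokens x 3 SAE features.
lang_acts = [[1.2, 0.8, 0.0], [0.9, 1.1, 0.0], [1.5, 0.7, 0.3]]
random_acts = [[0.0, 0.9, 0.0], [0.0, 1.0, 0.2], [0.0, 0.6, 0.0]]
selected = filter_language_features(lang_acts, random_acts)
print(selected)  # [0]

# Toy decoder: one 2-dimensional direction per feature.
decoder = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(steer([0.5, 0.5], decoder, selected, alpha=2.0))  # [2.5, 0.5]
```

In the toy data, feature 1 fires on every target-language token but also on every random token, so it is discarded as language-agnostic. Feature 0 survives and supplies the steering direction.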

Code & Models

Repositories

https://anonymous.4open.science/r/LangFIR-C0F5