Causal Language Control in Multilingual Transformers via Sparse Feature Steering

Cheng-Ting Chou; George Liu; Jessica Sun; Cole Blondin; Kevin Zhu; Vasu Sharma; Sean O'Brien

arXiv:2507.13410 · cs.CL · October 17, 2025



TL;DR

This paper introduces a sparse feature steering method that uses sparse autoencoder (SAE) features to control the output language of multilingual transformers at inference time, reaching up to 90% success without any fine-tuning.
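As background, the sketch below shows what an "SAE feature" is: a non-negative coordinate obtained by encoding a residual-stream vector, paired with a decoder row that acts as a direction in that same stream. It assumes a plain ReLU SAE with tiny hand-picked, tied weights; `sae_encode`, `sae_decode`, and all numbers are hypothetical, and the pretrained SAEs used in the paper are far larger and may use a different activation function.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # one row per SAE feature

def relu(x: float) -> float:
    return max(0.0, x)

def sae_encode(h: Vector, W_enc: Matrix, b_enc: Vector, b_dec: Vector) -> Vector:
    """f = ReLU(W_enc (h - b_dec) + b_enc): one non-negative activation per feature."""
    centered = [hj - bj for hj, bj in zip(h, b_dec)]
    return [relu(sum(w * c for w, c in zip(row, centered)) + b)
            for row, b in zip(W_enc, b_enc)]

def sae_decode(f: Vector, W_dec: Matrix, b_dec: Vector) -> Vector:
    """h_hat = b_dec + sum_i f_i * W_dec[i]: each active feature adds its decoder row."""
    out = list(b_dec)
    for f_i, direction in zip(f, W_dec):
        for j, d in enumerate(direction):
            out[j] += f_i * d
    return out

# Hypothetical toy SAE: 3-dim residual stream, 4 features, tied encoder/decoder rows.
W_dec = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.6, 0.8, 0.0]]
W_enc = W_dec
b_enc = [0.0, 0.0, 0.0, -0.5]
b_dec = [0.0, 0.0, 0.0]

h = [0.5, -0.2, 1.0]
f = sae_encode(h, W_enc, b_enc, b_dec)
print(f)                            # [0.5, 0.0, 1.0, 0.0]
print(sae_decode(f, W_dec, b_dec))  # [0.5, 0.0, 1.0]
```

The reconstruction is deliberately lossy: feature 1 cannot represent the negative second coordinate, so that component ends up in the SAE's error term rather than in any feature.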

Contribution

It demonstrates that modifying a single SAE feature at one transformer layer can reliably steer the output language of large multilingual models.
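A minimal sketch of a single-feature intervention, assuming the common convention of moving one feature's activation to a chosen value and adding the difference along that feature's decoder direction; the paper's exact update rule and steering strength may differ. `steer_single_feature`, `run_with_steering`, the identity "layers", and the value 5.0 are hypothetical and only illustrate intervening on the residual stream at exactly one layer.

```python
from typing import Callable, List

Vector = List[float]

def steer_single_feature(h: Vector, f_i: float, decoder_direction: Vector,
                         target_value: float) -> Vector:
    """Shift h so that SAE feature i moves from f_i to target_value.

    Only the component along feature i's decoder direction changes; whatever
    the SAE fails to reconstruct (its error term) is left untouched.
    """
    delta = target_value - f_i
    return [hj + delta * dj for hj, dj in zip(h, decoder_direction)]

def run_with_steering(h: Vector, layers: List[Callable[[Vector], Vector]],
                      steer_layer: int,
                      steer_fn: Callable[[Vector], Vector]) -> Vector:
    """Apply layers in order, editing the residual stream after one layer only."""
    for idx, layer in enumerate(layers):
        h = layer(h)
        if idx == steer_layer:  # intervene at exactly one layer
            h = steer_fn(h)
    return h

def identity(v: Vector) -> Vector:
    """Stand-in for a transformer block in this toy example."""
    return list(v)

# Toy example: the chosen feature has decoder direction [0.6, 0.8, 0.0] and is
# assumed inactive (f_i = 0.0); we clamp it to a hypothetical value of 5.0.
direction = [0.6, 0.8, 0.0]
steered = run_with_steering(
    [0.5, -0.2, 1.0],
    layers=[identity, identity, identity],
    steer_layer=1,
    steer_fn=lambda v: steer_single_feature(v, 0.0, direction, 5.0),
)
print([round(x, 6) for x in steered])  # [3.5, 3.8, 1.0]
```

In a real model, `f_i` would be read by encoding the hooked activation with the SAE at that layer rather than fixed in advance.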

Findings

01

Achieved up to 90% success in language control

02

Most effective in mid-to-late transformer layers

03

Language steering is linked to specific attention heads

Abstract

Deterministically controlling the target generation language of large multilingual language models (LLMs) remains a fundamental challenge, particularly in zero-shot settings where neither explicit language prompts nor fine-tuning are available. In this work, we investigate whether sparse autoencoder (SAE) features, previously shown to correlate with interpretable model behaviors, can be leveraged to steer the generated language of LLMs during inference. Leveraging pretrained SAEs on the residual streams of Gemma-2B and Gemma-9B, we identify features whose activations differ most significantly between English and four target languages: Chinese, Japanese, Spanish, and French. By modifying just a single SAE feature at one transformer layer, we achieve controlled language shifts with up to 90% success, as measured by FastText language classification, while preserving semantic fidelity…
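The feature-selection and evaluation steps in the abstract can be sketched as follows. The ranking statistic (mean activation on target-language prompts minus mean activation on English prompts) and the pluggable `detect_lang` callable are assumptions for illustration, since the exact statistic is not given here; `rank_language_features`, `steering_success_rate`, `toy_detect`, and the activation values are hypothetical.

```python
from statistics import mean
from typing import Callable, List, Tuple

def rank_language_features(acts_en: List[List[float]],
                           acts_tgt: List[List[float]],
                           top_k: int = 1) -> List[Tuple[int, float]]:
    """Rank SAE features by mean activation on target-language prompts
    minus mean activation on English prompts (largest positive first)."""
    n_features = len(acts_en[0])
    diffs = [mean(row[i] for row in acts_tgt) - mean(row[i] for row in acts_en)
             for i in range(n_features)]
    order = sorted(range(n_features), key=lambda i: diffs[i], reverse=True)
    return [(i, diffs[i]) for i in order[:top_k]]

def steering_success_rate(generations: List[str], target_lang: str,
                          detect_lang: Callable[[str], str]) -> float:
    """Fraction of generations that the language classifier labels as target_lang."""
    if not generations:
        return 0.0
    hits = sum(1 for text in generations if detect_lang(text) == target_lang)
    return hits / len(generations)

# Toy activations: 2 prompts per language, 3 SAE features each.
acts_en = [[0.1, 0.0, 2.0], [0.3, 0.0, 1.0]]
acts_zh = [[0.2, 3.0, 1.0], [0.0, 5.0, 1.0]]
print(rank_language_features(acts_en, acts_zh))  # [(1, 4.0)]

def toy_detect(text: str) -> str:
    """Stand-in for a real language-ID model: any CJK ideograph -> 'zh'."""
    return "zh" if any("\u4e00" <= ch <= "\u9fff" for ch in text) else "en"

outputs = ["你好，世界", "Hello world", "今天天气很好", "谢谢"]
print(steering_success_rate(outputs, "zh", toy_detect))  # 0.75
```

With the `fasttext` Python package, `detect_lang` could be something like `lambda t: model.predict(t.replace("\n", " "))[0][0].replace("__label__", "")` for a model loaded with `fasttext.load_model("lid.176.bin")`; `predict` rejects strings that contain newlines, hence the replacement.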


Videos

Causal Language Control in Multilingual Transformers via Sparse Feature Steering (Underline)

Taxonomy

Topics: Topic Modeling · Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI)