Steering Code LLMs with Activation Directions for Language and Library Control
Md Mahbubur Rahman, Arjun Guha, Harshitha Menon

TL;DR
This paper shows that the programming-language and library preferences of Code LLMs can be shifted at inference time by estimating linear steering vectors in activation space and adding them to the model's hidden states, giving targeted control over the ecosystem the generated code uses.
Contribution
The authors introduce a method to identify and apply linear activation directions to steer Code LLMs toward specific programming languages and libraries during inference.
Findings
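Steering vectors substantially increase generation toward the target language or library under neutral prompts.
Interventions often remain effective even when the prompt explicitly requests the opposite choice.
Steering effectiveness varies by model and target: common ecosystems are easier to induce than rarer alternatives, and overly strong interventions can reduce output quality.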
Abstract
Code LLMs often default to particular programming languages and libraries under neutral prompts. We investigate whether these preferences are encoded as approximately linear directions in activation space that can be manipulated at inference time. Using a difference-in-means method, we estimate layer-wise steering vectors for five language/library pairs and add them to model hidden states during generation. Across three open-weight code LLMs, these interventions substantially increase generation toward the target ecosystem under neutral prompts and often remain effective even when prompts explicitly request the opposite choice. Steering strength varies by model and target, with common ecosystems easier to induce than rarer alternatives, and overly strong interventions can reduce output quality. Overall, our results suggest that code-style preferences in LLMs are partly represented by…
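The difference-in-means idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (mean_vector, steering_vector, steer), the scaling parameter alpha, and the toy activation values are all assumptions made for illustration, and a real setup would collect activations from a chosen layer of an actual model rather than from hand-written lists.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Average a list of equal-length activation vectors, dimension by dimension."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(target_acts: List[Vector], baseline_acts: List[Vector]) -> Vector:
    """Difference in means: mean(target activations) - mean(baseline activations)."""
    mu_target = mean_vector(target_acts)
    mu_baseline = mean_vector(baseline_acts)
    return [t - b for t, b in zip(mu_target, mu_baseline)]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Add the scaled steering direction to one hidden state (alpha is a hypothetical strength knob)."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy activations (made up): each inner list stands in for one prompt's
# hidden state at a single layer.
target_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]    # e.g. prompts using the target library
baseline_acts = [[0.0, 2.0, 1.0], [0.0, 2.0, 3.0]]  # e.g. prompts using the alternative

v = steering_vector(target_acts, baseline_acts)
steered = steer([0.5, 0.5, 0.5], v, alpha=2.0)
print(v)        # [2.0, 0.0, -2.0]
print(steered)  # [4.5, 0.5, -3.5]
```

In this sketch a larger alpha pushes the hidden state further along the estimated direction, which mirrors the abstract's observation that overly strong interventions can reduce output quality; how alpha is chosen per layer and per model is left open here.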
