Steering LLM Reasoning Through Bias-Only Adaptation

Viacheslav Sinii; Alexey Gorbatovski; Artem Cherepanov; Boris Shaposhnikov; Nikita Balagansky; Daniil Gavrilov

arXiv:2505.18706 · cs.LG · October 2, 2025

TL;DR

This paper improves reasoning in large language models with a minimal adaptation: a single steering vector per layer, trained with reinforcement learning while all base weights stay frozen. It matches the accuracy of full RL fine-tuning on mathematical-reasoning benchmarks using only a tiny fraction of the trainable parameters.
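To make the mechanism concrete, here is a minimal pure-Python sketch of bias-only steering. It assumes (an assumption, not stated in this summary) that each layer's steering vector is added to that layer's output hidden state and starts at zero. The layers are toy stand-ins that are never modified, `SteeredModel` is a hypothetical name, and the RL training loop that would update the vectors is omitted.

```python
from typing import Callable, List

Vector = List[float]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


class SteeredModel:
    """Toy stack of frozen layers, each followed by a trainable steering vector.

    Hypothetical illustration: only `self.steering` would be updated by the
    (omitted) RL training loop; `self.layers` is treated as frozen.
    """

    def __init__(self, layers: List[Callable[[Vector], Vector]], d: int):
        self.layers = layers                         # frozen base layers
        # One d-dim vector per layer; zero init (assumption) so the model
        # starts out identical to the base model.
        self.steering = [[0.0] * d for _ in layers]

    def forward(self, h: Vector) -> Vector:
        for layer, s in zip(self.layers, self.steering):
            h = add(layer(h), s)  # shift each layer's output by its steering vector
        return h

    def num_trainable(self) -> int:
        return sum(len(s) for s in self.steering)


# Usage with two toy "layers" on a 3-dimensional hidden state.
layers = [lambda h: [2.0 * x for x in h], lambda h: [x + 1.0 for x in h]]
model = SteeredModel(layers, d=3)
model.steering[0] = [0.5, 0.0, 0.0]
print(model.forward([1.0, 1.0, 1.0]))  # → [3.5, 3.0, 3.0]
print(model.num_trainable())           # → 6
```

Only the entries of `steering` would receive updates, so the trainable parameter count is (number of layers) × d no matter how large the frozen layers are.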

Contribution

The paper shows that strong chain-of-thought reasoning can be obtained from a tiny, parameter-efficient adaptation, which lowers fine-tuning cost and makes the learned change easier to interpret.

Findings

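1. Matches the accuracy of fully RL-tuned models while adding only about 0.0016% extra parameters (on an 8-billion-parameter model).

2. Reduces fine-tuning cost by shrinking optimizer memory and inter-GPU communication.

3. Offers interpretability: a logit-lens analysis shows that the learned vectors amplify coherent token directions.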
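The 0.0016% figure is easy to sanity-check with arithmetic. The abstract does not name the 8B model, so the sketch below assumes a Llama-style configuration with 32 layers, hidden size 4096, and roughly 8.03 billion parameters; these numbers are illustrative assumptions, and `steering_param_percent` is a hypothetical helper.

```python
def steering_param_percent(num_layers: int, d_model: int, total_params: int) -> float:
    """Extra parameters from one d_model-sized vector per layer, as a percentage."""
    return 100.0 * num_layers * d_model / total_params


# Assumed Llama-style 8B configuration (illustrative; not given in the abstract).
NUM_LAYERS = 32
D_MODEL = 4096
TOTAL_PARAMS = 8_030_000_000  # approximate base-model size

added = NUM_LAYERS * D_MODEL
pct = steering_param_percent(NUM_LAYERS, D_MODEL, TOTAL_PARAMS)
print(f"{added} extra parameters = {pct:.5f}% of the base model")
# → 131072 extra parameters = 0.00163% of the base model
```

Under these assumptions, 32 × 4096 = 131,072 added parameters come to about 0.0016% of the base model, consistent with the reported figure.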

Abstract

We show that training a single $d$-dimensional steering vector per layer with reinforcement learning, while freezing all base weights, matches the accuracy of fully RL-tuned reasoning models on mathematical-reasoning tasks. On an 8-billion-parameter model this adds only $\approx 0.0016\%$ additional parameters and reproduces performance across a range of base models and mathematical-reasoning benchmarks. These results tighten the upper bound on the parameter budget required for high-level chain-of-thought reasoning, indicating that millions of adapter weights are unnecessary. The minimal trainable footprint reduces optimizer memory and inter-GPU communication, lowering the overall cost of fine-tuning. Moreover, a logit-lens analysis shows that the learned vectors amplify coherent token directions, providing clearer insight into the model's internal computations.
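A logit lens reads a hidden-state direction by projecting it onto the unembedding matrix and listing the highest-scoring tokens. The sketch below applies that idea to a steering vector using a tiny hypothetical vocabulary; the token names, matrix values, and the helper `logit_lens_top_k` are made up for illustration, and the final normalization layer that many logit-lens variants apply first is omitted for simplicity.

```python
from typing import Dict, List


def logit_lens_top_k(vector: List[float], unembed: Dict[str, List[float]], k: int = 3) -> List[str]:
    """Return the k tokens whose unembedding rows have the largest dot product with `vector`."""
    scores = {tok: sum(v * w for v, w in zip(vector, row)) for tok, row in unembed.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical 2-dimensional unembedding for a 4-token vocabulary.
unembed = {
    "alpha": [1.0, 0.0],
    "beta": [0.8, 0.2],
    "gamma": [0.0, 1.0],
    "delta": [-1.0, 0.0],
}
steering_vector = [1.0, 0.1]  # stands in for a learned per-layer vector
print(logit_lens_top_k(steering_vector, unembed))  # → ['alpha', 'beta', 'gamma']
```

In a real model, `unembed` would be the model's output embedding matrix indexed by token, and a vector whose top tokens form a semantically related group would be one that "amplifies a coherent token direction" in the sense the abstract uses.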
