Reliable Control-Point Selection for Steering Reasoning in Large Language Models

Haomin Zhuang; Hojun Yoo; Xiaonan Luo; Kehan Guo; Xiangliang Zhang

arXiv:2604.02113 · cs.CL · April 3, 2026


TL;DR

This paper introduces a stability filtering method for selecting reliable control points when steering reasoning behaviors in large language models, improving both steering accuracy and the transferability of steering vectors across models.
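For reference, one widely used way to turn control points into a steering vector is a difference of means. You average the hidden states at points where the behavior occurs, subtract the average at points where it does not, and add a scaled copy of the result to a hidden state at inference time. The sketch below assumes that construction. This page does not say which construction the paper uses, and the function names and the scale `alpha` are illustrative only.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def steering_vector(pos: Sequence[Sequence[float]],
                    neg: Sequence[Sequence[float]]) -> Vector:
    """Difference of means (assumed construction): behavior-present minus behavior-absent states."""
    mu_pos = mean_vector(pos)
    mu_neg = mean_vector(neg)
    return [p - n for p, n in zip(mu_pos, mu_neg)]


def apply_steering(hidden: Sequence[float], v: Sequence[float], alpha: float) -> Vector:
    """Add the steering vector, scaled by the hypothetical strength alpha, to one hidden state."""
    return [h + alpha * vi for h, vi in zip(hidden, v)]


if __name__ == "__main__":
    v = steering_vector([[1.0, 2.0], [3.0, 2.0]], [[0.0, 1.0], [0.0, 3.0]])
    print(v)                                     # [2.0, 0.0]
    print(apply_steering([0.5, 0.5], v, 0.5))    # [1.5, 0.5]
```

Under this construction, the quality of the vector depends directly on which points count as "behavior present". That is why unreliable control points would dilute the signal.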

Contribution

The paper develops a probabilistic model for identifying behavioral boundaries that are stable under re-generation. It also proposes a filtering technique that keeps only those boundaries as control points, making reasoning steering more effective.
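Here is a minimal sketch of stability filtering. It assumes that each re-generation from a boundary's prefix is an independent Bernoulli trial that either reproduces the behavior or does not. The Beta prior, the posterior-mean estimate, the 0.5 threshold, and the `Boundary` record are hypothetical choices for illustration, not the paper's probabilistic model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Boundary:
    boundary_id: str
    reproduced: int  # re-generations from the prefix that showed the behavior again
    trials: int      # total re-generations sampled from the same prefix


def reproduction_estimate(b: Boundary, prior_a: float = 1.0, prior_b: float = 1.0) -> float:
    """Posterior mean of the reproduction probability under a Beta(prior_a, prior_b) prior."""
    return (b.reproduced + prior_a) / (b.trials + prior_a + prior_b)


def stability_filter(boundaries: List[Boundary], threshold: float = 0.5) -> List[Boundary]:
    """Keep boundaries that were re-tested and whose estimated reproduction rate meets the threshold."""
    return [b for b in boundaries
            if b.trials > 0 and reproduction_estimate(b) >= threshold]


if __name__ == "__main__":
    candidates = [Boundary("a", 8, 10), Boundary("b", 1, 10), Boundary("c", 0, 0)]
    print([b.boundary_id for b in stability_filter(candidates)])  # ['a']
```

Boundary "a" has an estimate of 9/12 = 0.75 and is kept. Boundary "b" has an estimate of 2/12 and is dropped. Boundary "c" is dropped because it was never re-generated, so the prior alone would otherwise decide its fate.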

Findings

01. Achieves 0.784 accuracy on MATH-500 with stability filtering.

02. Improves the transferability of steering vectors across models within the same architecture.

03. Shows that 93.3% of 541 keyword-detected boundaries are behaviorally unstable; stability filtering retains only the stable ones.

Abstract

Steering vectors offer a training-free mechanism for controlling reasoning behaviors in large language models, but constructing effective vectors requires identifying genuine behavioral signals in the model's hidden states. For behaviors that can be toggled via prompts, this is straightforward. However, many reasoning behaviors, such as self-reflection, emerge spontaneously and resist prompt-level control. Current methods detect these behaviors through keyword matching in chain-of-thought traces, implicitly assuming that every detected boundary encodes a genuine behavioral signal. We show that this assumption is overwhelmingly wrong: across 541 keyword-detected boundaries, 93.3% are behaviorally unstable, failing to reproduce the detected behavior under re-generation from the same prefix. We develop a probabilistic model that formalizes intrinsic reasoning behaviors as stochastic…
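The keyword-matching baseline and the re-generation test described in the abstract can be sketched as follows. Several details are assumptions: the reflection keywords, the `generate` callable (standing in for sampling a continuation from the model given a prefix), and the 40-character window. A real setup would sample continuations from the model itself.

```python
import re
from itertools import cycle
from typing import Callable, List

# Hypothetical self-reflection keywords; not the paper's keyword list.
REFLECTION_PATTERN = re.compile(r"\b(wait|let me (?:double-)?check|hmm)\b", re.IGNORECASE)


def detect_boundaries(trace: str) -> List[int]:
    """Keyword-matching baseline: character offsets where a reflection keyword begins."""
    return [m.start() for m in REFLECTION_PATTERN.finditer(trace)]


def reproduction_rate(prefix: str, generate: Callable[[str], str],
                      n: int, window: int = 40) -> float:
    """Fraction of n re-generations from `prefix` showing a keyword within `window` characters."""
    if n <= 0:
        raise ValueError("n must be positive")
    hits = 0
    for _ in range(n):
        continuation = generate(prefix)  # stand-in for sampling from the model
        if REFLECTION_PATTERN.search(continuation[:window]):
            hits += 1
    return hits / n


if __name__ == "__main__":
    trace = "So x = 4. Wait, that contradicts step 2."
    offsets = detect_boundaries(trace)
    print(offsets)  # [10]
    # Fake sampler alternating between a reflective and a non-reflective continuation.
    samples = cycle(["Wait, recheck.", "Next, x + 1 = 5."])
    print(reproduction_rate(trace[:offsets[0]], lambda p: next(samples), n=4))  # 0.5
```

In the abstract's terms, a boundary that keyword matching flags but whose re-generations rarely show the behavior again is behaviorally unstable.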


Code & Models

Repositories

zhmzm/stability-steering (GitHub)
