Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute
Sheng Liu, Tianlang Chen, Pan Lu, Haotian Ye, Yizheng Chen, Lei Xing, James Zou

TL;DR
Fractional Reasoning is a model-agnostic, training-free method for continuously controlling reasoning depth at inference time, improving the accuracy and efficiency of large language models across a range of tasks.
Contribution
The paper proposes a latent steering vector approach that allows continuous adjustment of reasoning intensity during inference, going beyond the limitations of fixed instructional prompts.
Findings
Consistently improves performance on GSM8K, MATH500, and GPQA datasets.
Enhances both breadth-based and depth-based reasoning strategies.
Demonstrates effectiveness across diverse models and reasoning tasks.
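As a rough illustration of the breadth-based finding, the sketch below aggregates candidate answers produced at several steering strengths by majority vote. It is a minimal sketch under stated assumptions: `generate` is a hypothetical callable standing in for a steered model call, and the list of scaling factors is arbitrary, not taken from the paper.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer (ties go to the earliest seen)."""
    counts = Counter(answers)
    return counts.most_common(1)[0][0]


def vote_across_strengths(generate, question, alphas):
    """Collect one answer per scaling factor, then take a majority vote.

    `generate(question, alpha)` is a hypothetical stand-in for running the
    model with the steering vector scaled by `alpha`.
    """
    answers = [generate(question, alpha) for alpha in alphas]
    return majority_vote(answers)


def stub_generate(question, alpha):
    """Toy stand-in: pretend weak steering yields a wrong answer."""
    return "42" if alpha >= 0.5 else "41"


result = vote_across_strengths(stub_generate, "toy question", [0.0, 0.5, 1.0, 1.5])
```

With the toy stub, three of the four strengths agree, so the vote returns their shared answer; a real setup would replace `stub_generate` with an actual steered decoding call.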
Abstract
Test-time compute has emerged as a powerful paradigm for improving the performance of large language models (LLMs), where generating multiple outputs or refining individual chains can significantly boost answer accuracy. However, existing methods like Best-of-N, majority voting, and self-reflection typically apply reasoning in a uniform way across inputs, overlooking the fact that different problems may require different levels of reasoning depth. In this work, we propose Fractional Reasoning, a training-free and model-agnostic framework that enables continuous control over reasoning intensity at inference time, going beyond the limitations of fixed instructional prompts. Our method operates by extracting the latent steering vector associated with deeper reasoning and reapplying it with a tunable scaling factor, allowing the model to tailor its reasoning process to the complexity of…
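The extract-and-reapply idea described in the abstract can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: it assumes the steering direction is the mean difference between hidden states with and without a reasoning prompt (the aggregation is not specified in this summary), and that the steered state is rescaled to its original norm. All function names are hypothetical, and plain Python lists stand in for model activations.

```python
import math


def l2_norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def extract_steering_vector(h_with_prompt, h_without_prompt):
    """Unit-norm mean difference between paired hidden states.

    Assumption: mean difference is used here for simplicity; the source
    does not say how the contrastive differences are aggregated.
    """
    diffs = [
        [a - b for a, b in zip(hp, hn)]
        for hp, hn in zip(h_with_prompt, h_without_prompt)
    ]
    direction = [sum(col) / len(diffs) for col in zip(*diffs)]
    norm = l2_norm(direction)
    if norm == 0.0:
        raise ValueError("contrastive states are identical; no direction")
    return [x / norm for x in direction]


def steer(hidden, steering, alpha):
    """Add alpha * steering to a hidden state, then restore its norm."""
    shifted = [h + alpha * s for h, s in zip(hidden, steering)]
    shifted_norm = l2_norm(shifted)
    if shifted_norm == 0.0:
        raise ValueError("steered state collapsed to zero")
    scale = l2_norm(hidden) / shifted_norm
    return [x * scale for x in shifted]


# Toy 2-D example: the prompt shifts states along the first axis.
v = extract_steering_vector([[2.0, 0.0], [4.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]])
steered = steer([0.0, 1.0], v, alpha=1.0)
```

Setting `alpha` to 0 leaves the hidden state unchanged, and larger values push it further along the extracted direction while the rescaling keeps its magnitude fixed, which is one way to read "continuous control over reasoning intensity".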
Peer Reviews
Decision: Submitted to ICLR 2026
1. The proposed method does not rely on model fine-tuning or modifying input texts, allowing it to be directly applied to existing LLMs. It supports a variety of LLMs, including both general instruction-tuned models and reasoning-specialized models, demonstrating strong generalizability. 2. The authors provide complete code and supplementary materials, ensuring high transparency and strong reproducibility, which enhances the credibility of the proposed framework. 3. The work is intuitively easy to
1. Lack of explanation for the design of different latent steering approaches: The paper employs two different latent steering vectors for the reasoning (Chain-of-Thought) and reflection processes, but it does not clearly explain why these distinct designs are used. The absence of such an explanation weakens the interpretability of the framework and makes it difficult for readers to understand the specific impact of these design choices on model behavior.
1. The paper is well written and easy to follow. 2. It presents a clear motivation and supports its claims with relatively thorough experimental validation.
1. The core technique is a direct application of existing work (e.g., Representation Engineering, Activation Addition). The paper applies a known method to a new axis ("reasoning vs. direct answer") and lacks a fundamental methodological contribution. It is more of an application case study than novel research. Moreover, several existing papers (e.g. [1]) present ideas that are highly similar to those in this work. 2. The paper wrongly assumes "reasoning" is a single, linear dimension controllab
The paper introduces a simple, principled, and interpretable mechanism for inference-time control of reasoning via latent steering vectors, avoiding input rewrites or fine-tuning. It clearly connects instructional prompts to directional latent shifts and operationalizes this with a normalized additive intervention that is easy to implement and stable. The framework is broadly applicable: it enhances both breadth-based strategies (improving the quality/diversity of candidate generations for Major
While the contrastive construction of the steering vector is justified and practical, the paper would benefit from deeper analysis of where and how to intervene (per-layer/per-position granularity, keys/values/queries, attention vs MLP streams). The current choice (last-token representations concatenated across layers) is one of many possible designs; an ablation across layers or modules could strengthen claims of generality and inform best practices. The Best-of-N results hinge on an external P
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
