A State-Transition Framework for Efficient LLM Reasoning
Liang Zhang, Yu Zhao, Longyue Wang, Tianqi Shi, Weihua Luo, Kaifu Zhang, Jinsong Su

TL;DR
This paper introduces a state-transition framework for LLM reasoning that reduces computational costs from quadratic to linear using linear attention, while maintaining or improving reasoning performance.
Contribution
The paper proposes a novel state-transition reasoning framework with linear attention, enabling efficient and effective reasoning in LLMs without sacrificing capacity.
Findings
Significant reduction in attention complexity from quadratic to linear.
Improved reasoning efficiency across multiple datasets and models.
Enhanced reasoning performance with the proposed framework.
Abstract
While Long Chain-of-Thought (CoT) reasoning significantly improves the performance of Large Language Models (LLMs) on complex reasoning tasks, the substantial computational and memory costs of generating long CoT sequences limit their efficiency and practicality. Existing studies usually enhance the reasoning efficiency of LLMs by compressing CoT sequences. However, this approach conflicts with test-time scaling and thus limits the reasoning capacity of LLMs. In this paper, we propose an efficient reasoning framework that models the reasoning process of LLMs as a state-transition process. Specifically, we first apply a linear attention mechanism to estimate the LLM's reasoning state, which records the historical reasoning information from previous reasoning steps. Then, based on the query prompt and the reasoning state, the LLM can efficiently perform the current reasoning step and update the state.…
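The state-transition idea in the abstract can be illustrated with a generic linear-attention recurrence. The sketch below is not the paper's formulation: it assumes an identity feature map, omits normalization, and uses hypothetical helper names (`zero_state`, `update_state`, `read_state`). It only shows how a fixed-size state matrix can absorb each reasoning step and then be read with a query at a cost that does not depend on how many steps came before.

```python
# Illustrative sketch only: a generic, unnormalized linear-attention recurrence
# with an identity feature map. All names here are hypothetical, not the paper's.

def zero_state(d_k, d_v):
    """Create an empty d_k x d_v reasoning state."""
    return [[0.0] * d_v for _ in range(d_k)]

def update_state(state, keys, values):
    """Fold one reasoning step into the state: S <- S + sum_t k_t v_t^T."""
    new_state = [row[:] for row in state]  # copy so the input state is untouched
    for k, v in zip(keys, values):
        for i, k_i in enumerate(k):
            for j, v_j in enumerate(v):
                new_state[i][j] += k_i * v_j
    return new_state

def read_state(state, query):
    """Read the history with a query: o = q^T S (cost independent of history length)."""
    d_v = len(state[0])
    return [sum(q_i * state[i][j] for i, q_i in enumerate(query)) for j in range(d_v)]

# Two reasoning steps with toy 2-dimensional keys and values.
step1_keys, step1_values = [[1.0, 0.0]], [[2.0, 3.0]]
step2_keys, step2_values = [[0.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [0.5, 0.5]]

state = zero_state(2, 2)
state = update_state(state, step1_keys, step1_values)  # transition after step 1
state = update_state(state, step2_keys, step2_values)  # transition after step 2
output = read_state(state, [1.0, 2.0])
```

Because the state has a fixed shape, the tokens of a finished step no longer need to be kept once they are folded in. Reading the state gives the same result as summing `(q · k_t) v_t` over every past token, which is what unnormalized attention with this feature map would compute.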
Peer Reviews
Decision: ICLR 2026 Poster
1. The idea of employing a hybrid attention mechanism to achieve efficient reasoning is innovative.
2. Calibrating the current state based on the global state is convincing, and the experimental results demonstrate strong performance.
1. The method relies on step-level segmentation, which may limit its applicability to more general tasks.
2. The paper lacks certain implementation details, such as the diversity of thinking patterns and the specific configurations used in LoRA training.
1. In terms of research motivation, the focus of this paper is highly significant. While CoT enhances LLM performance on complex reasoning tasks, it also incurs substantial computational and memory costs. Current academic approaches to this efficiency problem often employ prompting, supervised fine-tuning (SFT), or reinforcement learning (RL) to compress CoT, which can lead to the loss of critical information. This paper innovatively addresses this issue, aiming to improve LLM reasoning efficien
1. The framework relies on segmenting long CoT sequences. The paper does not elaborate on the extent to which this segmentation method is applicable to different types of reasoning tasks or whether it can generalize effectively. Furthermore, it states that all reasoning steps in the training set are clustered, but the specific method used for this clustering is not described. It is also unclear to what extent the different thinking patterns effectively correspond to distinct reasoning types. Con
1. The problem is well-targeted and the motivation is clear. The paper tackles the latency and memory blow-up of long CoT reasoning: by restricting the SA branch to “prompt + current step” and introducing an LA branch to maintain a “historical reasoning state matrix,” it reduces attention complexity from quadratic to linear and the KV cache from linear to near-constant. The exposition is clear and technically coherent.
2. The method is novel and modular in practice. The proposed Mixed Attention
The methodological description is not sufficiently clear. I recommend adding a schematic of the attention matrices to reduce the reader’s cognitive load. In addition, I recommend including pseudocode or a diagram in the main text for both the training and inference procedures of the MAM method.
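The cost claim made in the reviews (attention from quadratic to linear, KV cache from linear to near-constant) can be made concrete with some simple counting. The sketch below is a rough illustration under stated assumptions, not the paper's accounting. It assumes causal attention, treats the prompt as already cached, ignores the fixed-size cost of reading and updating the linear-attention state, and uses hypothetical function names.

```python
# Rough counting sketch (hypothetical names). Compares keeping every past token
# in the KV cache against keeping only "prompt + current step".

def full_cache_sizes(prompt_len, step_lens):
    """Cached tokens after each step when all past tokens are kept."""
    sizes, total = [], prompt_len
    for n in step_lens:
        total += n
        sizes.append(total)
    return sizes

def windowed_cache_sizes(prompt_len, step_lens):
    """Cached tokens after each step when only prompt + current step are kept."""
    return [prompt_len + n for n in step_lens]

def full_attention_pairs(prompt_len, step_lens):
    """Query-key pairs scored while generating, attending to all past tokens."""
    pairs, context = 0, prompt_len
    for n in step_lens:
        for _ in range(n):
            context += 1      # the new token also attends to itself
            pairs += context
    return pairs

def windowed_attention_pairs(prompt_len, step_lens):
    """Query-key pairs scored when attending only to prompt + current step."""
    pairs = 0
    for n in step_lens:
        for t in range(1, n + 1):
            pairs += prompt_len + t
    return pairs

prompt_len, step_lens = 100, [50, 50, 50, 50]
full_sizes = full_cache_sizes(prompt_len, step_lens)
windowed_sizes = windowed_cache_sizes(prompt_len, step_lens)
full_pairs = full_attention_pairs(prompt_len, step_lens)
windowed_pairs = windowed_attention_pairs(prompt_len, step_lens)
```

With equal-length steps, the windowed cache stays at the same size after every step while the full cache keeps growing. The windowed pair count grows in proportion to the number of steps, whereas the full count grows roughly with the square of the total sequence length.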
Taxonomy
Topics: Topic Modeling · Big Data and Digital Economy · Natural Language Processing Techniques
