A State-Transition Framework for Efficient LLM Reasoning

Liang Zhang; Yu Zhao; Longyue Wang; Tianqi Shi; Weihua Luo; Kaifu Zhang; Jinsong Su

arXiv:2602.01198 · cs.AI · February 3, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces a state-transition framework for LLM reasoning that uses linear attention to reduce attention complexity from quadratic to linear, while maintaining or improving reasoning performance.

Contribution

The paper proposes a novel state-transition reasoning framework with linear attention, enabling efficient and effective reasoning in LLMs without sacrificing capacity.

Findings

01 Significant reduction in attention complexity from quadratic to linear.

02 Improved reasoning efficiency across multiple datasets and models.

03 Enhanced reasoning performance with the proposed framework.

Abstract

While long Chain-of-Thought (CoT) reasoning significantly improves the performance of Large Language Models (LLMs) on complex reasoning tasks, the substantial computational and memory costs of generating long CoT sequences limit their efficiency and practicality. Existing studies usually enhance the reasoning efficiency of LLMs by compressing CoT sequences. However, this approach conflicts with test-time scaling, limiting the reasoning capacity of LLMs. In this paper, we propose an efficient reasoning framework that models the reasoning process of LLMs as a state-transition process. Specifically, we first apply a linear attention mechanism to estimate the LLM's reasoning state, which records the historical reasoning information from previous reasoning steps. Then, based on the query prompt and the reasoning state, the LLM can efficiently perform the current reasoning step and update the state.…
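The idea of a reasoning state maintained by linear attention can be sketched with a generic kernelized linear attention recurrence. This is a minimal illustration, not the paper's exact mechanism: the feature map `phi` (elu(x) + 1), the helper names, and the toy vectors are all assumptions. It shows that a fixed-size state, updated once per token, gives the same readout as recomputing attention over the full history.

```python
import math

def phi(x):
    # Assumed feature map elu(x) + 1, which keeps every feature positive.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def init_state(d_k, d_v):
    # S accumulates phi(k) v^T (d_k x d_v); z accumulates phi(k) (length d_k).
    return [[0.0] * d_v for _ in range(d_k)], [0.0] * d_k

def update_state(state, k, v):
    # Fold one key/value pair into the state; the state size never grows.
    S, z = state
    fk = phi(k)
    S_new = [[S[i][j] + fk[i] * v[j] for j in range(len(v))] for i in range(len(fk))]
    z_new = [z[i] + fk[i] for i in range(len(fk))]
    return S_new, z_new

def read_state(state, q):
    # Read the state with a query: phi(q)^T S / (phi(q)^T z).
    S, z = state
    fq = phi(q)
    denom = sum(fq[i] * z[i] for i in range(len(fq)))
    return [sum(fq[i] * S[i][j] for i in range(len(fq))) / denom
            for j in range(len(S[0]))]

def quadratic_reference(q, keys, values):
    # Same readout computed the expensive way, by visiting every past token.
    fq = phi(q)
    weights = [sum(a * b for a, b in zip(fq, phi(k))) for k in keys]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]

# Toy history of three tokens (hypothetical values).
keys = [[0.1, -0.3], [0.5, 0.2], [-0.7, 0.4]]
values = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0], [3.0, 1.0, 0.5]]
query = [0.2, -0.1]

state = init_state(2, 3)
for k, v in zip(keys, values):
    state = update_state(state, k, v)

from_state = read_state(state, query)
from_history = quadratic_reference(query, keys, values)
```

The two results agree up to floating-point error, while `state` stays at 2 x 3 entries plus a length-2 normalizer no matter how many tokens have been folded in.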

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. The idea of employing a hybrid attention mechanism to achieve efficient reasoning is innovative.
2. Calibrating the current state based on the global state is convincing, and the experimental results demonstrate strong performance.

Weaknesses

1. The method relies on step-level segmentation, which may limit its applicability to more general tasks.
2. The paper lacks certain implementation details, such as the diversity of thinking patterns and the specific configurations used in LoRA training.

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. In terms of research motivation, the focus of this paper is highly significant. While CoT enhances LLM performance on complex reasoning tasks, it also incurs substantial computational and memory costs. Current academic approaches to this efficiency problem often employ prompting, supervised fine-tuning (SFT), or reinforcement learning (RL) to compress CoT, which can lead to the loss of critical information. This paper innovatively addresses this issue, aiming to improve LLM reasoning efficien

Weaknesses

1. The framework relies on segmenting long CoT sequences. The paper does not elaborate on the extent to which this segmentation method is applicable to different types of reasoning tasks or whether it can generalize effectively. Furthermore, it states that all reasoning steps in the training set are clustered, but the specific method used for this clustering is not described. It is also unclear to what extent the different thinking patterns effectively correspond to distinct reasoning types. Con

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. The problem is well-targeted and the motivation is clear. The paper tackles the latency and memory blow-up of long CoT reasoning: by restricting the SA branch to “prompt + current step” and introducing an LA branch to maintain a “historical reasoning state matrix,” it reduces attention complexity from quadratic to linear and the KV cache from linear to near-constant. The exposition is clear and technically coherent.
2. The method is novel and modular in practice. The proposed Mixed Attention
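The complexity claim in this review can be illustrated with a rough operation count. The sketch below is an assumption-laden model, not a measurement from the paper: it counts one unit per query-key score, treats each read of the historical state as a single fixed-cost unit, ignores the cost of updating the state, and uses hypothetical function names and lengths.

```python
def full_attention_cost(prompt_len, step_lens):
    # Standard causal attention: every new token attends to the whole context.
    scores = 0
    context = prompt_len
    for step_len in step_lens:
        for _ in range(step_len):
            context += 1       # the new token attends to itself and all earlier tokens
            scores += context
    return scores, context     # context is also the final KV cache length

def state_transition_cost(prompt_len, step_lens):
    # Restricted attention: prompt + current step, plus one read of a fixed-size state.
    scores = 0
    peak_cache = prompt_len
    for step_len in step_lens:
        for pos in range(1, step_len + 1):
            scores += prompt_len + pos   # attention over prompt + current step so far
            scores += 1                  # assumed unit cost for reading the state
        # Assumption: after a step ends, its KV entries are dropped and only
        # the state carries its information forward.
        peak_cache = max(peak_cache, prompt_len + step_len)
    return scores, peak_cache

# Hypothetical setting: a 100-token prompt and reasoning steps of 50 tokens each.
short_full = full_attention_cost(100, [50] * 20)
long_full = full_attention_cost(100, [50] * 40)
short_state = state_transition_cost(100, [50] * 20)
long_state = state_transition_cost(100, [50] * 40)
```

Under these assumptions, doubling the number of steps exactly doubles the restricted count and leaves its peak cache unchanged, whereas the full-attention count grows faster than linearly and its cache grows with every generated token.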

Weaknesses

The methodological description is not sufficiently clear. I recommend adding a schematic of the attention matrices to reduce the reader’s cognitive load. In addition, I recommend including pseudocode or a diagram in the main text for both the training and inference procedures of the MAM method.

Taxonomy

Topics: Topic Modeling · Big Data and Digital Economy · Natural Language Processing Techniques