Step-Level Sparse Autoencoder for Reasoning Process Interpretation

Xuan Yang; Jiayu Liu; Yuhang Lai; Hao Xu; Zhenya Huang; Ning Miao

arXiv:2603.03031 · cs.LG · March 4, 2026

TL;DR

This paper introduces a step-level sparse autoencoder (SSAE) to interpret LLM reasoning processes by disentangling step features, enabling analysis of reasoning direction, semantic transitions, and properties like correctness and logicality.

Contribution

The proposed SSAE captures step-level reasoning features with controlled sparsity, improving interpretability of LLMs' reasoning steps beyond token-level analysis.

Findings

01 Extracted features predict reasoning correctness and logicality

02 LLMs partly encode properties like generation length during reasoning

03 SSAE enhances understanding of LLMs' reasoning process

Abstract

Large Language Models (LLMs) have achieved strong complex reasoning capabilities through Chain-of-Thought (CoT) reasoning. However, their reasoning patterns remain too complicated to analyze. While Sparse Autoencoders (SAEs) have emerged as a powerful tool for interpretability, existing approaches predominantly operate at the token level, creating a granularity mismatch when capturing more critical step-level information, such as reasoning direction and semantic transitions. In this work, we propose a step-level sparse autoencoder (SSAE), which serves as an analytical tool to disentangle different aspects of LLMs' reasoning steps into sparse features. Specifically, by precisely controlling the sparsity of a step feature conditioned on its context, we form an information bottleneck in step reconstruction, which splits incremental information from background information and disentangles it…
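
The bottleneck idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class name `ToyStepSAE`, the vector representations of context and step, the random untrained weights, and the use of a top-k rule to control sparsity are all assumptions made for illustration. The abstract only states that the sparsity of a context-conditioned step feature is controlled and that the feature is used to reconstruct the step.

```python
import random

def top_k_sparsify(values, k):
    """Keep the k largest entries of `values` and zero out the rest."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]

def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

class ToyStepSAE:
    """Hypothetical step-level autoencoder with untrained random weights.

    Assumption: sparsity is enforced with a top-k rule; the paper may use
    a different mechanism.
    """

    def __init__(self, d_model, d_feat, k, seed=0):
        rng = random.Random(seed)
        self.k = k
        # Encoder sees [context; step], so the feature is conditioned on context.
        self.w_enc = [[rng.gauss(0, 0.1) for _ in range(2 * d_model)]
                      for _ in range(d_feat)]
        # Decoder sees [feature; context] and must reconstruct the step.
        self.w_dec = [[rng.gauss(0, 0.1) for _ in range(d_feat + d_model)]
                      for _ in range(d_model)]

    def encode(self, context, step):
        pre = matvec(self.w_enc, context + step)
        relu = [max(p, 0.0) for p in pre]
        # The sparse bottleneck: at most k active features per step.
        return top_k_sparsify(relu, self.k)

    def decode(self, context, feat):
        return matvec(self.w_dec, feat + context)

    def reconstruction_error(self, context, step):
        feat = self.encode(context, step)
        recon = self.decode(context, feat)
        mse = sum((r - s) ** 2 for r, s in zip(recon, step)) / len(step)
        return mse, feat

# Usage with random stand-ins for context and step embeddings.
rng = random.Random(1)
context = [rng.gauss(0, 1) for _ in range(8)]
step = [rng.gauss(0, 1) for _ in range(8)]
sae = ToyStepSAE(d_model=8, d_feat=32, k=4)
mse, feat = sae.reconstruction_error(context, step)
print(sum(1 for f in feat if f != 0.0), "active features; MSE:", mse)
```

Because the decoder already receives the context, a trained model of this shape would only need the sparse feature to carry what the step adds beyond its context, which is one way to read the abstract's split between incremental and background information.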

Code & Models

Models

Miaow-Lab/SSAE-Checkpoints (model)

Datasets

Miaow-Lab/SSAE-Dataset (dataset, 58 downloads)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Multimodal Machine Learning Applications