SPOT: Span-level Pause-of-Thought for Efficient and Interpretable Latent Reasoning in Large Language Models

Yunlong Chu; Minglai Shao; Yuhang Liu; Bing Hao; Yumeng Lin; Jialu Wang; Ruijie Wang

arXiv:2603.06222 · cs.CL · March 9, 2026


TL;DR

SPOT introduces a span-level semantic alignment method that compresses chain-of-thought reasoning into interpretable latent tokens, improving reasoning accuracy and efficiency in large language models.

Contribution

SPOT proposes a span-level semantic alignment technique with a frozen-head decoding constraint, improving both the interpretability and the performance of latent reasoning in LLMs.
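The summary does not spell out how the frozen-head decoding constraint works. The sketch below only shows what "reading a latent state through a frozen output head" can mean in general. It is a logit-lens-style readout, not SPOT's method. The function names, the toy vocabulary, and the 2-dimensional head are hypothetical. The latent vector is projected through a fixed unembedding matrix that is never updated, and the top-k tokens are reported as a human-readable interpretation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def frozen_head_readout(latent, unembedding, vocab, k=3):
    """Project a latent vector through a fixed (frozen) output head and
    return the k most probable tokens with their probabilities.

    latent: list[float] of length d (hypothetical latent reasoning state)
    unembedding: V rows, each a list[float] of length d (treated as frozen)
    vocab: list of V token strings aligned with the rows of `unembedding`
    """
    # One logit per vocabulary entry: dot product of its row with the latent.
    logits = [sum(w * h for w, h in zip(row, latent)) for row in unembedding]
    probs = softmax(logits)
    # Rank vocabulary indices by probability, highest first.
    ranked = sorted(range(len(vocab)), key=lambda i: probs[i], reverse=True)
    return [(vocab[i], probs[i]) for i in ranked[:k]]

# Toy usage with a hypothetical 4-token vocabulary and 2-d hidden size.
vocab = ["add", "carry", "total", "the"]
head = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, -1.0]]
print(frozen_head_readout([2.0, 1.0], head, vocab, k=3))
```

Because the head is fixed, any training signal can shape only the latent state and never the readout itself. That is one plausible reason a frozen head helps keep interpretations faithful.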

Findings

1. Achieves a 2.3-point accuracy improvement on reasoning benchmarks.
2. Reduces the number of generated tokens by 37.5%.
3. Provides faithful semantic interpretations of latent reasoning.

Abstract

Explicit Chain-of-Thought improves the reasoning performance of large language models but often incurs high inference cost due to verbose token-level traces. While recent approaches reduce this overhead via concise prompting or step pruning, they largely truncate what the model says rather than internalize what the model thinks. Latent reasoning offers a promising alternative by performing computation in the hidden space, yet prior methods face two critical challenges. Many existing approaches rely on rigid point-to-point alignment, forcing a latent token to approximate the final representation of a reasoning step, which can be insufficient to capture the dense, variable-length semantics of an entire reasoning segment. Furthermore, these methods often suffer from a lack of interpretability: latent states are commonly produced by unconstrained optimization or embedding mixing, yielding…
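The abstract contrasts point-to-point alignment with alignment over an entire reasoning segment. The sketch below makes that contrast concrete under stated assumptions. `span_states` is a hypothetical list of hidden states for the tokens of one reasoning step. The point-to-point target is the step's final hidden state. The span-level target uses mean pooling over all of the step's states, which is one simple illustrative choice and not necessarily the target SPOT uses. The loss is 1 minus cosine similarity. All names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def point_target(span_states):
    """Point-to-point target: only the final hidden state of the step."""
    return span_states[-1]

def span_target(span_states):
    """Span-level target (illustrative choice): mean of every state in the step."""
    n = len(span_states)
    d = len(span_states[0])
    return [sum(h[j] for h in span_states) / n for j in range(d)]

def alignment_loss(latent, target):
    """1 - cosine similarity between a latent token and its alignment target."""
    return 1.0 - cosine(latent, target)

# A toy 3-token step whose last token points in a different direction
# from the step as a whole.
span_states = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
latent = [2.0, 1.0]  # a latent that summarizes the whole step
print(alignment_loss(latent, span_target(span_states)))   # ~0.0
print(alignment_loss(latent, point_target(span_states)))  # ~0.553
```

In this toy case, a latent that captures the whole segment is penalized under the point-to-point target only because the segment's last token differs from its overall content. This is the kind of mismatch the abstract attributes to rigid point-to-point alignment.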


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications