Efficient Reasoning with Hidden Thinking

Xuan Shen; Yizhou Wang; Yufa Zhou; Xiangxi Shi; Pu Zhao; Yanzhi Wang; Jiuxiang Gu

arXiv:2501.19201 · cs.CL · May 5, 2026

TL;DR

Heima compresses chain-of-thought reasoning in multimodal large language models into abstract tokens, maintaining reasoning quality while significantly improving efficiency, supported by theoretical analysis and empirical validation.

Contribution

This work introduces Heima, a novel CoT compression framework that preserves reasoning capabilities with minimal information loss, validated through theoretical and experimental analysis.

Findings

01. Heima reduces reasoning token length while maintaining accuracy.

02. Theoretical analysis shows that reasoning capability is preserved when non-trivial mutual information is retained after compression.

03. Experiments demonstrate improved efficiency and zero-shot performance.

Abstract

Chain-of-Thought (CoT) reasoning has become a powerful framework for improving complex problem-solving capabilities in Multimodal Large Language Models (MLLMs). However, the verbose nature of textual reasoning introduces significant inefficiencies. In this work, we propose Heima (as hidden llama), an effective CoT compression framework that condenses lengthy CoTs into a small set of abstract thinking tokens, preserving essential reasoning while removing redundancy. We then conduct a theoretical analysis from an information-theoretic perspective, quantifying the information gap induced by compression and showing that reasoning capability is preserved when non-trivial mutual information is retained. To further explore and quantify this information gap, we design an adaptive interpreter that maps thinking tokens back to variable-length textual sequences, thereby reconstructing the reasoning…
