TL;DR
Heima compresses chain-of-thought reasoning in multimodal large language models into abstract tokens, maintaining reasoning quality while significantly improving efficiency, supported by theoretical analysis and empirical validation.
Contribution
This work introduces Heima, a novel CoT compression framework that preserves reasoning capabilities with minimal information loss, validated through theoretical and experimental analysis.
Findings
Heima reduces reasoning token length while maintaining accuracy.
Theoretical analysis shows reasoning capability is preserved when non-trivial mutual information is retained after compression.
Experiments demonstrate improved efficiency and zero-shot performance.
Abstract
Chain-of-Thought (CoT) reasoning has become a powerful framework for improving complex problem-solving capabilities in Multimodal Large Language Models (MLLMs). However, the verbose nature of textual reasoning introduces significant inefficiencies. In this work, we propose Heima (as hidden llama), an effective CoT compression framework that condenses lengthy CoTs into a small set of abstract thinking tokens, preserving essential reasoning while removing redundancy. We then conduct a theoretical analysis from an information-theoretic perspective, quantifying the information gap induced by compression and showing that reasoning capability is preserved when non-trivial mutual information is retained. To further explore and quantify this information gap, we design an adaptive interpreter that maps thinking tokens back to variable-length textual sequences, thereby reconstructing the reasoning…
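The abstract's claim rests on mutual information between the compressed representation and what it must predict. As a toy illustration of that quantity only (not the paper's derivation or its estimator), the sketch below computes empirical mutual information from discrete samples. The function name `mutual_information` and the sample lists `faithful` and `uninformative` are hypothetical, introduced here for illustration.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Empirical mutual information I(Z; Y) in bits from (z, y) samples.

    Toy assumption: z stands in for a compressed thinking token and
    y for the final answer, both treated as small discrete variables.
    """
    n = len(pairs)
    joint = Counter(pairs)                  # counts of each (z, y) pair
    pz = Counter(z for z, _ in pairs)       # marginal counts of z
    py = Counter(y for _, y in pairs)       # marginal counts of y
    mi = 0.0
    for (z, y), c in joint.items():
        p_zy = c / n
        # p(z, y) / (p(z) * p(y)) simplifies to c * n / (count_z * count_y)
        mi += p_zy * math.log2(c * n / (pz[z] * py[y]))
    return mi


# Hypothetical samples: the token fully determines the answer.
faithful = [(0, 0), (1, 1), (0, 0), (1, 1)]
# Hypothetical samples: the token is independent of the answer.
uninformative = [(0, 0), (0, 1), (1, 0), (1, 1)]

mi_faithful = mutual_information(faithful)
mi_uninformative = mutual_information(uninformative)
```

In this toy setting the first sample set retains one full bit about the answer, while the second retains none, which is the distinction the phrase "non-trivial mutual information" draws.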
