Bridging Efficiency and Transparency: Explainable CoT Compression in Multimodal Large Reasoning Models
Yizhi Wang, Linan Yue, Min-Ling Zhang

TL;DR
This paper introduces XMCC, an explainable, reinforcement-learning-based method that compresses long multimodal reasoning chains, reducing their length while maintaining accuracy and providing interpretability.
Contribution
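It presents an approach that combines CoT compression with explainability in multimodal reasoning models, targeting two limitations of existing compression methods: the loss of essential visual-textual alignment cues and the lack of insight into which information is critical.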
Findings
Significantly reduces reasoning chain length
Preserves reasoning accuracy and correctness
Provides natural language explanations for compression decisions
Abstract
Long chains of thought (Long CoTs) are widely employed in multimodal reasoning models to tackle complex tasks by capturing detailed visual information. However, these Long CoTs are often excessively lengthy and contain redundant reasoning steps, which can hinder inference efficiency. Compressing these long CoTs is a natural solution, yet existing approaches face two major challenges: (1) they may compromise the integrity of visual-textual reasoning by removing essential alignment cues, and (2) the compression process lacks explainability, making it difficult to discern which information is critical. To address these problems, we propose XMCC, an eXplainable Multimodal CoT Compressor that formulates compression as a sequential decision-making process optimized via reinforcement learning. XMCC can effectively shorten reasoning trajectories while preserving key reasoning steps and answer…
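As a rough illustration of what "compression as a sequential decision-making process" can look like, the sketch below treats each reasoning step as a keep-or-drop decision and scores the resulting chain with a reward that pays for brevity only when the answer is still reachable. This is not the XMCC reward or policy, which the abstract does not specify: the function names, the `length_weight` parameter, the word-count length measure, and the toy correctness check are all assumptions made for the example.

```python
from typing import Callable, List, Sequence


def apply_actions(steps: Sequence[str], actions: Sequence[int]) -> List[str]:
    """Keep the steps whose action is 1 and drop those whose action is 0."""
    if len(steps) != len(actions):
        raise ValueError("need exactly one action per reasoning step")
    return [step for step, action in zip(steps, actions) if action == 1]


def compression_reward(
    steps: Sequence[str],
    actions: Sequence[int],
    is_correct: Callable[[List[str]], bool],
    length_weight: float = 0.5,  # hypothetical trade-off knob, not from the paper
) -> float:
    """Toy episode reward: 0 if the compressed chain no longer supports the
    answer, otherwise 1 plus a bonus proportional to the fraction of words saved."""
    kept = apply_actions(steps, actions)
    original_len = sum(len(s.split()) for s in steps)
    kept_len = sum(len(s.split()) for s in kept)
    saved = 1.0 - kept_len / original_len if original_len else 0.0
    if not is_correct(kept):
        return 0.0
    return 1.0 + length_weight * saved


# Hypothetical three-step chain; the middle step is redundant.
steps = [
    "The image shows a triangle with sides 3 and 4.",
    "Let me re-read the question again.",
    "By the Pythagorean theorem the hypotenuse is 5.",
]


def toy_is_correct(kept: List[str]) -> bool:
    """Stand-in for an answer checker: the final deduction must survive."""
    return any("hypotenuse is 5" in s for s in kept)


reward_drop_redundant = compression_reward(steps, [1, 0, 1], toy_is_correct)
reward_drop_answer = compression_reward(steps, [1, 1, 0], toy_is_correct)
reward_keep_all = compression_reward(steps, [1, 1, 1], toy_is_correct)
```

In this toy setup, dropping the redundant middle step scores higher than keeping everything, while dropping the step that carries the answer scores zero. A reinforcement learning policy trained against a reward of this general shape would be pushed toward shorter chains that still preserve the key steps.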
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Topic Modeling
