ViRC: Enhancing Visual Interleaved Mathematical CoT with Reason Chunking
Lihong Wang, Liangqi Li, Weiwei Feng, Jiamin Wu, Changtao Miao, Tieru Wu, Rui Ma, Bo Zhang, Zhe Li

TL;DR
This paper introduces ViRC, a framework that structures multimodal mathematical reasoning into logical units, improving the reasoning capabilities of large language models by mimicking human problem-solving strategies.
Contribution
We propose a Reason Chunking mechanism and a new dataset, CRUX, to enhance multimodal mathematical reasoning in LLMs, inspired by cognitive science principles.
Findings
ViRC-7B outperforms baselines by 18.8% on mathematical benchmarks.
The Reason Chunking approach improves intra-unit coherence and visual reasoning integration.
The CRUX dataset provides annotated reasoning units to facilitate training.
Abstract
CoT has significantly enhanced the reasoning ability of LLMs, but it faces challenges when extended to multimodal domains, particularly in mathematical tasks. Existing MLLMs typically perform textual reasoning solely from a single static mathematical image, overlooking dynamic visual acquisition during reasoning. In contrast, humans repeatedly examine the image and employ step-by-step reasoning to prove intermediate propositions. This strategy of decomposing the problem-solving process into key logical nodes adheres to Miller's Law in cognitive science. Inspired by this insight, we propose the ViRC framework for multimodal mathematical tasks, introducing a Reason Chunking mechanism that structures multimodal mathematical CoT into consecutive Critical Reasoning Units (CRUs) to simulate human expert problem-solving patterns. CRUs ensure intra-unit textual coherence for intermediate…
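To make the idea of chunking a chain of thought into consecutive units more concrete, here is a minimal illustrative sketch. It is not the authors' implementation: the class `CriticalReasoningUnit`, its fields (`proposition`, `steps`, `visual_note`), and the helper `render_chain` are hypothetical names chosen for illustration, and the geometry example is invented. It assumes only what the abstract states, namely that each unit groups coherent textual steps around one intermediate proposition and that the image may be re-examined during reasoning.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CriticalReasoningUnit:
    """One chunk of a chain of thought (hypothetical structure, not from the paper)."""
    proposition: str                                   # intermediate claim this unit establishes
    steps: List[str] = field(default_factory=list)     # textual reasoning steps inside the unit
    visual_note: Optional[str] = None                  # assumed: what was re-examined in the image


def render_chain(units: List[CriticalReasoningUnit]) -> str:
    """Serialize consecutive units into one chunked reasoning trace."""
    lines = []
    for i, unit in enumerate(units, start=1):
        lines.append(f"[CRU {i}] goal: {unit.proposition}")
        if unit.visual_note is not None:
            lines.append(f"  look: {unit.visual_note}")
        for step in unit.steps:
            lines.append(f"  step: {step}")
    return "\n".join(lines)


# Invented toy problem: each unit proves one intermediate proposition.
units = [
    CriticalReasoningUnit(
        proposition="Triangle ABC is right-angled at B",
        steps=["AB = 3, BC = 4, AC = 5", "3^2 + 4^2 = 5^2"],
        visual_note="side lengths labelled in the figure",
    ),
    CriticalReasoningUnit(
        proposition="Area of ABC is 6",
        steps=["Area = (1/2) * AB * BC = (1/2) * 3 * 4 = 6"],
    ),
]

print(render_chain(units))
```

Keeping the proposition separate from its supporting steps reflects the abstract's description of reasoning toward intermediate propositions; how ViRC actually represents or trains on such units is not specified in the text above.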
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Mathematics Education and Teaching Techniques
