Investigating Inference-time Scaling for Chain of Multi-modal Thought: A Preliminary Study
Yujie Lin, Ante Wang, Moye Chen, Jingyao Liu, Hao Liu, Jinsong Su, Xinyan Xiao

TL;DR
This paper explores how inference-time scaling affects multi-modal reasoning when the chain of thought itself mixes visual and textual steps. It evaluates sampling-based and tree search-based methods on 10 tasks from various domains to understand the benefits and challenges.
Contribution
It pioneers the study of inference-time scaling for multi-modal thought, analyzing its impact and challenges across various reasoning tasks.
Findings
Multi-modal thought improves reasoning performance over text-only approaches.
Blending visual and textual reasoning fosters more diverse thinking.
Multi-modal thought consumes more tokens than text-only thought, raising practical concerns.
Abstract
Recently, inference-time scaling of chain-of-thought (CoT) has been demonstrated as a promising approach for addressing multi-modal reasoning tasks. While existing studies have predominantly centered on text-based thinking, the integration of both visual and textual modalities within the reasoning process remains unexplored. In this study, we pioneer the exploration of inference-time scaling with multi-modal thought, aiming to bridge this gap. To provide a comprehensive analysis, we systematically investigate popular sampling-based and tree search-based inference-time scaling methods on 10 challenging tasks spanning various domains. Besides, we uniformly adopt a consistency-enhanced verifier to ensure effective guidance for both methods across different thought paradigms. Results show that multi-modal thought promotes better performance against conventional text-only thought, and…
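To illustrate what sampling-based inference-time scaling with a verifier looks like in general, here is a minimal best-of-N sketch. It is not the authors' implementation: `best_of_n`, `toy_sampler`, and `toy_verifier` are hypothetical names, the verifier is a generic scoring function rather than the paper's consistency-enhanced verifier, and the "thought" is a plain string standing in for a text-only or multi-modal reasoning trace.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Tuple

# Hypothetical types: a sampler draws one (thought, answer) pair from a model;
# a verifier assigns a non-negative score to that pair.
Sampler = Callable[[random.Random], Tuple[str, str]]
Verifier = Callable[[str, str], float]


def best_of_n(sample: Sampler, verifier: Verifier, n: int, seed: int = 0) -> str:
    """Draw n reasoning chains and return the answer with the highest
    total verifier score (verifier-weighted voting)."""
    rng = random.Random(seed)
    totals: Dict[str, float] = defaultdict(float)
    for _ in range(n):
        thought, answer = sample(rng)
        totals[answer] += verifier(thought, answer)
    # With a constant verifier this reduces to plain majority voting.
    return max(totals, key=totals.get)


# Toy stand-ins for a model and a verifier (assumptions for illustration only).
def toy_sampler(rng: random.Random) -> Tuple[str, str]:
    answer = rng.choice(["A", "B", "B"])
    return f"reasoning that ends in {answer}", answer


def toy_verifier(thought: str, answer: str) -> float:
    return 0.9 if answer == "B" else 0.2


if __name__ == "__main__":
    print(best_of_n(toy_sampler, toy_verifier, n=16))
```

Increasing `n` is the scaling knob here: more sampled chains cost more tokens but give the verifier more candidates to choose from. Tree search-based methods apply the same kind of scoring to partial chains instead of complete ones.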
Taxonomy
Topics: Data Visualization and Analytics · Advanced Text Analysis Techniques
Methods: ADaptive gradient method with the OPTimal convergence rate
