MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning Steps
Xiongtao Zhou, Jie He, Lanyu Chen, Jingyu Li, Haojing Chen, Víctor Gutiérrez-Basulto, Jeff Z. Pan, Hanjie Chen

TL;DR
MiCEval introduces an automated framework for evaluating the quality of multimodal chain-of-thought (MCoT) reasoning. It scores both the accuracy of the image description and the correctness of each reasoning step, and it is validated against human judgments.
Contribution
It provides the first fine-grained, automated evaluation method for MCoT reasoning steps, addressing a key gap in multimodal large language model assessment.
Findings
MiCEval's step-wise evaluation aligns better with human judgments than existing methods.
The framework is validated on four state-of-the-art MLLMs.
MiCEval offers a dataset and code for broader research use.
Abstract
Multimodal Chain of Thought (MCoT) is a popular prompting strategy for improving the performance of multimodal large language models (MLLMs) across a range of complex reasoning tasks. Despite its popularity, there is a notable absence of automated methods for evaluating the quality of reasoning steps in MCoT. To address this gap, we propose Multimodal Chain-of-Thought Evaluation (MiCEval), a framework designed to assess the correctness of reasoning chains by evaluating the quality of both the description and each reasoning step. The evaluation of the description component focuses on the accuracy of the image descriptions, while the reasoning step evaluates the quality of each step as it is conditionally generated based on the preceding steps. MiCEval is built upon a fine-grained dataset with annotations that rate each step according to correctness, relevance, and informativeness.…
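To make the step-wise idea concrete, here is a minimal sketch of how per-step ratings on the three annotated dimensions could be stored and combined into a chain-level score. It is an illustration under stated assumptions, not the paper's scoring formula: the `StepRating` class, the `chain_score` function, the [0, 1] rating scale, and the unweighted-mean aggregation are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class StepRating:
    """Ratings for one MCoT step (hypothetical schema, values assumed in [0, 1])."""
    kind: str  # "description" or "reasoning"
    correctness: float
    relevance: float
    informativeness: float

    def score(self) -> float:
        # Assumption: unweighted mean of the three dimensions.
        return mean([self.correctness, self.relevance, self.informativeness])


def chain_score(steps: List[StepRating]) -> float:
    """Assumed chain-level score: the mean of the per-step scores."""
    if not steps:
        raise ValueError("a chain needs at least one step")
    return mean(s.score() for s in steps)


# Toy chain: one image-description step followed by two reasoning steps.
chain = [
    StepRating("description", 1.0, 1.0, 0.5),
    StepRating("reasoning", 1.0, 0.5, 0.5),
    StepRating("reasoning", 0.0, 1.0, 1.0),
]
print(round(chain_score(chain), 3))
```

A plain mean lets a single incorrect step be offset by good ones; a stricter aggregate such as a minimum or a product would penalize any faulty step more heavily, which may suit reasoning chains where one error invalidates the conclusion.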
Taxonomy
Topics: Cognitive Science and Mapping · Intelligent Tutoring Systems and Adaptive Learning · Advanced Text Analysis Techniques
Methods: ALIGN
