MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning Steps

Xiongtao Zhou; Jie He; Lanyu Chen; Jingyu Li; Haojing Chen; Víctor Gutiérrez-Basulto; Jeff Z. Pan; Hanjie Chen

arXiv:2410.14668·cs.CL·March 3, 2025

TL;DR

MiCEval is an automated framework for evaluating multimodal chain-of-thought (MCoT) reasoning step by step. It scores both the accuracy of the image description and the correctness of each reasoning step, and its scores are validated against human judgments.

Contribution

It provides the first fine-grained, automated evaluation method for MCoT reasoning steps, addressing a key gap in multimodal large language model assessment.

Findings

01. MiCEval's step-wise evaluation aligns better with human judgments than existing methods.

02. The framework is validated on four state-of-the-art MLLMs.

03. MiCEval offers a dataset and code for broader research use.

Abstract

Multimodal Chain of Thought (MCoT) is a popular prompting strategy for improving the performance of multimodal large language models (MLLMs) across a range of complex reasoning tasks. Despite its popularity, there is a notable absence of automated methods for evaluating the quality of reasoning steps in MCoT. To address this gap, we propose Multimodal Chain-of-Thought Evaluation (MiCEval), a framework designed to assess the correctness of reasoning chains by evaluating the quality of both the description and each reasoning step. The evaluation of the description component focuses on the accuracy of the image descriptions, while the evaluation of the reasoning component assesses the quality of each step as it is conditionally generated based on the preceding steps. MiCEval is built upon a fine-grained dataset with annotations that rate each step according to correctness, relevance, and informativeness.…
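As a rough illustration of the step-level annotation the abstract describes, the sketch below models a chain as a list of steps, each rated for correctness, relevance, and informativeness. The `StepAnnotation` schema, the boolean ratings, the rule that a chain is correct only if every step is correct, and the sample chain are all assumptions made for illustration; they are not taken from the MiCEval dataset or code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepAnnotation:
    """One annotated MCoT step (hypothetical schema, not the MiCEval release format)."""
    kind: str          # "description" or "reasoning"
    text: str          # the step as generated by the model
    correct: bool      # assumed binary rating; the real scale may differ
    relevant: bool
    informative: bool


def first_incorrect_step(chain: List[StepAnnotation]) -> Optional[int]:
    """Return the index of the first step rated incorrect, or None if there is none."""
    for i, step in enumerate(chain):
        if not step.correct:
            return i
    return None


def chain_is_correct(chain: List[StepAnnotation]) -> bool:
    """Assumed aggregation rule: a chain is correct only if every step is correct."""
    return first_incorrect_step(chain) is None


# Made-up example chain: one description step followed by one reasoning step.
chain = [
    StepAnnotation("description", "The image shows two red apples on a table.", True, True, True),
    StepAnnotation("reasoning", "Two apples plus one apple makes four.", False, True, True),
]

print(first_incorrect_step(chain))  # index of the faulty reasoning step
print(chain_is_correct(chain))
```

Because each reasoning step is generated conditionally on the steps before it, locating the first incorrect step is one simple way to see where a chain starts to go wrong.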

Code & Models

Repositories

alenai97/miceval (PyTorch, official)

Videos

MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning Steps · Underline

Taxonomy

Topics: Cognitive Science and Mapping · Intelligent Tutoring Systems and Adaptive Learning · Advanced Text Analysis Techniques

Methods: ALIGN