InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models
Xiaotian Han, Quanzeng You, Yongfei Liu, Wentao Chen, Huangjie Zheng, Khalil Mrini, Xudong Lin, Yiqi Wang, Bohan Zhai, Jianbo Yuan, Heng Wang, Hongxia Yang

TL;DR
This paper introduces InfiMM-Eval, a benchmark dataset designed to evaluate complex open-ended reasoning in multi-modal large language models across deductive, abductive, and analogical reasoning tasks, with an emphasis on intermediate reasoning steps.
Contribution
It presents a manually curated, multi-step reasoning benchmark for MLLMs that emphasizes complex reasoning and intermediate steps, going beyond existing benchmarks that mostly evaluate basic reasoning through yes/no or multiple-choice answers.
Findings
MLLMs show varied performance on complex reasoning tasks.
Intermediate reasoning steps improve evaluation accuracy.
The benchmark effectively distinguishes the reasoning capabilities of different MLLMs.
Abstract
Multi-modal Large Language Models (MLLMs) are increasingly prominent in the field of artificial intelligence. These models not only excel in traditional vision-language tasks but also demonstrate impressive performance in contemporary multi-modal benchmarks. Although many of these benchmarks attempt to holistically evaluate MLLMs, they typically concentrate on basic reasoning tasks, often yielding only simple yes/no or multi-choice responses. These methods naturally lead to confusion and difficulties in conclusively determining the reasoning capabilities of MLLMs. To mitigate this issue, we manually curate a benchmark dataset specifically designed for MLLMs, with a focus on complex reasoning tasks. Our benchmark comprises three key reasoning categories: deductive, abductive, and analogical reasoning. The queries in our dataset are intentionally constructed to engage the reasoning…
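To show why grading intermediate steps can separate models that a yes/no or multiple-choice format would lump together, here is a minimal sketch of a partial-credit scorer. It is not the paper's rubric, which this summary does not describe: it assumes that a correct final answer earns full credit, that an incorrect answer can still earn a grader-assigned credit between 0 and 1 for its reasoning steps, and that scores are averaged per reasoning category. The names `Judgement`, `item_score`, `category_scores`, and `step_credit` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Judgement:
    # Hypothetical record of one graded response.
    category: str          # "deductive", "abductive", or "analogical"
    answer_correct: bool   # whether the final answer matches the reference
    step_credit: float     # assumed grader credit (0.0-1.0) for the reasoning steps


def item_score(j: Judgement) -> float:
    # Assumption: a correct final answer earns full credit; otherwise the
    # response falls back to partial credit for its intermediate steps,
    # clipped to the [0, 1] range.
    if j.answer_correct:
        return 1.0
    return max(0.0, min(1.0, j.step_credit))


def category_scores(judgements: list[Judgement]) -> dict[str, float]:
    # Average item scores within each reasoning category, reported on a 0-100 scale.
    by_cat: dict[str, list[float]] = {}
    for j in judgements:
        by_cat.setdefault(j.category, []).append(item_score(j))
    return {cat: 100 * mean(scores) for cat, scores in by_cat.items()}


if __name__ == "__main__":
    graded = [
        Judgement("deductive", True, 0.0),
        Judgement("deductive", False, 0.5),  # wrong answer, half-credit reasoning
        Judgement("abductive", False, 0.0),
    ]
    print(category_scores(graded))
```

Under a pure right/wrong metric, the second deductive response would score 0. With partial credit it contributes 0.5, so the deductive average is 75 rather than 50. This is the kind of extra resolution that step-level evaluation is meant to provide.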
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
Methods: ALIGN · Focus
