MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models

Huanqia Cai; Yijun Yang; Winston Hu

arXiv:2502.00698 · cs.AI · June 5, 2025

TL;DR

This paper introduces MM-IQ, a benchmark for evaluating human-like abstraction and reasoning in multimodal models, revealing current models' significant limitations and proposing a new baseline trained with reinforcement learning.

Contribution

The paper presents MM-IQ, a large-scale multimodal reasoning benchmark, and provides a baseline model trained with reinforcement learning to advance AI reasoning capabilities.

Findings

01 Current models perform only slightly better than chance on MM-IQ.

02 Existing models show significant gaps in human-like reasoning abilities.

03 A new reinforcement learning-based baseline achieves competitive performance at a smaller model size.

Abstract

IQ testing has served as a foundational methodology for evaluating human cognitive capabilities, deliberately decoupling assessment from linguistic background, language proficiency, or domain-specific knowledge to isolate core competencies in abstraction and reasoning. Yet, artificial intelligence research currently lacks systematic benchmarks to quantify these critical cognitive capabilities in multimodal systems. To address this crucial gap, we propose MM-IQ, a comprehensive evaluation framework, which comprises a large-scale training set with 4,776 visual reasoning problems and 2,710 meticulously curated test items spanning 8 distinct reasoning paradigms. Through systematic evaluation of existing open-source and proprietary multimodal models, our benchmark reveals striking limitations: even state-of-the-art architectures achieve only marginally superior performance to random chance…
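
To make the "marginally superior to random chance" comparison concrete, the following is a minimal sketch of how a chance baseline for a multiple-choice benchmark could be computed and compared against a model's score. It is not the authors' evaluation code. Only the test-set size of 2,710 comes from the abstract; the four-options-per-item layout, the count of 745 correct answers, and the helper names `chance_accuracy` and `margin_over_chance` are hypothetical values chosen for illustration.

```python
from math import sqrt


def chance_accuracy(option_counts):
    """Expected accuracy of uniform random guessing.

    option_counts[i] is the number of answer options for item i.
    """
    return sum(1.0 / k for k in option_counts) / len(option_counts)


def margin_over_chance(num_correct, option_counts):
    """Return (accuracy, chance, z).

    z is a normal-approximation z-score of the observed number of
    correct answers against the random-guessing distribution.
    """
    n = len(option_counts)
    accuracy = num_correct / n
    chance = chance_accuracy(option_counts)
    # Variance of the number of correct answers under random guessing:
    # a sum of independent Bernoulli(1/k) variances.
    variance = sum((1.0 / k) * (1.0 - 1.0 / k) for k in option_counts)
    z = (num_correct - n * chance) / sqrt(variance)
    return accuracy, chance, z


# ASSUMPTION: every one of the 2,710 test items has 4 options.
# The abstract does not state the number of options per item.
option_counts = [4] * 2710

# ASSUMPTION: 745 correct answers is an invented score, not a reported result.
acc, chance, z = margin_over_chance(745, option_counts)
print(f"accuracy={acc:.4f} chance={chance:.4f} z={z:.2f}")
```

Passing a per-item list of option counts rather than a single number keeps the baseline correct when items differ in how many choices they offer, which matters when a model's score sits only a few points above chance.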

Code & Models

Repositories

AceCHQ/MMIQ
PyTorch · Official

Datasets

huanqia/MM-IQ
dataset · 105 downloads

Taxonomy

Topics: Speech and dialogue systems · Language, Metaphor, and Cognition