IQBench: How "Smart" Are Vision-Language Models? A Study with Human IQ Tests

Tan-Hanh Pham; Phu-Vinh Nguyen; Dang The Hung; Bui Trong Duong; Vu Nguyen Thanh; Chris Ngo; Tri Quang Truong; Truong-Son Hy

arXiv:2505.12000·cs.CV·May 20, 2025

Open Access

TL;DR

IQBench is a new benchmark designed to evaluate the reasoning capabilities of vision-language models on visual IQ tests, emphasizing explanation quality and reasoning patterns over mere accuracy.

Contribution

The paper introduces IQBench, a visually centric benchmark of 500 manually annotated visual IQ questions for assessing reasoning in VLMs, and uses it to highlight their limitations and performance disparities across tasks.

Findings

01. Model performance varies considerably across tasks; the top accuracy is around 0.615.

02. All models struggle with 3D spatial and anagram reasoning.

03. Significant gaps exist between models' reasoning processes and their final answers.
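Findings 01 and 03 contrast per-task answer accuracy with the quality of the reasoning behind each answer. The sketch below shows one way such a comparison could be tabulated. It is not IQBench's scoring protocol, which this page does not describe. It assumes each model response has already been labeled with a task name, whether the final answer was correct, and a binary verdict on the explanation. The `Result` fields, the task labels, and the "mismatch" rate are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Result:
    task: str             # hypothetical task label, e.g. "anagram" or "3d_spatial"
    answer_correct: bool  # final answer matched the ground truth
    reasoning_ok: bool    # hypothetical judge verdict on the explanation


def summarize(results):
    """Return per-task answer accuracy, reasoning accuracy, and mismatch rate.

    The mismatch rate is the fraction of items where the answer verdict and the
    reasoning verdict disagree (e.g. a correct answer reached by faulty reasoning).
    """
    groups = defaultdict(list)
    for r in results:
        groups[r.task].append(r)

    summary = {}
    for task, rs in groups.items():
        n = len(rs)
        summary[task] = {
            "accuracy": sum(r.answer_correct for r in rs) / n,
            "reasoning": sum(r.reasoning_ok for r in rs) / n,
            "mismatch": sum(r.answer_correct != r.reasoning_ok for r in rs) / n,
        }
    return summary
```

Reporting the mismatch rate alongside accuracy makes it visible when a model lands on the right option without a sound explanation, or explains the pattern correctly but still picks the wrong answer.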

Abstract

Although large Vision-Language Models (VLMs) have demonstrated remarkable performance in a wide range of multimodal tasks, their true reasoning capabilities on human IQ tests remain underexplored. To advance research on the fluid intelligence of VLMs, we introduce **IQBench**, a new benchmark designed to evaluate VLMs on standardized visual IQ tests. We focus on evaluating the reasoning capabilities of VLMs, which we argue are more important than the accuracy of the final prediction. **Our benchmark is visually centric, minimizing the dependence on unnecessary textual content**, thus encouraging models to derive answers primarily from image-based information rather than learned textual knowledge. To this end, we manually collected and annotated 500 visual IQ questions to **prevent unintentional data leakage during training**. Unlike prior work that focuses primarily on the accuracy of…

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Neurobiology of Language and Bilingualism

Methods: Focus