Beyond Perception: Evaluating Abstract Visual Reasoning through Multi-Stage Task

Yanbei Jiang; Yihao Ding; Chao Lei; Jiayang Ao; Jey Han Lau; Krista A. Ehinger

arXiv:2505.21850·cs.CV·June 2, 2025

TL;DR

This paper introduces MultiStAR, a multi-stage benchmark for abstract visual reasoning, and MSEval, a metric that evaluates both intermediate reasoning steps and final outcomes in multimodal large language models.

Contribution

The paper presents a new multi-stage AVR benchmark and a novel metric to better evaluate reasoning processes in MLLMs, addressing limitations of existing single-step benchmarks and metrics.

Findings

01. MLLMs perform well on perception tasks
02. MLLMs struggle with complex rule detection
03. Intermediate reasoning remains challenging

Abstract

Current Multimodal Large Language Models (MLLMs) excel in general visual reasoning but remain underexplored in Abstract Visual Reasoning (AVR), which demands higher-order reasoning to identify abstract rules beyond simple perception. Existing AVR benchmarks focus on single-step reasoning, emphasizing the end result but neglecting the multi-stage nature of the reasoning process. Past studies found that MLLMs struggle with these benchmarks, but they do not explain how the models fail. To address this gap, we introduce MultiStAR, a Multi-Stage AVR benchmark based on RAVEN, designed to assess reasoning across varying levels of complexity. Additionally, existing metrics such as accuracy focus only on the final outcome and do not account for the correctness of intermediate steps. Therefore, we propose a novel metric, MSEval, which considers the correctness of intermediate steps in addition to the final…

Code & Models

Repositories

yanbeijiang/multistar (Official)

Taxonomy

Topics: Online Learning and Analytics · Intelligent Tutoring Systems and Adaptive Learning · Visual and Cognitive Learning Processes

Methods: Focus