EasyARC: Evaluating Vision Language Models on True Visual Reasoning

Mert Unsal; Aylin Akkus

arXiv:2506.11595 · cs.CV · June 16, 2025

TL;DR

EasyARC is a vision-language benchmark that evaluates true visual reasoning through multi-image, multi-step tasks. It targets a gap in existing multimodal benchmarks, which mostly test visual extraction followed by text-based reasoning.

Contribution

We introduce EasyARC, a scalable, procedurally generated benchmark for true visual reasoning, incorporating multiple images and steps, and provide a comprehensive evaluation of current models.

Findings

01. State-of-the-art models struggle with true visual reasoning tasks.

02. EasyARC enables structured evaluation across different difficulty levels.

03. Benchmark dataset and code are open-sourced for community use.

Abstract

Building on recent advances in language-based reasoning models, we explore multimodal reasoning that integrates vision and text. Existing multimodal benchmarks primarily test visual extraction combined with text-based reasoning, lacking true visual reasoning with more complex interactions between vision and language. Inspired by the ARC challenge, we introduce EasyARC, a vision-language benchmark requiring multi-image, multi-step reasoning, and self-correction. EasyARC is procedurally generated, fully verifiable, and scalable, making it ideal for reinforcement learning (RL) pipelines. The generators incorporate progressive difficulty levels, enabling structured evaluation across task types and complexities. We benchmark state-of-the-art vision-language models and analyze their failure modes. We argue that EasyARC sets a new standard for evaluating true reasoning and test-time scaling…
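To make "procedurally generated, fully verifiable, and scalable" concrete, here is a minimal Python sketch of that pattern. It is not the EasyARC generator: the function names `generate_task` and `verify`, the grid sizes, the color counts, and the row-mirroring rule are all hypothetical choices made for illustration. The sketch assumes only what the abstract states, namely that tasks come from a generator with difficulty levels and that answers can be checked automatically.

```python
import random


def generate_task(difficulty, seed):
    """Hypothetical toy generator, not the authors' code.

    Returns (input_grid, target_grid) as lists of lists of ints.
    """
    rng = random.Random(seed)       # seeded, so every task is reproducible
    size = 3 + 2 * difficulty       # assumed: grids grow with difficulty
    num_colors = 2 + difficulty     # assumed: so does the color palette
    grid = [[rng.randrange(num_colors) for _ in range(size)]
            for _ in range(size)]
    # Assumed toy rule: the target mirrors each row left to right.
    target = [list(reversed(row)) for row in grid]
    return grid, target


def verify(prediction, target):
    """Exact-match check, usable as a binary reward signal."""
    return 1.0 if prediction == target else 0.0
```

Because the generator is seeded and the check is an exact match, an unlimited number of tasks can be produced and scored without human labels, which is the property that makes a benchmark of this kind usable as a reward source in an RL pipeline.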

Taxonomy

Topics: Multimodal Machine Learning Applications · Semantic Web and Ontologies