An Analysis of Architectural Impact on LLM-based Abstract Visual Reasoning: A Systematic Benchmark on RAVEN-FAIR

Sinan Urgun, Seçkin Arı

arXiv:2511.11916·cs.AI·November 18, 2025

TL;DR

This paper systematically evaluates how the choice of model and reasoning architecture affects the performance of large language models on abstract visual reasoning tasks from the RAVEN-FAIR dataset, highlighting model-specific sensitivities and the importance of multi-run assessments.

Contribution

It provides a comprehensive benchmark of multiple LLMs and reasoning architectures on abstract visual reasoning, emphasizing the impact of architectural choices and the need for multi-run evaluation strategies.
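The summary does not say how scores from repeated runs are combined. As a minimal sketch, assuming each run yields one accuracy value per model and architecture pair, a multi-run summary could report the mean together with the sample standard deviation. The function name `summarize_runs` is hypothetical and not taken from the paper.

```python
from statistics import mean, stdev


def summarize_runs(accuracies):
    """Return (mean, sample standard deviation) of per-run accuracies.

    Hypothetical helper: assumes one accuracy value per run for a single
    (model, architecture) pair. At least two runs are required, because a
    single run gives no estimate of run-to-run spread.
    """
    if len(accuracies) < 2:
        raise ValueError("multi-run evaluation needs at least two runs")
    return mean(accuracies), stdev(accuracies)


# Made-up per-run accuracies for illustration only; not results from the paper.
runs = [0.40, 0.50, 0.45]
m, s = summarize_runs(runs)
print(f"accuracy = {m:.3f} ± {s:.3f} over {len(runs)} runs")
```

Reporting the spread next to the mean makes it visible when a single-run difference between two architectures falls within ordinary run-to-run variation.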

Findings

1. GPT-4.1-Mini achieved the highest accuracy across all architectures.

2. The multi-agent architecture shifted the balance between semantic and numeric errors, with effects that varied by model.

3. Response coverage varies across architectures, complicating cross-architecture comparison.
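One way to keep coverage differences from distorting comparisons, sketched here under the assumption that each puzzle yields either a parsed answer index or no usable response (`None`), is to report coverage alongside accuracy computed both over answered items and over all items. The helper `coverage_and_accuracy` is hypothetical and not taken from the paper.

```python
from typing import Optional, Sequence, Tuple


def coverage_and_accuracy(
    predictions: Sequence[Optional[int]], targets: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (coverage, accuracy over answered items, accuracy over all items).

    Hypothetical helper: predictions[i] is the chosen answer index for puzzle i,
    or None when the architecture produced no usable response.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    answered = [(p, t) for p, t in zip(predictions, targets) if p is not None]
    correct = sum(1 for p, t in answered if p == t)
    n = len(targets)
    coverage = len(answered) / n if n else 0.0
    acc_answered = correct / len(answered) if answered else 0.0
    acc_all = correct / n if n else 0.0
    return coverage, acc_answered, acc_all


# Toy inputs for illustration only.
print(coverage_and_accuracy([2, None, 5, 1], [2, 3, 5, 0]))
# (0.75, 0.6666666666666666, 0.5)
```

An architecture that answers fewer puzzles can look stronger on accuracy over answered items while scoring lower on accuracy over all items, which is why the sketch reports both.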

Abstract

This study aims to systematically evaluate the performance of large language models (LLMs) on abstract visual reasoning problems. We examined four LLMs (GPT-4.1-Mini, Claude-3.5-Haiku, Gemini-1.5-Flash, Llama-3.3-70b) using four reasoning architectures (single-shot, embedding-controlled repetition, self-reflection, and multi-agent) on the RAVEN-FAIR dataset. Visual responses, generated through a three-stage process (JSON extraction, LLM reasoning, and Tool Function), were evaluated with the SSIM and LPIPS metrics; Chain-of-Thought scores and error types (semantic hallucination, numeric misperception) were also analyzed. Results show that GPT-4.1-Mini consistently achieved the highest overall accuracy across all architectures, indicating strong reasoning capability. While the multi-agent architecture occasionally altered semantic and numeric balance across models,…
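The abstract names SSIM as one of the image-similarity metrics. The sketch below computes a simplified, single-window SSIM with the usual constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2, where L is the pixel dynamic range; standard SSIM instead averages this statistic over local windows, and LPIPS, which requires a pretrained network, is not sketched. The `best_candidate` helper, which picks the answer panel most similar to a generated image, is a hypothetical scoring rule and not necessarily how the authors map visual responses to answers.

```python
import numpy as np


def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 255.0) -> float:
    """Single-window SSIM between two same-shape grayscale images.

    Simplification: standard SSIM averages this statistic over local
    (e.g. Gaussian-weighted) windows; here one window covers the whole image.
    """
    if x.shape != y.shape:
        raise ValueError("images must have the same shape")
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)


def best_candidate(generated, candidates, data_range=255.0):
    """Hypothetical rule: index of the candidate panel with the highest
    global SSIM to the generated panel, plus all scores."""
    scores = [global_ssim(generated, c, data_range) for c in candidates]
    return int(np.argmax(scores)), scores


rng = np.random.default_rng(0)
panel = rng.integers(0, 256, size=(16, 16))
noise = rng.integers(0, 256, size=(16, 16))
idx, scores = best_candidate(panel, [noise, panel.copy(), np.zeros((16, 16))])
print(idx, [round(s, 3) for s in scores])
```

For real evaluations, a windowed implementation such as scikit-image's `structural_similarity` would be the more standard choice than this single-window simplification.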

Taxonomy

Topics: Multimodal Machine Learning Applications · Data Visualization and Analytics · Explainable Artificial Intelligence (XAI)