An Analysis of Architectural Impact on LLM-based Abstract Visual Reasoning: A Systematic Benchmark on RAVEN-FAIR
Sinan Urgun, Seçkin Arı

TL;DR
This paper systematically evaluates how different architectures and models influence the performance of large language models in abstract visual reasoning tasks using the RAVEN-FAIR dataset, highlighting model-specific sensitivities and the importance of multi-run assessments.
Contribution
It provides a comprehensive benchmark of multiple LLMs and reasoning architectures on visual reasoning, emphasizing the impact of architectural choices and the need for multi-run evaluation strategies.
Findings
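GPT-4.1-Mini achieved the highest overall accuracy across all four architectures.
The multi-agent architecture occasionally shifted the balance between semantic and numeric errors, and the effect differed by model.
Response coverage varies across architectures, which complicates direct cross-architecture comparison.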
Abstract
This study systematically evaluates the performance of large language models (LLMs) on abstract visual reasoning problems. We examined four LLMs (GPT-4.1-Mini, Claude-3.5-Haiku, Gemini-1.5-Flash, Llama-3.3-70b) using four reasoning architectures (single-shot, embedding-controlled repetition, self-reflection, and multi-agent) on the RAVEN-FAIR dataset. Visual responses generated through a three-stage process (JSON extraction, LLM reasoning, and Tool Function) were evaluated using the SSIM and LPIPS metrics; Chain-of-Thought scores and error types (semantic hallucination, numeric misperception) were also analyzed. Results demonstrate that GPT-4.1-Mini consistently achieved the highest overall accuracy across all architectures, indicating a strong reasoning capability. While the multi-agent architecture occasionally altered semantic and numeric balance across models,…
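As a rough illustration of the kind of image-similarity score the abstract refers to, the sketch below computes a simplified SSIM between a generated answer panel and a reference panel. It is not the authors' evaluation code. It assumes grayscale images given as nested lists of pixel values in [0, 255], and it uses a single window covering the whole image, whereas standard SSIM averages the same statistic over local sliding windows. The function name `global_ssim` is hypothetical, and the constants 0.01 and 0.03 are the conventional SSIM defaults rather than values taken from the paper.

```python
def global_ssim(img_a, img_b, dynamic_range=255.0):
    """Single-window SSIM for two equally sized grayscale images.

    Simplified sketch: one window spans the whole image instead of the
    local sliding windows used by standard SSIM implementations.
    """
    # Flatten the row-major nested lists into flat pixel sequences.
    a = [float(p) for row in img_a for p in row]
    b = [float(p) for row in img_b for p in row]
    if not a or len(a) != len(b):
        raise ValueError("images must be non-empty and the same size")

    n = len(a)
    # Stabilising constants with the conventional defaults K1=0.01, K2=0.03.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2

    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((x - mu_a) * (x - mu_a) for x in a) / n
    var_b = sum((y - mu_b) * (y - mu_b) for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n

    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a * mu_a + mu_b * mu_b + c1) * (var_a + var_b + c2)
    return numerator / denominator


# Hypothetical 2x2 panels: a reference and its pixel-wise inversion.
reference = [[0, 255], [255, 0]]
inverted = [[255, 0], [0, 255]]
same_score = global_ssim(reference, reference)
inverted_score = global_ssim(reference, inverted)
```

Identical panels score close to 1, while structurally opposite panels score below 0. LPIPS, by contrast, compares deep-network features and needs a pretrained model, so it is not sketched here.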
Taxonomy
Topics: Multimodal Machine Learning Applications · Data Visualization and Analytics · Explainable Artificial Intelligence (XAI)
