BYO-Eval: Build Your Own Dataset for Fine-Grained Visual Assessment of Multimodal Language Models

Ludovic Arnould; Salim Khazem; Hugues Ali Mehenni

arXiv:2506.05440 · cs.CV · June 9, 2025

Open Access

TL;DR

BYO-Eval introduces a synthetic, procedural image generation approach for detailed, controllable evaluation of multimodal language models, enabling precise diagnosis of visual perception and reasoning failures.

Contribution

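The paper presents a diagnostic methodology inspired by ophthalmologic testing: procedurally generated synthetic images with controllable visual attributes allow fine-grained assessment of VLMs, reduce annotation costs, and make it easier to tell perception failures apart from failures of reasoning or general knowledge.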

Findings

01. Enables systematic stress testing of VLMs

02. Reveals specific perception and reasoning failures

03. Provides a scalable, interpretable evaluation framework

Abstract

Visual Language Models (VLMs) are now sufficiently advanced to support a broad range of applications, including answering complex visual questions, and are increasingly expected to interact with images in varied ways. To evaluate them, current benchmarks often focus on specific domains (e.g., reading charts), constructing datasets of annotated real images paired with pre-defined Multiple Choice Questions (MCQs) to report aggregate accuracy scores. However, such benchmarks entail high annotation costs, risk information leakage, and do not clarify whether failures stem from limitations in visual perception, reasoning, or general knowledge. We propose a new evaluation methodology, inspired by ophthalmologic diagnostics, leveraging procedural generation of synthetic images to obtain control over visual attributes and precisely reveal perception failures in VLMs. Specifically, we build…
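
To make the idea of attribute-controlled evaluation concrete, here is a minimal sketch, not the authors' implementation. It assumes a counting task, represents each synthetic scene as a plain dictionary instead of a rendered image, and uses hypothetical names (`make_scene`, `sweep`, `toy_model`). The point it illustrates is that a procedurally generated scene carries its ground-truth label by construction, so one attribute can be varied while the others stay fixed and accuracy can be reported per attribute value.

```python
import random
from collections import defaultdict


def make_scene(n_objects, size, seed, grid=8):
    """Place n_objects squares on distinct cells of a grid x grid board.

    Hypothetical scene format: a dict of object placements plus the label.
    """
    rng = random.Random(seed)
    all_cells = [(r, c) for r in range(grid) for c in range(grid)]
    cells = rng.sample(all_cells, n_objects)
    return {
        "objects": [{"cell": cell, "size": size} for cell in cells],
        "label": n_objects,  # ground truth is known by construction
    }


def sweep(model, counts, size=1, trials=20):
    """Vary only the object count and report accuracy for each count."""
    correct = defaultdict(int)
    for n in counts:
        for t in range(trials):
            scene = make_scene(n, size, seed=1000 * n + t)
            correct[n] += int(model(scene) == scene["label"])
    return {n: correct[n] / trials for n in counts}


def toy_model(scene):
    """Stand-in for a VLM: counts correctly up to 5 objects, then saturates."""
    return min(len(scene["objects"]), 5)


profile = sweep(toy_model, counts=[1, 3, 5, 7])
print(profile)  # {1: 1.0, 3: 1.0, 5: 1.0, 7: 0.0}
```

In this toy run the per-count profile shows where the stand-in model breaks (between 5 and 7 objects), which a single aggregate accuracy score would hide. A real setup would render each scene to an image and query an actual VLM in place of `toy_model`.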

Taxonomy

Topics: Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)