BYO-Eval: Build Your Own Dataset for Fine-Grained Visual Assessment of Multimodal Language Models
Ludovic Arnould, Salim Khazem, Hugues Ali Mehenni

TL;DR
BYO-Eval introduces a synthetic, procedural image generation approach for detailed, controllable evaluation of multimodal language models, enabling precise diagnosis of visual perception and reasoning failures.
Contribution
The paper presents a diagnostic methodology inspired by ophthalmology: controllable synthetic images allow fine-grained assessment of VLMs, reduce annotation costs, and make failures easier to analyze.
Findings
Enables systematic stress testing of VLMs
Reveals specific perception and reasoning failures
Provides a scalable, interpretable evaluation framework
Abstract
Visual Language Models (VLMs) are now sufficiently advanced to support a broad range of applications, including answering complex visual questions, and are increasingly expected to interact with images in varied ways. To evaluate them, current benchmarks often focus on specific domains (e.g., reading charts), constructing datasets of annotated real images paired with pre-defined Multiple Choice Questions (MCQs) to report aggregate accuracy scores. However, such benchmarks entail high annotation costs, risk information leakage, and do not clarify whether failures stem from limitations in visual perception, reasoning, or general knowledge. We propose a new evaluation methodology, inspired by ophthalmologic diagnostics, leveraging procedural generation of synthetic images to obtain control over visual attributes and precisely reveal perception failures in VLMs. Specifically, we build…
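To make the idea of procedural generation concrete, here is a minimal sketch of how a counting probe with controlled visual attributes might be built. It is not the authors' implementation: the function names (`generate_scene`, `render_svg`, `make_item`, `sweep`), the choice of circles, the SVG output, and the specific attribute values are all assumptions made for illustration. The point it shows is that when every visual attribute is an explicit parameter, the ground-truth answer comes for free and each attribute can be varied on its own.

```python
import random
from dataclasses import dataclass


@dataclass
class Circle:
    x: float
    y: float
    r: float
    color: str


def generate_scene(n_objects, radius, size=256, color="red", seed=0, max_tries=10_000):
    """Place n non-overlapping circles; every visual attribute is an explicit parameter."""
    rng = random.Random(seed)  # seeded so each image is reproducible
    circles = []
    tries = 0
    while len(circles) < n_objects:
        tries += 1
        if tries > max_tries:
            raise ValueError("could not place all objects; reduce n_objects or radius")
        x = rng.uniform(radius, size - radius)
        y = rng.uniform(radius, size - radius)
        # Reject candidates that would overlap an already placed circle.
        if all((x - c.x) ** 2 + (y - c.y) ** 2 >= (2 * radius) ** 2 for c in circles):
            circles.append(Circle(x, y, radius, color))
    return circles


def render_svg(circles, size=256):
    """Render the scene as an SVG string (hypothetical output format)."""
    body = "".join(
        f'<circle cx="{c.x:.1f}" cy="{c.y:.1f}" r="{c.r:.1f}" fill="{c.color}"/>'
        for c in circles
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">'
        f'<rect width="100%" height="100%" fill="white"/>{body}</svg>'
    )


def make_item(n_objects, radius, seed=0):
    """Bundle image, question, ground truth, and the attributes that produced it."""
    circles = generate_scene(n_objects, radius, seed=seed)
    return {
        "image_svg": render_svg(circles),
        "question": "How many circles are in the image?",
        "answer": str(len(circles)),  # known by construction, no annotation needed
        "attributes": {"n_objects": n_objects, "radius": radius},
    }


def sweep(counts=(1, 3, 5), radii=(20, 10, 5)):
    """Vary one attribute at a time, loosely like shrinking rows on an eye chart."""
    grid = [(n, r) for r in radii for n in counts]
    return [make_item(n, r, seed=i) for i, (n, r) in enumerate(grid)]


if __name__ == "__main__":
    for item in sweep():
        print(item["attributes"], "->", item["answer"])
```

Grouping a model's accuracy by the stored `attributes` would then indicate, for example, whether counting degrades as objects get smaller or more numerous, which is the kind of attribute-level diagnosis an aggregate score cannot give.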
Taxonomy
Topics: Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)
