ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models
Jonathan Roberts, Mohammad Reza Taesiri, Ansh Sharma, Akash Gupta, Samuel Roberts, Ioana Croitoru, Simion-Vlad Bogolin, Jialu Tang, Florian Langer, Vyas Raina, Vatsal Raina, Hanyi Xiong, Vishaal Udandarao, Jingyi Lu, Shiyang Chen, Sam Purkis, Tianshuo Yan, Wenye Lin

TL;DR
ZeroBench is a challenging visual reasoning benchmark designed to be impossible for current large multimodal models, highlighting their limitations and encouraging future improvements in visual understanding.
Contribution
We introduce ZeroBench, a lightweight benchmark of 100 manually curated questions (plus 334 less difficult subquestions) that current LMMs cannot answer, and we evaluate 20 LMMs on it and analyse their errors.
Findings
All 20 evaluated LMMs scored 0.0% on ZeroBench.
Headroom on popular visual benchmarks is eroding quickly as models improve; ZeroBench is built to remain relevant for longer.
The benchmark is publicly available to foster future research.
Abstract
Large Multimodal Models (LMMs) exhibit major shortfalls when interpreting images and, by some measures, have poorer spatial cognition than small children or animals. Despite this, they attain high scores on many popular visual benchmarks, with headroom rapidly eroded by an ongoing surge of model progress. To address this, there is a pressing need for difficult benchmarks that remain relevant for longer. We take this idea to its limit by introducing ZeroBench, a lightweight visual reasoning benchmark that is entirely impossible for contemporary frontier LMMs. Our benchmark consists of 100 manually curated questions and 334 less difficult subquestions. We evaluate 20 LMMs on ZeroBench, all of which score 0.0%, and rigorously analyse the errors. To encourage progress in visual understanding, we publicly release ZeroBench.
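To make the headline number concrete, the following is a minimal sketch of how a percentage score over main questions and subquestions could be computed. It assumes exact-match scoring after light normalisation, which the abstract does not specify; the `Item` record, the `normalise` and `accuracy` helpers, and the toy answers are all hypothetical and are not taken from the ZeroBench release.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One scored answer (hypothetical record layout, not the ZeroBench schema)."""
    question_id: str
    reference: str   # ground-truth answer
    prediction: str  # model's final answer


def normalise(answer: str) -> str:
    # Assumption: compare answers case-insensitively, ignoring extra whitespace.
    return " ".join(answer.strip().lower().split())


def accuracy(items: list[Item]) -> float:
    """Percentage of items whose normalised prediction equals the reference."""
    if not items:
        return 0.0
    correct = sum(normalise(i.prediction) == normalise(i.reference) for i in items)
    return 100.0 * correct / len(items)


# Toy data only: two main questions and three subquestions.
main_items = [Item("q1", "42", "41"), Item("q2", "7", "seven")]
sub_items = [
    Item("q1_1", "6", " 6 "),
    Item("q1_2", "Red", "red"),
    Item("q2_1", "3", "4"),
]

main_score = accuracy(main_items)
sub_score = accuracy(sub_items)
print(f"main: {main_score:.1f}%  subquestions: {sub_score:.1f}%")
```

In this toy run the model gets some subquestions right while still scoring 0.0% on the main questions, which illustrates how the easier subquestions can differentiate models that all fail the main set.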
Taxonomy
Topics: Semantic Web and Ontologies
