Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
Antonia Wüst, Tim Woydt, Lukas Helff, Inga Ibs, Wolfgang Stammer, Devendra S. Dhami, Constantin A. Rothkopf, Kristian Kersting

TL;DR
This paper evaluates Vision-Language Models on Bongard visual puzzles, revealing that despite some successes, they struggle with basic concepts and generalization, highlighting gaps in AI's abstract visual reasoning compared to humans.
Contribution
It introduces a comprehensive evaluation of VLMs on Bongard problems, exposing their limitations in understanding elementary concepts and generalizing reasoning skills.
Findings
VLMs occasionally identify discriminative concepts
Models struggle with elementary visual concepts like spirals
Significant gap exists between human and AI reasoning abilities
Abstract
Recently, newly developed Vision-Language Models (VLMs), such as OpenAI's o1, have emerged, seemingly demonstrating advanced reasoning capabilities across text and image modalities. However, the depth of these advances in language-guided perception and abstract reasoning remains underexplored, and it is unclear whether these models can truly live up to their ambitious promises. To assess the progress and identify shortcomings, we enter the wonderland of Bongard problems, a set of classic visual reasoning puzzles that require human-like abilities of pattern recognition and abstract reasoning. With our extensive evaluation setup, we show that while VLMs occasionally succeed in identifying discriminative concepts and solving some of the problems, they frequently falter. Surprisingly, even elementary concepts that may seem trivial to humans, such as simple spirals, pose significant…
Taxonomy
Topics: Ethics and Social Impacts of AI · Reinforcement Learning in Robotics
Methods: Sparse Evolutionary Training · Focus
