Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?

Antonia Wüst; Tim Woydt; Lukas Helff; Inga Ibs; Wolfgang Stammer; Devendra S. Dhami; Constantin A. Rothkopf; Kristian Kersting

arXiv:2410.19546 · cs.AI · July 15, 2025

TL;DR

This paper evaluates Vision-Language Models on Bongard visual puzzles, revealing that despite some successes, they struggle with basic concepts and generalization, highlighting gaps in AI's abstract visual reasoning compared to humans.

Contribution

The paper presents a comprehensive evaluation of VLMs on Bongard problems, exposing their limitations in recognizing elementary visual concepts and in generalizing their reasoning.

Findings

1. VLMs occasionally identify discriminative concepts
2. Models struggle with elementary visual concepts like spirals
3. Significant gap exists between human and AI reasoning abilities

Abstract

Recently, newly developed Vision-Language Models (VLMs), such as OpenAI's o1, have emerged, seemingly demonstrating advanced reasoning capabilities across text and image modalities. However, the depth of these advances in language-guided perception and abstract reasoning remains underexplored, and it is unclear whether these models can truly live up to their ambitious promises. To assess the progress and identify shortcomings, we enter the wonderland of Bongard problems, a set of classic visual reasoning puzzles that require human-like abilities of pattern recognition and abstract reasoning. With our extensive evaluation setup, we show that while VLMs occasionally succeed in identifying discriminative concepts and solving some of the problems, they frequently falter. Surprisingly, even elementary concepts that may seem trivial to humans, such as simple spirals, pose significant…

Code & Models

Repositories

ml-research/bongard-in-wonderland (official)

Videos

Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad? · SlidesLive

Taxonomy

Topics: Ethics and Social Impacts of AI · Reinforcement Learning in Robotics

Methods: Sparse Evolutionary Training · Focus