Bongards at the Boundary of Perception and Reasoning: Programs or Language?
Cassidy Langenfeld, Claas Beger, Gloria Geng, Wasu Top Piriyakulkij, Keya Hu, Yewen Pu, Kevin Ellis

TL;DR
This paper introduces a neurosymbolic method that combines language models and Bayesian optimization to solve Bongard problems, a classic set of challenges that tests visual reasoning and rule inference beyond typical vision tasks.
Contribution
The paper presents an approach that uses LLMs to generate parameterized programmatic representations of hypothesized rules and Bayesian optimization to fit their parameters, with the aim of advancing the understanding of visual reasoning in AI.
Findings
Classifies Bongard problem images given the ground-truth rule
Also evaluated on solving Bongard problems from scratch
Demonstrates the potential of neurosymbolic approaches for reasoning tasks
Abstract
Vision-Language Models (VLMs) have made great strides in everyday visual tasks, such as captioning a natural image, or answering commonsense questions about such images. But humans possess the puzzling ability to deploy their visual reasoning abilities in radically new situations, a skill rigorously tested by the classic set of visual reasoning challenges known as the Bongard problems. We present a neurosymbolic approach to solving these problems: given a hypothesized solution rule for a Bongard problem, we leverage LLMs to generate parameterized programmatic representations for the rule and perform parameter fitting using Bayesian optimization. We evaluate our method on classifying Bongard problem images given the ground truth rule, as well as on solving the problems from scratch.
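The fitting step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: images are reduced to hypothetical lists of shape areas, the rule "all shapes are large" is hand-written rather than LLM-generated, and plain random search stands in for Bayesian optimization so that the example needs only the standard library.

```python
import random
from typing import List, Tuple

# Hypothetical scene representation: each image is a list of shape areas.
Image = List[float]


def rule_all_large(image: Image, min_area: float) -> bool:
    """Parameterized program for the illustrative rule 'all shapes are large'."""
    return all(area >= min_area for area in image)


def accuracy(min_area: float, left: List[Image], right: List[Image]) -> float:
    """Fraction of images classified correctly.

    Left-side images should satisfy the rule; right-side images should not.
    """
    correct = sum(rule_all_large(img, min_area) for img in left)
    correct += sum(not rule_all_large(img, min_area) for img in right)
    return correct / (len(left) + len(right))


def fit_threshold(
    left: List[Image],
    right: List[Image],
    low: float,
    high: float,
    trials: int = 200,
    seed: int = 0,
) -> Tuple[float, float]:
    """Random search over min_area (a simple stand-in for Bayesian optimization)."""
    rng = random.Random(seed)
    best_param, best_score = low, accuracy(low, left, right)
    for _ in range(trials):
        candidate = rng.uniform(low, high)
        score = accuracy(candidate, left, right)
        if score > best_score:
            best_param, best_score = candidate, score
    return best_param, best_score


# Toy Bongard-style problem: six left/right examples, made up for this sketch.
left = [[5.0, 6.5], [7.0], [5.5, 8.0, 9.0]]
right = [[1.0, 6.0], [2.5], [0.5, 3.0]]

best_param, best_score = fit_threshold(left, right, low=0.0, high=10.0)
print(f"min_area={best_param:.2f}, accuracy={best_score:.2f}")
```

In this toy setup any threshold above 2.5 and at most 5.0 separates the two sides perfectly. A Bayesian optimizer would replace the uniform sampling inside `fit_threshold` with a surrogate-guided choice of the next candidate, which matters when each evaluation of the program is expensive.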
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
This paper presents an interesting approach to solving Bongard problems that combines VLM-driven program synthesis with function optimization methods.
Unfortunately, this paper has several weaknesses regarding its contribution, the soundness of the proposed method, its clarity, and the discussion of limitations:
1. It is unclear whether the proposed method will be considered useful. The neurosymbolic method uses parameterized programs and, according to the abstract, is a way to overcome the limitations of current VLMs when solving Bongard problems. However, the evaluations do not support or validate this expectation. The paper is based on th
- Combining a VLM’s ability to propose hypotheses with a partly symbolic verifier is a compelling idea that seems like a good direction towards solving Bongard problems
- The use of Bayesian optimization to tune the parameters of the generated programs is a technically sound and interesting contribution.
- The exposition is often confusing and lacks precision in several key parts. For example, lines 166–170 make it unclear when the verifier receives ground truth information and what exactly constitutes the solution task. Similarly, lines 226–229 do not clearly explain whether one or multiple programs are used, and how these are sampled.
- The paper’s terminology for different task components is confusing, making it difficult to follow the workflow. For example, the method section begins by
- The topic is interesting and timely, addressing the gap in reasoning in current VLMs.
- Using programmatic representations to reason about visual rules is a creative idea that connects symbolic reasoning with modern VLMs.
I provide a high-level list of my concerns; detailed questions for clarification and suggestions are listed in the Question section:
* **Methodology lacks clarity**: key processes like verification, rule generation, and program execution are not clearly described or connected.
* **Experimental design is confusing**: sampling choices, baselines, and result interpretation are not well justified.
* **Presentation is hard to follow**: the paper needs a clearer structure, figures, and explanat
I think the general research question is valid and interesting: are natural language statements, programs, or both more valuable for solving such visual puzzles with VLMs?
While I find value in the research question, comparing natural language versus program representations for solving Bongard problems (BPs), and appreciate its conceptual simplicity, the paper suffers from significant gaps in critical information. Specifically, the authors fail to clearly articulate both their contribution (which appears to be a prompting pipeline, if I understand correctly) and its mechanics. Equally problematic is the lack of detail regarding their experimental methodology. Th
Taxonomy
Topics: Multimodal Machine Learning Applications · Visual Attention and Saliency Detection · Child and Animal Learning Development
