Symbolic Grounding Reveals Representational Bottlenecks in Abstract Visual Reasoning
Mohit Vaishnav, Tanel Tammet

TL;DR
This paper shows that giving models symbolic representations instead of raw images substantially improves abstract visual reasoning. It identifies representation as a key bottleneck and uses symbolic inputs as a diagnostic tool.
Contribution
It introduces the Componential–Grammatical (C–G) paradigm, which reformulates visual reasoning as a symbolic task. With this paradigm, LLMs show large gains over visual models on Bongard-LOGO.
Findings
LLMs reach mid-90s accuracy on Free-form problems when given symbolic inputs
A strong visual baseline stays near chance under matched task definitions
Representation is identified as a key bottleneck in abstract reasoning
Abstract
Vision–language models (VLMs) often fail on abstract visual reasoning benchmarks such as Bongard problems, raising the question of whether the main bottleneck lies in reasoning or representation. We study this on Bongard-LOGO, a synthetic benchmark of abstract concept learning with ground-truth generative programs, by comparing end-to-end VLMs on raw images with large language models (LLMs) given symbolic inputs derived from those images. Using symbolic inputs as a diagnostic probe rather than a practical multimodal architecture, our Componential–Grammatical (C–G) paradigm reformulates Bongard-LOGO as a symbolic reasoning task based on LOGO-style action programs or structured descriptions. LLMs achieve large and consistent gains, reaching mid-90s accuracy on Free-form problems, while a strong visual baseline remains near chance under matched task definitions. Ablations on…
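The following sketch shows one way a Bongard-style problem could be posed to an LLM as text, in the spirit of the symbolic reformulation described above. It is a hypothetical illustration and not the authors' C–G pipeline. It assumes each panel is already available as a list of LOGO-style action strings, and the string format used here is invented. The names `SymbolicBongardProblem`, `build_prompt`, `parse_answer` and `accuracy` are also hypothetical. The code serializes positive and negative examples plus a query into a prompt, then scores answers against labels. Because each query is a binary A/B decision, chance accuracy on balanced queries is 50%.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical encoding: each panel is a list of LOGO-style stroke commands.
# The command strings are illustrative placeholders, not the exact
# action-program schema shipped with Bongard-LOGO.
Program = List[str]


@dataclass
class SymbolicBongardProblem:
    positives: List[Program]  # panels that follow the hidden concept
    negatives: List[Program]  # panels that violate it
    query: Program            # held-out panel to classify


def format_program(program: Program) -> str:
    """Render one panel's action program on a single line."""
    return "; ".join(program)


def build_prompt(problem: SymbolicBongardProblem) -> str:
    """Serialize a symbolic Bongard-style problem as a plain-text LLM prompt."""
    lines = ["Set A (follows the concept):"]
    for i, prog in enumerate(problem.positives, start=1):
        lines.append(f"  A{i}: {format_program(prog)}")
    lines.append("Set B (violates the concept):")
    for i, prog in enumerate(problem.negatives, start=1):
        lines.append(f"  B{i}: {format_program(prog)}")
    lines.append(f"Query: {format_program(problem.query)}")
    lines.append("Does the query belong to Set A or Set B? Answer with A or B.")
    return "\n".join(lines)


def parse_answer(reply: str) -> str:
    """Naively map a reply to 'A' or 'B' by its first non-space character; '?' otherwise."""
    head = reply.strip()[:1].upper()
    return head if head in {"A", "B"} else "?"


def accuracy(predictions: List[str], labels: List[str]) -> float:
    """Fraction of correct answers; unparsed '?' answers count as wrong."""
    if not labels:
        return 0.0
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


if __name__ == "__main__":
    demo = SymbolicBongardProblem(
        positives=[["line 1.0 turn 120"] * 3],  # hypothetical: closed triangle
        negatives=[["line 1.0 turn 90"] * 3],   # hypothetical: open three-segment path
        query=["line 0.5 turn 120"] * 3,
    )
    print(build_prompt(demo))
```

The answer parser is deliberately minimal. A reply such as "Actually, B" would be misread as "A", so any real evaluation would need stricter output constraints or parsing.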
