Symbolic Grounding Reveals Representational Bottlenecks in Abstract Visual Reasoning

Mohit Vaishnav; Tanel Tammet

arXiv:2604.21346 · cs.AI · April 24, 2026

TL;DR

Giving models symbolic inputs in place of raw images substantially improves accuracy on abstract visual reasoning, which points to representation rather than reasoning as a key bottleneck. The symbolic inputs serve as a diagnostic probe, not as a practical multimodal architecture.

Contribution

It introduces the Componential-Grammatical (C-G) paradigm, which reformulates Bongard-LOGO as a symbolic reasoning task, and shows that LLMs given symbolic inputs make large gains over visual models on that benchmark.

Findings

01 LLMs reach mid-90s accuracy with symbolic inputs

02 Visual models perform near chance under matched task definitions

03 Representation is identified as a key bottleneck in abstract reasoning

Abstract

Vision-language models (VLMs) often fail on abstract visual reasoning benchmarks such as Bongard problems, raising the question of whether the main bottleneck lies in reasoning or representation. We study this on Bongard-LOGO, a synthetic benchmark of abstract concept learning with ground-truth generative programs, by comparing end-to-end VLMs on raw images with large language models (LLMs) given symbolic inputs derived from those images. Using symbolic inputs as a diagnostic probe rather than a practical multimodal architecture, our Componential-Grammatical (C-G) paradigm reformulates Bongard-LOGO as a symbolic reasoning task based on LOGO-style action programs or structured descriptions. LLMs achieve large and consistent gains, reaching mid-90s accuracy on Free-form problems, while a strong visual baseline remains near chance under matched task definitions. Ablations on…
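To make the reformulation concrete, the following is a minimal sketch of how one few-shot problem with LOGO-style action programs might be serialized into a text prompt for an LLM. It is an illustration under stated assumptions, not the authors' pipeline: the `Action` fields, the `render_program` text format, and the prompt wording in `build_prompt` are all hypothetical and do not reproduce the actual Bongard-LOGO program syntax or the paper's prompts.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Action:
    """One hypothetical LOGO-style stroke (field names are assumptions)."""
    kind: str      # e.g. "line" or "arc"
    length: float  # normalized stroke length
    turn: float    # normalized turn applied before drawing


Program = List[Action]


def render_program(program: Program) -> str:
    """Serialize a program as a single line of text."""
    return " ; ".join(
        f"{a.kind}(length={a.length:.2f}, turn={a.turn:.2f})" for a in program
    )


def build_prompt(positives: List[Program], negatives: List[Program],
                 query: Program) -> str:
    """Assemble a text-only few-shot prompt from symbolic programs."""
    lines = ["Positive examples (all share one hidden concept):"]
    for i, prog in enumerate(positives, start=1):
        lines.append(f"  P{i}: {render_program(prog)}")
    lines.append("Negative examples (none satisfy the concept):")
    for i, prog in enumerate(negatives, start=1):
        lines.append(f"  N{i}: {render_program(prog)}")
    lines.append("Query:")
    lines.append(f"  Q: {render_program(query)}")
    lines.append("Answer with 'positive' or 'negative'.")
    return "\n".join(lines)


# Toy problem: the positive shapes use only straight lines.
square_side = Action("line", 0.5, 0.25)
curved_side = Action("arc", 0.5, 0.25)
prompt = build_prompt(
    positives=[[square_side] * 4, [square_side] * 3],
    negatives=[[curved_side] * 4],
    query=[square_side, square_side],
)
print(prompt)
```

The resulting string would be sent to an LLM in place of the rendered images, so any remaining errors can be attributed to reasoning over the symbols rather than to perceiving the shapes.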
