Forgotten Polygons: Multimodal Large Language Models are Shape-Blind

William Rudman; Michal Golovanevsky; Amir Bar; Vedant Palit; Yann LeCun; Carsten Eickhoff; Ritambhara Singh

arXiv:2502.15969 · cs.CV · August 26, 2025 · 2 citations

TL;DR

Multimodal Large Language Models (MLLMs) show significant shortcomings in shape recognition and geometric reasoning. Visual prompting techniques such as VC-CoT substantially improve their performance, which highlights the need for better integration of visual reasoning.

Contribution

This paper systematically evaluates MLLMs' geometric reasoning abilities, identifies their reliance on intuitive associations, and introduces VC-CoT prompting to significantly enhance visual-math reasoning performance.

Findings

01

Top MLLMs achieve under 50% accuracy in identifying regular polygons.

02

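MLLMs rely on System 1 (intuitive, memorized associations) rather than System 2 (deliberate reasoning), so they fail to count the sides of both familiar and novel shapes.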

03

VC-CoT prompting boosts accuracy from 7% to 93% in side-counting tasks.
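
The accuracy figures in the findings above are fractions of correct answers on shape tasks. As a minimal sketch of how such a score could be computed, and not the authors' evaluation code, the snippet below builds the vertices of a regular polygon with a known side count and scores a list of predicted side counts against the ground truth. The function names `regular_polygon_vertices` and `side_count_accuracy` and the exact-match scoring rule are assumptions made for illustration.

```python
import math


def regular_polygon_vertices(n_sides, radius=1.0, rotation=0.0):
    """Return the (x, y) vertices of a regular polygon centered at the origin.

    Hypothetical helper for illustration; not taken from the paper's repository.
    """
    if n_sides < 3:
        raise ValueError("a polygon needs at least 3 sides")
    step = 2 * math.pi / n_sides  # angle between consecutive vertices
    return [
        (radius * math.cos(rotation + i * step),
         radius * math.sin(rotation + i * step))
        for i in range(n_sides)
    ]


def side_count_accuracy(predicted, actual):
    """Fraction of predicted side counts that exactly match the true counts.

    Assumed scoring rule: exact match, one prediction per shape.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

For example, `side_count_accuracy([3, 4, 8, 6], [3, 4, 7, 6])` returns `0.75`, because the heptagon is miscounted as having eight sides.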

Abstract

Despite strong performance on vision-language tasks, Multimodal Large Language Models (MLLMs) struggle with mathematical problem-solving, with both open-source and state-of-the-art models falling short of human performance on visual-math benchmarks. To systematically examine visual-mathematical reasoning in MLLMs, we (1) evaluate their understanding of geometric primitives, (2) test multi-step reasoning, and (3) explore a potential solution to improve visual reasoning capabilities. Our findings reveal fundamental shortcomings in shape recognition, with top models achieving under 50% accuracy in identifying regular polygons. We analyze these failures through the lens of dual-process theory and show that MLLMs rely on System 1 (intuitive, memorized associations) rather than System 2 (deliberate reasoning). Consequently, MLLMs fail to count the sides of both familiar and novel shapes,…

Code & Models

Repositories

rsinghlab/shape-blind
PyTorch · Official

Datasets

mgolov/shape-blind-dataset
dataset · 90 downloads

Videos

Forgotten Polygons: Multimodal Large Language Models are Shape-Blind

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling