Forgotten Polygons: Multimodal Large Language Models are Shape-Blind
William Rudman, Michal Golovanevsky, Amir Bar, Vedant Palit, Yann LeCun, Carsten Eickhoff, Ritambhara Singh

TL;DR
Multimodal Large Language Models (MLLMs) struggle to recognize shapes and to reason about geometry, but visual prompting techniques such as VC-CoT substantially improve their performance, highlighting the need for better integration of visual reasoning.
Contribution
This paper systematically evaluates MLLMs' geometric reasoning abilities, identifies their reliance on intuitive associations, and introduces VC-CoT prompting to significantly enhance visual-math reasoning performance.
Findings
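Top MLLMs achieve under 50% accuracy in identifying regular polygons.
Reliance on System 1 (intuitive, memorized associations) rather than System 2 (deliberate reasoning) leads MLLMs to fail at counting the sides of both familiar and novel shapes.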
VC-CoT prompting boosts accuracy from 7% to 93% in side-counting tasks.
Abstract
Despite strong performance on vision-language tasks, Multimodal Large Language Models (MLLMs) struggle with mathematical problem-solving, with both open-source and state-of-the-art models falling short of human performance on visual-math benchmarks. To systematically examine visual-mathematical reasoning in MLLMs, we (1) evaluate their understanding of geometric primitives, (2) test multi-step reasoning, and (3) explore a potential solution to improve visual reasoning capabilities. Our findings reveal fundamental shortcomings in shape recognition, with top models achieving under 50% accuracy in identifying regular polygons. We analyze these failures through the lens of dual-process theory and show that MLLMs rely on System 1 (intuitive, memorized associations) rather than System 2 (deliberate reasoning). Consequently, MLLMs fail to count the sides of both familiar and novel shapes,…
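To make the side-counting task concrete, the following is a minimal sketch of how ground-truth regular polygons and an exact-match accuracy score could be set up. It is not the authors' evaluation code: the function names `regular_polygon_vertices` and `side_count_accuracy`, and the list of model predictions, are hypothetical and chosen only for illustration.

```python
import math


def regular_polygon_vertices(n_sides, radius=1.0):
    """Return the (x, y) vertices of a regular polygon centered at the origin."""
    if n_sides < 3:
        raise ValueError("a polygon needs at least 3 sides")
    step = 2 * math.pi / n_sides  # central angle between adjacent vertices
    return [
        (radius * math.cos(i * step), radius * math.sin(i * step))
        for i in range(n_sides)
    ]


def side_count_accuracy(predictions, ground_truth):
    """Fraction of shapes whose predicted side count matches the true count."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have equal length")
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    correct = sum(p == t for p, t in zip(predictions, ground_truth))
    return correct / len(ground_truth)


# Ground truth: triangle through octagon; a closed polygon has as many
# sides as it has vertices.
shapes = [regular_polygon_vertices(n) for n in range(3, 9)]
truth = [len(vertices) for vertices in shapes]

# Hypothetical model answers (assumed for illustration): the heptagon is
# mistaken for an octagon.
hypothetical_predictions = [3, 4, 5, 6, 8, 8]

print(f"accuracy: {side_count_accuracy(hypothetical_predictions, truth):.2f}")
```

Exact-match scoring is assumed here because a side count is either right or wrong; the scoring actually used in the paper may differ.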
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
