BareBones: Benchmarking Zero-Shot Geometric Comprehension in VLMs

Aaditya Baranwal; Vishal Yadav; Abhishek Rajora

arXiv:2604.10528·cs.CV·May 5, 2026

TL;DR

BareBones is a new benchmark that rigorously tests whether vision-language models truly understand geometric shapes, revealing widespread reliance on textures and environmental cues.

Contribution

It introduces a pixel-level silhouette benchmark across multiple datasets, exposing the texture bias in current models and establishing a standard for geometric comprehension evaluation.

Findings

01

Models perform poorly without RGB textures, indicating a reliance on visual shortcuts.

02

The benchmark exposes universal structural blind spots in state-of-the-art VLMs.

03

Performance collapse under RGB deprivation is termed the 'Texture Bias Cliff'.
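
One simple way to quantify a collapse of this kind is the relative accuracy drop between RGB inputs and silhouette inputs. The Python sketch below is an illustration only and is not the paper's metric definition: the function names `accuracy` and `relative_drop`, and the toy labels and predictions, are all hypothetical.

```python
def accuracy(preds, labels):
    """Fraction of predictions that exactly match the ground-truth labels."""
    if len(preds) != len(labels) or len(labels) == 0:
        raise ValueError("preds and labels must be non-empty and equal in length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def relative_drop(acc_rgb, acc_silhouette):
    """Share of RGB accuracy lost when only the silhouette is shown."""
    if acc_rgb <= 0:
        raise ValueError("RGB accuracy must be positive")
    return (acc_rgb - acc_silhouette) / acc_rgb


# Toy data (hypothetical, not results from the paper).
labels = ["cat", "dog", "bird", "car"]
rgb_preds = ["cat", "dog", "bird", "car"]         # predictions on RGB images
silhouette_preds = ["cat", "dog", "car", "bird"]  # predictions on silhouettes

acc_rgb = accuracy(rgb_preds, labels)
acc_sil = accuracy(silhouette_preds, labels)
drop = relative_drop(acc_rgb, acc_sil)
```

A relative measure is used here so that models with different RGB baselines can be compared on the same scale; an absolute difference would be an equally reasonable choice.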

Abstract

While Vision-Language Models (VLMs) demonstrate remarkable zero-shot recognition capabilities across a diverse spectrum of multimodal tasks, it remains an open question whether these architectures genuinely comprehend geometric structure or merely exploit RGB textures and contextual priors as statistical shortcuts. Existing evaluations fail to isolate this mechanism, conflating semantic reasoning with texture mapping and relying on imprecise annotations that inadvertently leak environmental cues. To address this gap, we introduce BareBones, a zero-shot benchmark designed to stress-test pure geometric shape comprehension. We curate pixel-level silhouettes of geometrically distinct classes across six datasets: five established segmentation sources (ImageNet-S, DIS5K, ThinObject5K, PASCAL VOC, CUB-200) and our novel flagship collection, WTP-Bench, establishing a noise-free…
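
To make the idea of a pixel-level silhouette concrete, the following minimal Python sketch renders a binary segmentation mask as a textureless black-on-white image. It is an assumption for illustration, not the authors' curation pipeline, which this excerpt does not describe: the function name `to_silhouette`, the foreground and background values, and the toy mask are all hypothetical.

```python
import numpy as np


def to_silhouette(mask, fg=0, bg=255):
    """Render a binary mask as a 3-channel silhouette with no texture or context."""
    mask = np.asarray(mask).astype(bool)
    # Start from a uniform background, then paint object pixels with a flat value.
    out = np.full(mask.shape, bg, dtype=np.uint8)
    out[mask] = fg
    # Repeat the single channel three times for models that expect RGB input.
    return np.stack([out, out, out], axis=-1)


# Toy 4x4 mask with a 2x2 object in the centre (hypothetical data).
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
silhouette = to_silhouette(mask)
```

Because every object pixel receives the same value and every background pixel another, the only information left for a model is the object's outline.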

Code & Models

Repositories

https://eternal-f1ame.github.io/WTP-Bench
