Tri-Bench: Stress-Testing VLM Reliability on Spatial Reasoning under Camera Tilt and Object Interference
Amit Bendkhale

TL;DR
Tri-Bench is a new benchmark for testing vision-language models' geometric reasoning under challenging conditions like camera tilt and object interference, revealing significant reliability issues.
Contribution
We introduce Tri-Bench, a focused geometric reasoning benchmark that isolates two key factors affecting VLM reliability (camera pose and object interference), and evaluate four recent VLMs on it to expose their limitations.
Findings
Average VLM accuracy against 3D ground truth is modest: ~69% (best ~75%, worst ~64%).
Models fail to recognize minority shape classes (~0% accuracy).
Camera tilt reduces accuracy by ~4.1%, while object interference has a negligible effect.
Abstract
Verifiable geometric reasoning is a critical component for trustworthy and controllable agentic AI. Despite impressive capabilities, Vision-Language Models (VLMs) often fail under realistic scene changes. We present Tri-Bench, a compact benchmark of planar triangle problems that isolates relative geometric reasoning while stressing two deployment-critical factors: camera pose (planar vs. tilted) and scene context via object interference (10 everyday objects). To test verifiability and control, we evaluate four recent VLMs using a single, fixed prompt whose guardrail explicitly describes a surrounding square border, enabling correct answers via homography. We evaluate six simple tasks over binary and continuous targets, and observe that the overall accuracy with respect to 3D ground truth is modest, ~69% on average (best ~75%, worst ~64%). The same responses align even more closely with…
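The abstract notes that the square border makes correct answers recoverable via homography. The following is a minimal sketch of that idea, not the paper's evaluation code: it assumes the four border corners and the triangle vertices are already available as pixel coordinates, and the function names (`homography_from_square`, `rectify`, `side_lengths`) and the camera matrix `H_true` are hypothetical, introduced only for illustration.

```python
import numpy as np

def homography_from_square(img_corners, side=1.0):
    """Estimate H (image -> plane) from the four imaged corners of a square.

    Corners must be ordered to match (0,0), (side,0), (side,side), (0,side).
    Uses the direct linear transform (DLT) solved with an SVD.
    """
    src = np.asarray(img_corners, dtype=float)
    dst = np.array([[0, 0], [side, 0], [side, side], [0, side]], dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The null vector of the 8x9 system is the last right-singular vector.
    _, _, vt = np.linalg.svd(np.array(rows))
    return vt[-1].reshape(3, 3)

def rectify(H, pts):
    """Apply homography H to 2D points and dehomogenize."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]

def side_lengths(tri):
    """Lengths of sides AB, BC, CA for vertices given in order A, B, C."""
    tri = np.asarray(tri, dtype=float)
    return np.linalg.norm(tri - np.roll(tri, -1, axis=0), axis=1)

# Illustrative setup (hypothetical numbers): a tilted camera modeled by H_true,
# which maps plane coordinates to pixel coordinates.
H_true = np.array([[120.0, 30.0, 50.0],
                   [10.0, 90.0, 40.0],
                   [0.1, 0.2, 1.0]])
square_plane = [[0, 0], [1, 0], [1, 1], [0, 1]]
triangle_plane = [[0.2, 0.2], [0.8, 0.2], [0.2, 0.8]]

square_img = rectify(H_true, square_plane)      # imaged border corners
triangle_img = rectify(H_true, triangle_plane)  # imaged triangle vertices

H = homography_from_square(square_img)
recovered = side_lengths(rectify(H, triangle_img))
print(recovered)  # side lengths in units of the square's side
```

Comparing `side_lengths(triangle_img)` with `recovered` shows why the distinction between the 2D image projection and the 3D ground truth matters: under tilt, side ratios measured directly in pixels differ from those on the plane, while the rectified ratios match the plane.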
Taxonomy
Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications
