InfiniBench: Infinite Benchmarking for Visual Spatial Reasoning with Customizable Scene Complexity
Haoming Wang, Qiyao Xue, Wei Gao

TL;DR
InfiniBench is a benchmark generator that creates a theoretically infinite variety of customizable 3D scenes for evaluating visual spatial reasoning in vision-language models, addressing the limited control over scene complexity in existing benchmarks.
Contribution
The paper introduces a fully automated, customizable benchmark generator that synthesizes diverse 3D scenes with parameterized control over scene complexity, translating natural-language scene descriptions into photo-realistic videos.
Findings
Outperforms existing methods in prompt fidelity and physical plausibility.
Effectively generates high-complexity scenes for spatial reasoning tasks.
Demonstrates utility across multiple spatial reasoning benchmarks.
Abstract
Modern vision-language models (VLMs) are expected to reason spatially across scenes of diverse complexity, but evaluating this ability is difficult because existing benchmarks are not simultaneously diverse, scalable, and fully customizable. Existing benchmarks offer limited control over scene complexity and cannot isolate and analyze specific VLM failure modes under distinct spatial conditions. To address this gap, instead of presenting separate benchmarks for different scene complexities, in this paper we present InfiniBench, a fully automated, customizable and user-friendly benchmark generator that can synthesize a theoretically infinite variety of 3D scenes with parameterized control over scene complexity. InfiniBench uniquely translates scene descriptions in natural language into photo-realistic videos with complex and physically…
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Robotic Path Planning Algorithms
