TL;DR
OpenSpatial introduces an open-source data engine and a large-scale dataset to advance spatial understanding and reasoning, enabling versatile models to achieve state-of-the-art performance.
Contribution
The paper presents a principled, scalable data engine and a 3-million-sample dataset (OpenSpatial-3M) for spatial intelligence, filling a critical gap in open-source tools for spatial data generation.
Findings
Models trained on OpenSpatial-3M achieve state-of-the-art performance on spatial understanding and reasoning benchmarks.
The dataset enables significant performance improvements in spatial reasoning tasks.
Analysis reveals how data attributes affect spatial perception.
Abstract
Spatial understanding is a fundamental cornerstone of human-level intelligence. Nonetheless, current research predominantly focuses on domain-specific data production, leaving a critical void: the absence of a principled, open-source engine capable of fully unleashing the potential of high-quality spatial data. To bridge this gap, we elucidate the design principles of a robust data generation system and introduce OpenSpatial -- an open-source data engine engineered for high quality, extensive scalability, broad task diversity, and optimized efficiency. OpenSpatial adopts 3D bounding boxes as the fundamental primitive to construct a comprehensive data hierarchy across five foundational tasks: Spatial Measurement (SM), Spatial Relationship (SR), Camera Perception (CP), Multi-view Consistency (MC), and Scene-Aware Reasoning (SAR). Leveraging this scalable infrastructure, we curate…
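To make the idea of 3D bounding boxes as a data primitive concrete, here is a minimal sketch of how question-answer pairs for two of the five tasks (SM and SR) might be derived from a pair of boxes. It is not OpenSpatial's actual code or schema: the `Box3D` class, the helper functions, the axis-aligned camera-frame convention (x right, y down, z forward, metres), and the question templates are all assumptions made for illustration.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Box3D:
    """Hypothetical axis-aligned 3D box in a camera frame (x right, y down, z forward), in metres."""
    label: str
    center: tuple[float, float, float]
    size: tuple[float, float, float]  # width, height, depth


def center_distance(a: Box3D, b: Box3D) -> float:
    """Euclidean distance between the two box centers."""
    return math.dist(a.center, b.center)


def left_or_right(a: Box3D, b: Box3D) -> str:
    """Side of `b` on which `a` lies, judged by the camera-frame x coordinate."""
    return "left" if a.center[0] < b.center[0] else "right"


def make_qa(a: Box3D, b: Box3D) -> list[dict[str, str]]:
    """Build one Spatial Measurement and one Spatial Relationship sample (illustrative templates)."""
    return [
        {
            "task": "SM",
            "question": f"How far apart are the {a.label} and the {b.label}?",
            "answer": f"{center_distance(a, b):.2f} m",
        },
        {
            "task": "SR",
            "question": f"Is the {a.label} to the left or right of the {b.label}?",
            "answer": left_or_right(a, b),
        },
    ]


# Made-up example boxes, not data from the paper.
chair = Box3D("chair", (-1.0, 0.0, 2.0), (0.5, 1.0, 0.5))
table = Box3D("table", (2.0, 0.0, 6.0), (1.5, 0.8, 1.0))
qa_pairs = make_qa(chair, table)

for qa in qa_pairs:
    print(qa["task"], "|", qa["question"], "->", qa["answer"])
```

Because every answer here is computed from box geometry rather than annotated by hand, the same pair of boxes can feed several task templates, which is one plausible reading of why a single primitive can support a hierarchy of tasks.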
