PARSE: Part-Aware Relational Spatial Modeling
Yinuo Bai, Peijun Xu, Kuixiang Shao, Yuyang Jiao, Jingxuan Zhang, Kaixin Yao, Jiayuan Gu, Jingyi Yu

TL;DR
PARSE introduces a part-aware framework for modeling object interactions at the part level, enabling more accurate and physically consistent 3D scene reasoning and generation.
Contribution
PARSE presents the Part-centric Assembly Graph (PAG) and a Part-Aware Spatial Configuration Solver, along with PARSE-10K, a dataset of 10,000 3D indoor scenes, to improve geometry-grounded spatial reasoning in 3D scene modeling.
Findings
Fine-tuning Qwen3-VL on PARSE-10K enhances object layout reasoning.
Scenes generated with PAGs show improved physical realism and structural complexity.
PARSE advances the state of the art in geometry-grounded spatial reasoning.
Abstract
Inter-object relations underpin spatial intelligence, yet existing representations -- linguistic prepositions or object-level scene graphs -- are too coarse to specify which regions actually support, contain, or contact one another, leading to ambiguous and physically inconsistent layouts. To address these ambiguities, a part-level formulation is needed; therefore, we introduce PARSE, a framework that explicitly models how object parts interact to determine feasible and spatially grounded scene configurations. PARSE centers on the Part-centric Assembly Graph (PAG), which encodes geometric relations between specific object parts, and a Part-Aware Spatial Configuration Solver that converts these relations into geometric constraints to assemble collision-free, physically valid scenes. Using PARSE, we build PARSE-10K, a dataset of 10,000 3D indoor scenes constructed from real-image layout…
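To make the idea of part-level relations concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes each part can be approximated by an axis-aligned box in its object's local frame, and that a single relation type, here called `supported_by`, places the bottom face of a child part on the top face of a parent part. All names (`Part`, `SceneObject`, `PartRelation`, `solve_support`, `boxes_overlap`) are hypothetical and chosen only for this example; the actual PAG schema and solver in the paper may differ substantially.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass
class Part:
    """Axis-aligned box (assumption: parts are approximated by boxes)."""
    lo: Vec3
    hi: Vec3


@dataclass
class SceneObject:
    name: str
    parts: dict[str, Part]          # part name -> box in the object's local frame
    position: Vec3 = (0.0, 0.0, 0.0)

    def world_part(self, part_name: str) -> Part:
        """Return the named part's box translated into world coordinates."""
        p = self.parts[part_name]
        lo = tuple(p.lo[i] + self.position[i] for i in range(3))
        hi = tuple(p.hi[i] + self.position[i] for i in range(3))
        return Part(lo, hi)


@dataclass
class PartRelation:
    """One edge of a hypothetical part-level graph: child part -> parent part."""
    child: str
    child_part: str
    kind: str
    parent: str
    parent_part: str


def solve_support(objects: dict[str, SceneObject], rel: PartRelation) -> None:
    """Translate the child so its part rests, centered, on top of the parent part."""
    if rel.kind != "supported_by":
        raise ValueError(f"unsupported relation kind: {rel.kind}")
    parent_box = objects[rel.parent].world_part(rel.parent_part)
    child = objects[rel.child]
    local = child.parts[rel.child_part]
    # Align the x/y centers of the two parts.
    x = (parent_box.lo[0] + parent_box.hi[0]) / 2 - (local.lo[0] + local.hi[0]) / 2
    y = (parent_box.lo[1] + parent_box.hi[1]) / 2 - (local.lo[1] + local.hi[1]) / 2
    # Put the child's bottom face on the parent's top face (z is up).
    z = parent_box.hi[2] - local.lo[2]
    child.position = (x, y, z)


def boxes_overlap(a: Part, b: Part, eps: float = 1e-9) -> bool:
    """True if two boxes interpenetrate; touching faces do not count."""
    return all(a.lo[i] < b.hi[i] - eps and b.lo[i] < a.hi[i] - eps for i in range(3))


# Toy scene: a lamp whose base is supported by a table's top surface.
objects = {
    "table": SceneObject("table", {"top": Part((-0.5, -0.5, 0.7), (0.5, 0.5, 0.75))}),
    "lamp": SceneObject("lamp", {"base": Part((-0.1, -0.1, 0.0), (0.1, 0.1, 0.02))}),
}
relation = PartRelation("lamp", "base", "supported_by", "table", "top")
solve_support(objects, relation)
collides = boxes_overlap(
    objects["lamp"].world_part("base"), objects["table"].world_part("top")
)
```

The point of the sketch is that the relation names specific parts (`base`, `top`) rather than whole objects, so the resulting placement is determined by part geometry and not by a coarse preposition such as "on". A realistic solver would also need to handle rotations, multiple simultaneous constraints, and non-box geometry, none of which are modeled here.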
Taxonomy
Topics: 3D Shape Modeling and Analysis · Robot Manipulation and Learning · Robotics and Sensor-Based Localization
