CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image
Kaixin Yao, Longwen Zhang, Xinhao Yan, Yan Zeng, Qixuan Zhang, Wei Yang, Lan Xu, Jiayuan Gu, Jingyi Yu

TL;DR
CAST is a novel method that reconstructs high-quality 3D scenes from a single RGB image by integrating object segmentation, spatial relationship analysis, occlusion-aware generation, and physics-based correction for realistic and coherent scene modeling.
Contribution
The paper introduces CAST, a comprehensive framework combining GPT-based analysis, large-scale 3D generation, and physics-aware optimization for improved 3D scene reconstruction from a single image.
Findings
Effective occlusion handling with Signed Distance Fields
Accurate object alignment and scene coherence
Enhanced physical realism in reconstructed scenes
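As a rough illustration of how signed distance values can support physical-plausibility checks, the sketch below measures whether an object sinks into or floats above a flat support plane, then computes the vertical shift that restores contact. This is not the paper's optimization: the plane-shaped SDF, the y-up convention, and the names plane_sdf and contact_offset are assumptions made for the example.

```python
import numpy as np

def plane_sdf(points, height=0.0):
    """Signed distance to a horizontal plane y = height (positive above it)."""
    return points[:, 1] - height

def contact_offset(points, height=0.0):
    """Vertical shift that puts the lowest point exactly on the plane.

    A positive result means the object was penetrating the plane;
    a negative result means it was floating above it.
    """
    return -plane_sdf(points, height).min()

# Corners of a unit cube resting on the plane y = 0 (hypothetical test object).
cube = np.array(
    [[x, y, z] for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)]
)

sunk = cube + np.array([0.0, -0.2, 0.0])      # pushed 0.2 below the plane
floating = cube + np.array([0.0, 0.3, 0.0])   # lifted 0.3 above the plane

fix_sunk = contact_offset(sunk)
fix_float = contact_offset(floating)

# Applying the offset along y restores contact in both cases.
sunk_fixed = sunk + np.array([0.0, fix_sunk, 0.0])
floating_fixed = floating + np.array([0.0, fix_float, 0.0])
```

A full system would evaluate the SDF of one reconstructed object at the surface points of another and optimize the full pose, not a single vertical offset.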
Abstract
Recovering high-quality 3D scenes from a single RGB image is a challenging task in computer graphics. Current methods often struggle with domain-specific limitations or low-quality object generation. To address these, we propose CAST (Component-Aligned 3D Scene Reconstruction from a Single RGB Image), a novel method for 3D scene reconstruction and recovery. CAST starts by extracting object-level 2D segmentation and relative depth information from the input image, followed by using a GPT-based model to analyze inter-object spatial relationships. This enables the understanding of how objects relate to each other within the scene, ensuring more coherent reconstruction. CAST then employs an occlusion-aware large-scale 3D generation model to independently generate each object's full geometry, using MAE and point cloud conditioning to mitigate the effects of occlusions and partial object…
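The first stage described above pairs a per-object 2D mask with a depth map. A minimal sketch of how such a pair could be turned into a partial per-object point cloud follows. It assumes a pinhole camera with known intrinsics (fx, fy, cx, cy) and a depth map already in consistent units, neither of which the abstract specifies; the function name masked_depth_to_points is hypothetical.

```python
import numpy as np

def masked_depth_to_points(depth, mask, fx, fy, cx, cy):
    """Back-project the depth pixels selected by `mask` into camera space.

    Assumes a pinhole camera model; returns an (N, 3) array of XYZ points.
    """
    v, u = np.nonzero(mask)              # rows (v) and columns (u) inside the mask
    z = depth[v, u].astype(np.float64)   # depth at each masked pixel
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Toy input: a constant-depth 4x4 image with a 2x2 object mask in the centre.
depth = np.full((4, 4), 2.0)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True

points = masked_depth_to_points(depth, mask, fx=2.0, fy=2.0, cx=1.5, cy=1.5)
```

Because only visible pixels are back-projected, the resulting cloud covers just the unoccluded part of the object, which is why the generation stage needs to be occlusion-aware.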
Taxonomy
Topics: 3D Surveying and Cultural Heritage · Industrial Vision Systems and Defect Detection · Medical Image Segmentation Techniques
Methods: Masked autoencoder · ALIGN
