SAM3D: Segment Anything in 3D Scenes
Yunhan Yang, Xiaoyang Wu, Tong He, Hengshuang Zhao, Xihui Liu

TL;DR
SAM3D introduces a method to generate detailed 3D scene segmentations by projecting 2D masks from RGB images into 3D point clouds and merging them iteratively, without additional training.
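The projection step can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' code. It assumes a pinhole camera with intrinsics `fx, fy, cx, cy` and a 4x4 world-to-camera pose per frame. It also assumes the SAM output for a frame has been rasterized into a per-pixel integer mask ID, with `-1` for pixels outside any mask. Each 3D point is moved into the camera frame, projected to a pixel, and takes that pixel's mask ID. The name `project_mask_to_points` is made up for this example. The sketch skips occlusion handling, so in practice a depth check (ScanNet provides depth maps) would be needed to stop hidden points from inheriting labels from surfaces in front of them.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def project_mask_to_points(points: Sequence[Point],
                           mask: Sequence[Sequence[int]],
                           fx: float, fy: float, cx: float, cy: float,
                           world_to_cam: Sequence[Sequence[float]]) -> List[int]:
    """Give each 3D point the mask ID of the pixel it projects onto.

    Hypothetical helper (not from the SAM3D code). `mask[v][u]` holds a
    SAM mask ID (or -1) for pixel row v, column u. `world_to_cam` is a
    4x4 row-major pose; only its top 3 rows are used. Returns -1 for
    points behind the camera, outside the image, or on unmasked pixels.
    Occlusion is NOT handled (no depth test).
    """
    height, width = len(mask), len(mask[0])
    labels: List[int] = []
    for x, y, z in points:
        # Rigid transform from world to camera coordinates.
        xc, yc, zc = (r[0] * x + r[1] * y + r[2] * z + r[3]
                      for r in world_to_cam[:3])
        if zc <= 0:
            labels.append(-1)  # behind the camera
            continue
        # Pinhole projection, rounded to the nearest pixel.
        u = round(fx * xc / zc + cx)
        v = round(fy * yc / zc + cy)
        if 0 <= u < width and 0 <= v < height:
            labels.append(mask[v][u])
        else:
            labels.append(-1)  # projects outside the image
    return labels
```

Running this once per frame over the same scene points yields one label list per frame, which is the kind of per-frame 3D mask that the iterative merging then combines.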
Contribution
It leverages the Segment-Anything Model for 3D segmentation without training, using a novel projection and merging approach for 3D scenes.
Findings
Achieves fine-grained 3D segmentation on the ScanNet dataset
Operates without any training or finetuning of SAM
Provides reasonable qualitative segmentation results
Abstract
In this work, we propose SAM3D, a novel framework that predicts masks in 3D point clouds by leveraging the Segment-Anything Model (SAM) on RGB images, without further training or finetuning. For a point cloud of a 3D scene with posed RGB images, we first predict segmentation masks for the RGB images with SAM and then project the 2D masks onto the 3D points. We then merge the 3D masks iteratively with a bottom-up approach: at each step, the point cloud masks of two adjacent frames are merged with a bidirectional merging approach. In this way, the 3D masks predicted from different frames are gradually merged into 3D masks of the whole scene. Finally, we can optionally ensemble the SAM3D result with over-segmentation results based on the geometric information of the 3D scene. Our approach is evaluated on the ScanNet dataset, and qualitative results…
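The bottom-up, pairwise merging described in the abstract can be sketched as follows. This is a hypothetical illustration under stated assumptions, not the authors' implementation. It assumes each frame's masks are already projected onto a shared set of scene points, with one integer label per point and `-1` for unlabeled points. It also assumes two masks are merged when their shared points cover at least `threshold` of either mask. Checking the ratio in both directions is one possible reading of "bidirectional"; the abstract does not give the exact criterion. `merge_two`, `merge_bottom_up`, and `threshold` are illustrative names.

```python
from typing import Dict, List, Sequence, Tuple


def merge_two(labels_a: Sequence[int], labels_b: Sequence[int],
              threshold: float = 0.5) -> List[int]:
    """Merge two per-point label lists defined over the same scene points.

    Hypothetical rule: a mask in B is relabeled to the A mask it overlaps
    most, provided the overlap covers at least `threshold` of the A mask
    OR of the B mask (ratio checked in both directions). -1 = unlabeled.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must cover the same points")
    size_a: Dict[int, int] = {}
    size_b: Dict[int, int] = {}
    overlap: Dict[Tuple[int, int], int] = {}
    for a, b in zip(labels_a, labels_b):
        if a >= 0:
            size_a[a] = size_a.get(a, 0) + 1
        if b >= 0:
            size_b[b] = size_b.get(b, 0) + 1
        if a >= 0 and b >= 0:
            overlap[(a, b)] = overlap.get((a, b), 0) + 1

    # For each B mask, keep the best-overlapping A mask that passes the test.
    remap: Dict[int, int] = {}
    best: Dict[int, int] = {}
    for (a, b), n in overlap.items():
        if n / size_a[a] >= threshold or n / size_b[b] >= threshold:
            if n > best.get(b, 0):
                best[b] = n
                remap[b] = a

    # Unmatched B masks become new masks with IDs that cannot clash with A.
    next_id = max(size_a, default=-1) + 1
    for b in sorted(size_b):
        if b not in remap:
            remap[b] = next_id
            next_id += 1

    # Where both frames label a point, the existing A label is kept.
    return [a if a >= 0 else (remap[b] if b >= 0 else -1)
            for a, b in zip(labels_a, labels_b)]


def merge_bottom_up(frames: Sequence[Sequence[int]],
                    threshold: float = 0.5) -> List[int]:
    """Merge adjacent frames pairwise, level by level, until one remains."""
    if not frames:
        raise ValueError("need at least one frame")
    level = [list(f) for f in frames]
    while len(level) > 1:
        merged = [merge_two(level[i], level[i + 1], threshold)
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            merged.append(level[-1])  # odd frame out waits for the next level
        level = merged
    return level[0]
```

Pairing neighbors level by level needs only about log2(F) rounds for F frames. Adjacent frames, which usually see overlapping parts of the scene, are compared before distant ones.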
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · 3D Shape Modeling and Analysis · Advanced Neural Network Applications
Methods: Segment Anything Model
