SAM-Guided Masked Token Prediction for 3D Scene Understanding
Zhimin Chen, Liang Yang, Yingwei Li, Longlong Jing, Bing Li

TL;DR
This paper introduces a SAM-guided tokenization and masked feature prediction framework to improve 3D scene understanding by better aligning 2D-3D representations and addressing dataset imbalances, achieving state-of-the-art results.
Contribution
The paper proposes a novel SAM-guided tokenization method and a two-stage masked token prediction framework for enhanced 3D scene understanding, addressing alignment and long-tail challenges.
Findings
Significant performance improvements on SUN RGB-D, ScanNet, and S3DIS datasets.
New state-of-the-art results in 3D object detection and semantic segmentation.
Effective handling of long-tail distribution in 3D datasets.
Abstract
Foundation models have significantly enhanced 2D task performance, and recent works like Bridge3D have successfully applied these models to improve 3D scene understanding through knowledge distillation, marking considerable advancements. Nonetheless, challenges such as the misalignment between 2D and 3D representations and the persistent long-tail distribution in 3D datasets still restrict the effectiveness of knowledge distillation from 2D to 3D using foundation models. To tackle these issues, we introduce a novel SAM-guided tokenization method that seamlessly aligns 3D transformer structures with region-level knowledge distillation, replacing the traditional KNN-based tokenization techniques. Additionally, we implement a group-balanced re-weighting strategy to effectively address the long-tail problem in knowledge distillation. Furthermore, inspired by the recent success of masked…
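To make the tokenization idea concrete, here is a minimal sketch of grouping points by segmentation mask instead of by KNN neighborhoods. It is an illustration under stated assumptions, not the authors' implementation: the function name `mask_guided_tokens`, the `-1` convention for points that fall in no mask, and the use of average pooling are all hypothetical, and the sketch assumes each 3D point has already been assigned a 2D mask id (for example by projecting it into an image segmented by SAM).

```python
import numpy as np

def mask_guided_tokens(features, mask_ids):
    """Pool per-point features into one token per mask (hypothetical helper).

    features: (N, C) float array of per-point features.
    mask_ids: (N,) int array; the mask each point falls in, -1 for none (assumed convention).
    Returns (token_ids, tokens), where tokens has shape (K, C).
    """
    valid = mask_ids >= 0                      # drop points outside every mask
    token_ids, inverse = np.unique(mask_ids[valid], return_inverse=True)
    inverse = inverse.ravel()                  # index of each point's token
    sums = np.zeros((len(token_ids), features.shape[1]), dtype=np.float64)
    np.add.at(sums, inverse, features[valid])  # sum features within each mask
    counts = np.bincount(inverse, minlength=len(token_ids))
    return token_ids, sums / counts[:, None]   # average pooling (assumed choice)

# Four points: two in mask 5, one in mask 7, one unassigned.
features = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [9.0, 9.0]])
mask_ids = np.array([5, 5, 7, -1])
token_ids, tokens = mask_guided_tokens(features, mask_ids)
print(token_ids, tokens)
```

Because every token here corresponds to exactly one mask, a region-level 2D feature can be paired with a single 3D token, which is the kind of alignment the abstract contrasts with KNN-based grouping, where a neighborhood may straddle several regions.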
Taxonomy
Topics: Image Processing and 3D Reconstruction · 3D Surveying and Cultural Heritage · Advanced Neural Network Applications
Methods: Knowledge Distillation
