Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation
Pengfei Wang, Yuxi Wang, Shuai Li, Zhaoxiang Zhang, Zhen Lei, Lei Zhang

TL;DR
This paper introduces Geometry Guided Self-Distillation (GGSD), a method that uses 3D geometric priors to improve open vocabulary 3D scene understanding, combining knowledge distillation from 2D pre-trained models with self-distillation.
Contribution
The paper proposes GGSD, which incorporates 3D geometric priors into knowledge distillation from 2D models to alleviate the noise inherited from those models and improve 3D scene understanding over existing distillation-based methods.
Findings
GGSD outperforms existing methods on benchmark datasets.
Incorporating 3D geometric priors improves knowledge transfer.
Self-distillation further enhances 3D representation quality.
Abstract
The scarcity of large-scale 3D-text paired data poses a great challenge on open vocabulary 3D scene understanding, and hence it is popular to leverage internet-scale 2D data and transfer their open vocabulary capabilities to 3D models through knowledge distillation. However, the existing distillation-based 3D scene understanding approaches rely on the representation capacity of 2D models, disregarding the exploration of geometric priors and inherent representational advantages offered by 3D data. In this paper, we propose an effective approach, namely Geometry Guided Self-Distillation (GGSD), to learn superior 3D representations from 2D pre-trained models. Specifically, we first design a geometry guided distillation module to distill knowledge from 2D models, and then leverage the 3D geometric priors to alleviate the inherent noise in 2D models and enhance the representation learning…
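The geometry guided distillation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the 3D geometric prior is available as a per-point segment id (for example from an over-segmentation of the point cloud), that 2D teacher features have already been projected onto the points, and that a cosine loss is used. The names `segment_mean`, `cosine_distillation_loss`, and `geometry_guided_loss` are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def segment_mean(features, segment_ids):
    """Average features inside each geometric segment, then broadcast
    the segment average back to every point in that segment."""
    num_segments = int(segment_ids.max()) + 1
    sums = np.zeros((num_segments, features.shape[1]))
    np.add.at(sums, segment_ids, features)  # unbuffered scatter-add per segment
    counts = np.bincount(segment_ids, minlength=num_segments).astype(float)
    means = sums / np.maximum(counts, 1.0)[:, None]
    return means[segment_ids]


def cosine_distillation_loss(student, teacher):
    """Mean (1 - cosine similarity) between student and teacher rows."""
    s = l2_normalize(student)
    t = l2_normalize(teacher)
    return float(np.mean(1.0 - np.sum(s * t, axis=1)))


def geometry_guided_loss(feat_3d, feat_2d, segment_ids):
    """Assumed form of geometry guidance: smooth the noisy per-point 2D
    features within each segment before distilling them into 3D features."""
    pooled_teacher = segment_mean(feat_2d, segment_ids)
    return cosine_distillation_loss(feat_3d, pooled_teacher)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    segment_ids = np.array([0, 0, 0, 1, 1, 1])      # two geometric segments
    feat_2d = rng.normal(size=(6, 4))               # projected 2D teacher features
    feat_3d = rng.normal(size=(6, 4))               # 3D student features
    print(geometry_guided_loss(feat_3d, feat_2d, segment_ids))
```

Pooling within segments is one simple way to let geometry suppress per-pixel noise in the teacher signal; the self-distillation stage mentioned in the abstract is not covered by this sketch.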
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction
