PartDistill: 3D Shape Part Segmentation by Vision-Language Model Distillation
Ardian Umam, Cheng-Kun Yang, Min-Hung Chen, Jen-Hui Chuang, Yen-Yu Lin

TL;DR
PartDistill introduces a cross-modal distillation framework that leverages vision-language models to improve 3D shape part segmentation. It addresses three challenges: regions that are invisible or undetected in 2D projections, inconsistent 2D predictions, and the lack of knowledge accumulation across shapes.
Contribution
It presents a bi-directional distillation method. Forward distillation transfers 2D knowledge from VLMs to a 3D segmentation network, and backward distillation improves the 2D predictions themselves, raising accuracy on standard datasets.
Findings
Improves mIoU on ShapeNetPart by more than 15% over existing methods
Improves mIoU on PartNetE by more than 12% over existing methods
Can exploit generative models to create additional 3D shapes as knowledge sources for distillation
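The mIoU figures above measure per-point overlap between predicted and ground-truth part labels. The sketch below scores a single shape, assuming integer per-point labels and the ShapeNetPart-style convention that a part absent from both prediction and ground truth counts as IoU 1. The function name shape_miou is hypothetical, and the benchmarks' official evaluation scripts may differ in details.

```python
from typing import Sequence


def shape_miou(pred: Sequence[int], gt: Sequence[int], part_labels: Sequence[int]) -> float:
    """Mean IoU over the part labels defined for one shape's category.

    pred, gt: per-point part labels of equal length.
    part_labels: the part classes of this shape's category.
    Assumption: a part missing from both pred and gt scores 1.0.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of points")
    ious = []
    for label in part_labels:
        # Points labeled `label` by both prediction and ground truth.
        inter = sum(1 for p, g in zip(pred, gt) if p == label and g == label)
        # Points labeled `label` by either prediction or ground truth.
        union = sum(1 for p, g in zip(pred, gt) if p == label or g == label)
        ious.append(1.0 if union == 0 else inter / union)
    return sum(ious) / len(ious)
```

Averaging shape_miou over all test shapes gives an instance-level mIoU; averaging within each category first and then across categories gives a category-level mIoU.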
Abstract
This paper proposes a cross-modal distillation framework, PartDistill, which transfers 2D knowledge from vision-language models (VLMs) to facilitate 3D shape part segmentation. PartDistill addresses three major challenges in this task: the lack of 3D segmentation in regions that are invisible or undetected in the 2D projections, inconsistent 2D predictions by VLMs, and the lack of knowledge accumulation across different 3D shapes. PartDistill consists of a teacher network that uses a VLM to make 2D predictions and a student network that learns from the 2D predictions while extracting geometrical features from multiple 3D shapes to carry out 3D part segmentation. A bi-directional distillation, comprising forward and backward distillations, is carried out within the framework: the forward distillation transfers the 2D predictions to the student network, and the backward distillation improves the quality of the 2D…
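In the forward direction, the teacher produces 2D predictions and the student learns from them. A minimal sketch of that step follows, under assumptions the abstract does not state: each view's 2D prediction has already been back-projected to per-point labels, views are fused by majority vote, and the student is trained with cross-entropy only on points that received a label. The function names aggregate_view_votes and masked_cross_entropy, the voting rule, and the loss are hypothetical illustrations rather than PartDistill's actual formulation, and backward distillation is not modeled.

```python
import math
from typing import List, Optional, Sequence


def aggregate_view_votes(
    view_labels: Sequence[Sequence[Optional[int]]], num_parts: int
) -> List[Optional[int]]:
    """Fuse per-view labels into one pseudo-label per 3D point (hypothetical).

    view_labels[v][i] is the part label view v assigns to point i after
    back-projection, or None if point i is invisible or undetected in view v.
    Returns the majority-vote label per point (ties go to the lowest label id),
    or None for points that no view labeled.
    """
    num_points = len(view_labels[0])
    fused: List[Optional[int]] = []
    for i in range(num_points):
        counts = [0] * num_parts
        for labels in view_labels:
            if labels[i] is not None:
                counts[labels[i]] += 1
        best = max(range(num_parts), key=lambda k: counts[k])
        fused.append(best if counts[best] > 0 else None)
    return fused


def masked_cross_entropy(
    student_logits: Sequence[Sequence[float]], pseudo_labels: Sequence[Optional[int]]
) -> float:
    """Mean cross-entropy of student logits against teacher pseudo-labels,
    skipping points whose pseudo-label is None (hypothetical loss)."""
    total, count = 0.0, 0
    for logits, label in zip(student_logits, pseudo_labels):
        if label is None:
            continue  # no 2D supervision for this point
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))  # stable log-sum-exp
        total += log_z - logits[label]  # equals -log(softmax(logits)[label])
        count += 1
    return total / count if count else 0.0
```

Points left as None contribute no loss, yet the student still outputs logits for them; this reflects why a geometry-aware student can segment regions that no 2D view covers.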
Taxonomy
Topics: Image Processing and 3D Reconstruction · 3D Surveying and Cultural Heritage · Industrial Vision Systems and Defect Detection
