PartDistill: 3D Shape Part Segmentation by Vision-Language Model Distillation

Ardian Umam; Cheng-Kun Yang; Min-Hung Chen; Jen-Hui Chuang; Yen-Yu Lin

arXiv:2312.04016 · cs.CV · April 17, 2024 · 1 citation

TL;DR

PartDistill introduces a cross-modal distillation framework that leverages vision-language models to improve 3D shape part segmentation, addressing challenges like incomplete 2D predictions and knowledge transfer across shapes.

Contribution

The paper presents a bi-directional distillation method that transfers 2D knowledge from VLMs to a 3D segmentation model, improving mIoU on the ShapeNetPart and PartNetE datasets.

Findings

01. Achieves over 15% higher mIoU than existing methods on ShapeNetPart.

02. Improves mIoU by more than 12% over existing methods on PartNetE.

03. Can use generative models as a knowledge source for distillation.
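Both headline findings are reported in mIoU (mean intersection over union). As a minimal sketch of how that metric is commonly computed for one shape, assuming per-point integer part labels, the function below averages the IoU over all part labels. The name `part_miou` and the rule that a part absent from both prediction and ground truth scores 1.0 are assumptions made for illustration; the paper's exact evaluation protocol is not described on this page.

```python
def part_miou(pred, gt, num_parts):
    """Mean IoU over part labels for a single shape.

    pred, gt: sequences of per-point integer part labels in [0, num_parts).
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must label the same number of points")

    ious = []
    for part in range(num_parts):
        # Points labeled `part` in both prediction and ground truth.
        inter = sum(1 for p, g in zip(pred, gt) if p == part and g == part)
        # Points labeled `part` in either prediction or ground truth.
        union = sum(1 for p, g in zip(pred, gt) if p == part or g == part)
        # Assumed convention: a part missing from both counts as a perfect match.
        ious.append(1.0 if union == 0 else inter / union)
    return sum(ious) / num_parts


if __name__ == "__main__":
    # Four points, two parts; one point of part 1 is mislabeled as part 0.
    print(part_miou([0, 0, 1, 1], [0, 1, 1, 1], num_parts=2))
```

Dataset-level scores are usually obtained by averaging this per-shape value over shapes or categories; which averaging the paper uses is not stated here.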

Abstract

This paper proposes a cross-modal distillation framework, PartDistill, which transfers 2D knowledge from vision-language models (VLMs) to facilitate 3D shape part segmentation. PartDistill addresses three major challenges in this task: the lack of 3D segmentation in invisible or undetected regions in the 2D projections, inconsistent 2D predictions by VLMs, and the lack of knowledge accumulation across different 3D shapes. PartDistill consists of a teacher network that uses a VLM to make 2D predictions and a student network that learns from the 2D predictions while extracting geometrical features from multiple 3D shapes to carry out 3D part segmentation. A bi-directional distillation, including forward and backward distillations, is carried out within the framework, where the former distills the 2D predictions to the student network, and the latter improves the quality of the 2D…
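The forward direction described above (2D teacher predictions supervising a 3D student) can be illustrated with a rough sketch. This is not the paper's formulation: the helper names `aggregate_views` and `forward_distill_loss`, the confidence-weighted voting, and the masked cross-entropy loss are all assumptions chosen to show one plausible way 2D predictions could be lifted onto points while leaving invisible or undetected points unsupervised.

```python
import math


def aggregate_views(num_points, num_parts, view_predictions):
    """Lift per-view 2D predictions onto 3D points (hypothetical helper).

    view_predictions: list of (visible_point_ids, part_label, confidence),
    one entry per 2D region predicted by the teacher in some rendered view.
    Returns (teacher_probs, covered_mask).
    """
    scores = [[0.0] * num_parts for _ in range(num_points)]
    for point_ids, part, conf in view_predictions:
        for i in point_ids:
            scores[i][part] += conf  # confidence-weighted vote (assumption)

    teacher_probs, covered = [], []
    for row in scores:
        total = sum(row)
        covered.append(total > 0)
        # Points never seen or detected in any view get no teacher signal.
        teacher_probs.append(
            [s / total for s in row] if total > 0 else [0.0] * num_parts
        )
    return teacher_probs, covered


def forward_distill_loss(student_probs, teacher_probs, covered, eps=1e-12):
    """Cross-entropy between teacher and student, over covered points only."""
    ids = [i for i, c in enumerate(covered) if c]
    if not ids:
        return 0.0
    loss = 0.0
    for i in ids:
        loss -= sum(
            t * math.log(s + eps)
            for t, s in zip(teacher_probs[i], student_probs[i])
        )
    return loss / len(ids)


if __name__ == "__main__":
    views = [([0, 1], 0, 0.8), ([1], 1, 0.2)]  # point 2 is never covered
    teacher, covered = aggregate_views(3, 2, views)
    student = [[0.9, 0.1], [0.7, 0.3], [0.5, 0.5]]
    print(covered, forward_distill_loss(student, teacher, covered))
```

In this sketch the student still outputs a prediction for the uncovered point; it simply receives no loss there, so any label it produces must come from geometric features learned on other points and shapes.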

Code & Models

Repositories

ardianumam/partdistill (PyTorch, official)

Taxonomy

Topics: Image Processing and 3D Reconstruction · 3D Surveying and Cultural Heritage · Industrial Vision Systems and Defect Detection