DC3DO: Diffusion Classifier for 3D Objects
Nursena Koprucu, Meher Shashwat Nigam, Shicheng Xu (Luke), Biruk Abere, Gabriele Dominici, Andrew Rodriguez, Sharvaree Vadgama, Berfin Inal, and Alberto Tono

TL;DR
DC3DO introduces a generative diffusion model-based approach for 3D object classification that achieves zero-shot recognition and outperforms traditional discriminative methods on ShapeNet data.
Contribution
The paper presents a novel zero-shot 3D object classification method using diffusion models trained on ShapeNet, demonstrating superior performance over multiview approaches.
Findings
Achieves a 12.5% average improvement over its multiview counterparts.
Enables zero-shot classification without additional training.
Shows potential of generative models for 3D shape recognition.
Abstract
Inspired by Geoffrey Hinton's emphasis on generative modeling ("To recognize shapes, first learn to generate them"), we explore the use of 3D diffusion models for object classification. Leveraging the density estimates from these models, our approach, the Diffusion Classifier for 3D Objects (DC3DO), enables zero-shot classification of 3D shapes without additional training. On average, our method achieves a 12.5 percent improvement compared to its multiview counterparts, demonstrating superior multimodal reasoning over discriminative approaches. DC3DO employs a class-conditional diffusion model trained on ShapeNet, and we run inference on point clouds of chairs and cars. This work highlights the potential of generative models in 3D object classification.
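The abstract's idea of classifying with density estimates from a class-conditional diffusion model can be illustrated with a small sketch. It follows the general diffusion-classifier recipe: noise the input, ask the model to predict that noise under each class condition, and pick the class with the lowest average prediction error as a proxy for the largest P(x|c). This is not the authors' implementation. The names `toy_eps_model` and `diffusion_classify`, the Gaussian "prototype" classes, and all parameter values are hypothetical stand-ins for a trained 3D diffusion model.

```python
import numpy as np


def toy_eps_model(x_t, alpha_bar, prototype, data_std=0.1):
    """Hypothetical stand-in for a trained class-conditional denoiser.

    Assumption: points of a class are Gaussian around `prototype` with
    standard deviation `data_std`. Under that assumption the optimal
    noise prediction E[eps | x_t, c] has the closed form below.
    """
    var_t = alpha_bar * data_std**2 + (1.0 - alpha_bar)
    return np.sqrt(1.0 - alpha_bar) * (x_t - np.sqrt(alpha_bar) * prototype) / var_t


def diffusion_classify(x0, prototypes, n_trials=64, seed=0):
    """Return (predicted class index, mean noise-prediction error per class)."""
    rng = np.random.default_rng(seed)
    # Sample noise levels; alpha_bar is the cumulative signal fraction.
    alpha_bars = rng.uniform(0.05, 0.95, size=n_trials)
    errors = np.zeros(len(prototypes))
    for alpha_bar in alpha_bars:
        # Forward diffusion: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps.
        eps = rng.standard_normal(x0.shape)
        x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
        # Reuse the same (alpha_bar, eps) pair for every class so that
        # differences in error come from the class condition alone.
        for c, prototype in enumerate(prototypes):
            eps_hat = toy_eps_model(x_t, alpha_bar, prototype)
            errors[c] += np.mean((eps - eps_hat) ** 2)
    errors /= n_trials
    # Lowest denoising error stands in for the highest class likelihood.
    return int(np.argmin(errors)), errors


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two made-up "classes", each a fixed point cloud of 128 points in 3D.
    prototypes = [rng.standard_normal((128, 3)) for _ in range(2)]
    sample = prototypes[1] + 0.1 * rng.standard_normal((128, 3))
    label, errs = diffusion_classify(sample, prototypes)
    print(label, errs)
```

Reusing the same noise draws across classes is a variance-reduction choice. A real model would replace `toy_eps_model` with a learned network conditioned on the class label, and would typically need many more trials per sample.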
Peer Reviews
Decision: Submitted to ICLR 2025
The idea of building a 3D diffusion classifier is attractive. The paper compared two methods of combining 3D representation and 2D diffusion classifier. The paper is well-written and clear to follow.
1. Limited number of classes validated: only “chairs” and “cars” are evaluated for classification performance. Even though current 3D datasets are relatively smaller than 2D datasets, two categories are insufficient for validating a classifier, considering that MVCNN (Su et al., 2015) was validated on 40 classes. 2. Limited comparison baselines: at least MVCNN is closely related to MVDC in this paper and as a frequently mentioned baseline method, can I know why the authors didn't compare
1. The paper is easy to follow, and the presentation of the work is good. 2. Figures are clear and easily understandable, and they help the reader grasp the main idea the paper is trying to present. 3. Interesting illustrations of certain car and chair categories with high and low prediction accuracies
1. The biggest problem of this paper is the lack of sufficient experimentation. The experiments are only conducted on two selected categories (cars and chairs) from one dataset (ShapeNet), which is definitely not enough for a paper at this conference. Also, not enough baselines are considered, and the paper only compares to the baseline MVDC, which is not a recent work. The current experimental results cannot support the claims of the paper. 2. The paper lacks technical novelty. MVDC seems to only extend LI
1. The method is straightforward and easy to understand. 2. Exploring 3D object understanding and classification seems a worthwhile topic of study.
1. The experiment setup, especially for the baseline MVCNN, is confusing, and the accuracy for the baseline seems problematic. According to line 253, the classification is defined as a closed-set classification problem that uses the class giving the largest P(x|c) as the prediction; however, in line 309, the baseline is evaluated as a binary classification problem (whether the object belongs to the car category or not). Also, for both the 3-class classification and binary classification problems the accuracy of Chair
Taxonomy
Topics: Image Processing and 3D Reconstruction · 3D Shape Modeling and Analysis · Image Retrieval and Classification Techniques
Methods: Diffusion
