SeaLion: Semantic Part-Aware Latent Point Diffusion Models for 3D Generation
Dekai Zhu, Yan Di, Stefan Gavranovic, Slobodan Ilic

TL;DR
SeaLion is a novel diffusion model that generates high-quality, diverse 3D point clouds with detailed segmentation labels, advancing 3D shape generation, editing, and evaluation techniques.
Contribution
The paper introduces SeaLion, a diffusion model that jointly predicts point-wise segmentation labels and generates detailed 3D point clouds, along with a new evaluation metric, part-aware Chamfer distance.
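The summary above names the part-aware Chamfer distance (p-CD) but does not give its formula. As a rough illustration only, the sketch below computes a standard symmetric Chamfer distance separately for each part label and averages the results. The averaging, the squared Euclidean distance, the function names, and the `missing_penalty` parameter for parts present in only one shape are all assumptions made for this example, not the paper's definition.

```python
from collections import defaultdict


def chamfer_distance(a, b):
    """Symmetric Chamfer distance (squared Euclidean) between two non-empty point lists."""
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_way(src, dst):
        # Mean distance from each source point to its nearest destination point.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


def group_by_part(points, labels):
    """Collect points into a dict keyed by their part label."""
    parts = defaultdict(list)
    for point, label in zip(points, labels):
        parts[label].append(point)
    return parts


def part_aware_chamfer(points_a, labels_a, points_b, labels_b, missing_penalty=1.0):
    """Average per-part Chamfer distance over the union of part labels.

    Assumption: a part found in only one shape contributes the hypothetical
    `missing_penalty` instead of a distance. Inputs are assumed non-empty.
    """
    parts_a = group_by_part(points_a, labels_a)
    parts_b = group_by_part(points_b, labels_b)
    all_labels = set(parts_a) | set(parts_b)
    total = 0.0
    for label in all_labels:
        if label in parts_a and label in parts_b:
            total += chamfer_distance(parts_a[label], parts_b[label])
        else:
            total += missing_penalty
    return total / len(all_labels)


# Two shapes with identical geometry but swapped part labels.
pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
plain = chamfer_distance(pts, pts)                       # ignores labels
swapped = part_aware_chamfer(pts, [0, 1], pts, [1, 0])   # penalizes the label swap
```

In this toy case the plain Chamfer distance cannot tell the two shapes apart, while the per-part variant can, which is the kind of label sensitivity a part-aware metric is meant to provide.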
Findings
Outperforms state-of-the-art models like DiffFacto in quality and diversity.
Can be trained semi-supervised, reducing labeling effort.
Effective for 3D data augmentation and part-aware shape editing.
Abstract
Denoising diffusion probabilistic models have achieved significant success in point cloud generation, enabling numerous downstream applications, such as generative data augmentation and 3D model editing. However, little attention has been given to generating point clouds with point-wise segmentation labels, as well as to developing evaluation metrics for this task. Therefore, in this paper, we present SeaLion, a novel diffusion model designed to generate high-quality and diverse point clouds with fine-grained segmentation labels. Specifically, we introduce the semantic part-aware latent point diffusion technique, which leverages the intermediate features of the generative models to jointly predict the noise for perturbed latent points and associated part segmentation labels during the denoising process, and subsequently decodes the latent points to point clouds conditioned on part…
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
The authors attempt to address an interesting challenge by introducing a joint 3D shape and segmentation generation approach, and they propose a new evaluation metric (p-CD) to quantify coherence across parts within generated point clouds. However, these strengths are limited by the lack of sufficient motivation and empirical analysis supporting the proposed methodology.
1. Unclear Motivation for Joint Generation: The choice to couple the generation of 3D shapes with segmentation labels lacks clear motivation. The paper does not address why the segmentation cannot be applied as a post-processing step using a state-of-the-art segmentation model, which could yield similar or better results without added complexity.
2. Probabilistic Mismatch: Since 3D points and segmentation labels may reside in different probability distributions, it is questionable whether combi
- This paper is well-written and easy to follow. The motivation is also clear.
- The studied task is under-explored and interesting.
- The performance is good.
- The proposed method seems to generate only point clouds whose class exists in the training dataset, so it can be considered a data augmentation technique, and the authors show its value in Table 5. However, this experiment is insufficient and the performance gain is not significant. Did the authors compare it with other data augmentation methods?
- It seems that the class of the generated point cloud cannot be controlled, so how can the class balance of the generated p
- The proposed task is important.
- The paper is well-written.
- The proposed method performs well, with thorough evaluation conducted on two datasets.
- The discussion of evaluation metrics is logical, and the newly proposed metric is reasonable (although some questions remain).
1. The newly proposed metric seems to measure only part-level alignment between objects and does not incorporate overall alignment. Using it as one term of the distance calculation, weighted against the original distance function, might make more sense. Intuitively, each part of two objects could align well individually while the overall structure of the generated object is poor. I would like to hear more discussion on this point.
2. I understand that the focus of the task is on semant
This paper proposes a relatively novel task, using diffusion models to simultaneously generate 3D shapes and segmentation labels. The proposed semantic-aware latent point diffusion technique is indeed an innovative idea, addressing the lack of labeled data and dependency on pre-trained segmentation models in a semi-supervised manner.
I have the following concerns:
1) Is it necessary to have a single metric that evaluates both generation quality and segmentation quality? Why not assess generation quality and segmentation accuracy separately? This would seem more convincing for both tasks.
2) The paper proposes a "semantic part-aware latent point diffusion" but fails to clarify how this is achieved, only vaguely mentioning that segmentation labels are incorporated into the encoder.
3) In line 414 of the paper, it is claimed th
Taxonomy
Topics: 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques · Image Processing and 3D Reconstruction
