Atlas Gaussians Diffusion for 3D Generation

Haitao Yang; Yuan Dong; Hanwen Jiang; Dejia Xu; Georgios Pavlakos; Qixing Huang

arXiv:2408.13055·cs.CV·April 10, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces Atlas Gaussians, a novel 3D shape representation that enables high-quality, efficient, and detailed 3D generation by combining local patch-based modeling with latent diffusion techniques.

Contribution

The paper proposes Atlas Gaussians, a new patch-based 3D representation that improves fidelity and efficiency in 3D generation using diffusion models.

Findings

01. Outperforms prior feed-forward 3D generation methods.

02. Enables high-detail 3D shape synthesis.

03. Efficient patch-level decoding with local awareness.

Abstract

Using the latent diffusion model has proven effective in developing novel 3D generation techniques. To harness the latent diffusion model, a key challenge is designing a high-fidelity and efficient representation that links the latent space and the 3D space. In this paper, we introduce Atlas Gaussians, a novel representation for feed-forward native 3D generation. Atlas Gaussians represent a shape as the union of local patches, and each patch can decode 3D Gaussians. We parameterize a patch as a sequence of feature vectors and design a learnable function to decode 3D Gaussians from the feature vectors. In this process, we incorporate UV-based sampling, enabling the generation of a sufficiently large, and theoretically infinite, number of 3D Gaussian points. The large amount of 3D Gaussians enables the generation of high-quality details. Moreover, due to local awareness of the…
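The UV-based sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' decoder: it assumes each patch is four hypothetical 4-D corner feature vectors, swaps the learnable decoding function for plain bilinear interpolation, and reads the first three blended features as a Gaussian centre and the fourth as an isotropic scale. The names `bilerp`, `decode_gaussian`, `sample_patch`, and `sample_shape` are invented for illustration. The point it shows is that the number of decoded Gaussians is set by how many UV samples are drawn, not by the size of the patch parameterization.

```python
import random


def bilerp(corners, u, v):
    """Bilinearly blend four corner feature vectors at UV coordinate (u, v)."""
    f00, f10, f01, f11 = corners
    return [
        (1 - u) * (1 - v) * a + u * (1 - v) * b + (1 - u) * v * c + u * v * d
        for a, b, c, d in zip(f00, f10, f01, f11)
    ]


def decode_gaussian(corners, u, v):
    """Toy stand-in for a learned decoder (assumption, not the paper's network).

    The first three blended features become the Gaussian centre and the
    fourth becomes a strictly positive isotropic scale.
    """
    f = bilerp(corners, u, v)
    return {"mean": tuple(f[:3]), "scale": abs(f[3]) + 1e-3}


def sample_patch(corners, n, rng):
    """Draw n UV samples from the unit square and decode one Gaussian each."""
    return [decode_gaussian(corners, rng.random(), rng.random()) for _ in range(n)]


def sample_shape(patches, n_per_patch, seed=0):
    """Represent a shape as the union of the Gaussians decoded from every patch."""
    rng = random.Random(seed)
    gaussians = []
    for corners in patches:
        gaussians.extend(sample_patch(corners, n_per_patch, rng))
    return gaussians


# Two hypothetical patches: the bottom (z = 0) and top (z = 1) faces of a unit cube.
patches = [
    [[0, 0, 0, 0.1], [1, 0, 0, 0.1], [0, 1, 0, 0.1], [1, 1, 0, 0.1]],
    [[0, 0, 1, 0.2], [1, 0, 1, 0.2], [0, 1, 1, 0.2], [1, 1, 1, 0.2]],
]

coarse = sample_shape(patches, n_per_patch=8)    # 16 Gaussians
dense = sample_shape(patches, n_per_patch=512)   # 1024 Gaussians, same patch features
```

Both calls reuse the same eight corner vectors; only the UV sample count changes, which is the sense in which the number of Gaussians is decoupled from the size of the representation.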

Peer Reviews

Decision·ICLR 2025 Spotlight

Reviewer 01 · Rating 8 · Confidence 3

Strengths

- The paper is well-written and organized; hence it is easy for readers to follow.
- Addressing the current challenges of VAEs in extracting efficient latent representations for generative diffusion models is important.
- The introduction of the Atlas Gaussians representation with disentangled learning of geometry and appearance features of 3D shapes is novel.

Weaknesses

- The proposed approach includes both the Atlas Gaussian representation with disentangled geometry and appearance learning mechanism, as well as a transformer-based encoder. It is unclear which component contributes more significantly to the overall performance.

Reviewer 02 · Rating 8 · Confidence 4

Strengths

1. I appreciate how the introduction section clearly explains and flows through the limitations of recent works, outlining how the authors address these challenges with their proposed method.
2. I appreciate how each section of Section 2: Related Work clearly connects to the proposed method.
3. Sufficient experimental results support the contributions claimed.
4. I value the release of code which enhances transparency and reproducibility.

Weaknesses

1. I suspect the term “atlas” is intended in the same sense as in AtlasNet (L125~128). If so, adding an explanation would be helpful, as it is part of the method’s name but is not clarified in the text.
2. The method section is presented in a list-like format, making it difficult to read smoothly.
3. The clarity of the writing is lacking. Although the text contains many details, the explanations are somewhat vague, requiring readers to infer and interpret a great deal.
4. Some notations lack

Reviewer 03 · Rating 8 · Confidence 3

Strengths

The paper nicely integrates 3DGS with a diffusion generation scheme and achieves state-of-the-art results in 3D shape generation. Using Atlas Gaussians to represent 3D shapes as a union of local patches makes the method capable of capturing better details and gives it higher scalability. Decomposing 3D shape generation into local Gaussian patches and using UV-based sampling lets it handle shapes of varying complexity. The evaluation is thorough with reasonable qualitative and quantitative results and achieves S

Weaknesses

While Section 4.4 discusses memory consumption and the number of patches, the paper lacks a thorough comparison of inference times (especially as the number of 3D Gaussians increases) to further justify the value of the framework. How could the current approach close the gap for even finer shape details, given that the Gaussian representation struggles with fine details and often produces blurry results? Is it only a matter of the number of Gaussians or patches? The authors mentioned "We take inspiration from the literature tha

Taxonomy

Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques

Methods: Diffusion · Latent Diffusion Model