DiffTF++: 3D-aware Diffusion Transformer for Large-Vocabulary 3D Generation

Ziang Cao, Fangzhou Hong, Tong Wu, Liang Pan, Ziwei Liu

arXiv:2405.08055 · cs.CV · May 15, 2024

TL;DR

DiffTF++ is a novel 3D-aware diffusion transformer framework that efficiently generates diverse, high-quality 3D assets across many categories by integrating multi-view reconstruction, triplane refinement, and 3D-aware transformers.
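The diffusion side of such a framework can be illustrated with the standard DDPM forward (noising) process, applied here to a triplane tensor. This is a minimal sketch assuming a linear β schedule with 1,000 steps and a (3, C, H, W) triplane layout. The schedule, the step count, the tensor sizes, and the helper names `linear_alpha_bar` and `q_sample` are illustrative assumptions, not details taken from DiffTF++.

```python
import numpy as np


def linear_alpha_bar(num_steps: int = 1000, beta_start: float = 1e-4,
                     beta_end: float = 0.02) -> np.ndarray:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(x0: np.ndarray, t: int, alpha_bar: np.ndarray,
             rng: np.random.Generator) -> tuple[np.ndarray, np.ndarray]:
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I).

    Returns the noisy tensor and the Gaussian noise that was added, which a
    denoiser is commonly trained to predict.
    """
    noise = rng.standard_normal(x0.shape)
    a = alpha_bar[t]
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise
    return x_t, noise


# Hypothetical triplane: 3 axis-aligned planes, 8 channels, 32x32 resolution.
rng = np.random.default_rng(0)
triplane = rng.standard_normal((3, 8, 32, 32))
alpha_bar = linear_alpha_bar()
x_t, eps = q_sample(triplane, t=500, alpha_bar=alpha_bar, rng=rng)
```

During training, a denoiser receives `x_t` and `t` and learns to recover the noise (or the clean triplane). Sampling runs the process in reverse from pure noise, and in a triplane diffusion model the resulting planes are then decoded into a 3D asset.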

Contribution

DiffTF++ is a unified feed-forward model that combines improved triplanes, 3D-aware transformers, and a multi-view loss for large-vocabulary 3D generation.
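The improved triplane builds on the generic triplane idea: a 3D point is projected onto three axis-aligned feature planes, a feature is bilinearly interpolated from each plane, and the three features are aggregated. The sketch shows only that generic lookup and does not model the paper's improvements. The function names, the plane order (xy, xz, yz), the normalized [-1, 1] coordinate range, and summation as the aggregation rule are assumptions for illustration.

```python
import numpy as np


def bilinear_sample(plane: np.ndarray, u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Bilinearly sample a (C, H, W) feature plane at normalized coords in [-1, 1].

    u indexes the width axis and v the height axis; -1 and 1 map to the first
    and last pixel centres. Returns an (N, C) array.
    """
    _, H, W = plane.shape
    x = (np.clip(u, -1.0, 1.0) + 1.0) * 0.5 * (W - 1)
    y = (np.clip(v, -1.0, 1.0) + 1.0) * 0.5 * (H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)  # clamp the right/bottom neighbour at the border
    y1 = np.minimum(y0 + 1, H - 1)
    wx = (x - x0)[:, None]  # (N, 1) interpolation weights
    wy = (y - y0)[:, None]
    # plane[:, rows, cols] has shape (C, N); transpose to (N, C).
    f00 = plane[:, y0, x0].T
    f01 = plane[:, y0, x1].T
    f10 = plane[:, y1, x0].T
    f11 = plane[:, y1, x1].T
    return ((1 - wx) * (1 - wy) * f00 + wx * (1 - wy) * f01
            + (1 - wx) * wy * f10 + wx * wy * f11)


def query_triplane(planes: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Features for (N, 3) points in [-1, 1]^3 from planes of shape (3, C, H, W).

    Assumed plane order: xy, xz, yz. Per-plane features are summed, as in
    EG3D-style triplanes.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return (bilinear_sample(planes[0], x, y)
            + bilinear_sample(planes[1], x, z)
            + bilinear_sample(planes[2], y, z))


rng = np.random.default_rng(0)
planes = rng.standard_normal((3, 16, 64, 64))
points = rng.uniform(-1.0, 1.0, size=(1024, 3))
feats = query_triplane(planes, points)  # (1024, 16), typically fed to a small MLP decoder
```

Summing keeps the per-point feature width at C no matter how many planes there are, which keeps the downstream decoder small. Concatenation is a common alternative that preserves which plane each feature came from, at the cost of a 3C-wide input.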

Findings

1. Achieves state-of-the-art 3D generation quality and diversity.
2. Effectively handles large-scale, multi-category 3D asset synthesis.
3. Outperforms existing methods on the ShapeNet and OmniObject3D datasets.

Abstract

Automatically generating diverse, high-quality 3D assets is a fundamental yet challenging task in 3D computer vision. Despite extensive efforts in 3D generation, existing optimization-based approaches struggle to produce large-scale 3D assets efficiently. Meanwhile, feed-forward methods often focus on generating only one or a few categories, which limits their generalizability. We therefore introduce a diffusion-based feed-forward framework that addresses these challenges with a single model. To efficiently handle the large diversity and complexity in geometry and texture across categories, we 1) adopt an improved triplane representation to ensure efficiency; 2) introduce a 3D-aware transformer that aggregates generalized 3D knowledge with specialized 3D features; and 3) devise a 3D-aware encoder/decoder to enhance the generalized 3D knowledge. Building upon our 3D-aware Diffusion…
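The abstract does not spell out how the 3D-aware transformer mixes information across planes. As one hypothetical illustration, the sketch flattens all three planes into a single token sequence and applies one head of scaled dot-product self-attention with a residual connection, so a cell on the xy-plane can attend to cells on the xz- and yz-planes. The function name `cross_plane_attention`, the single head, and the random projection weights are assumptions; this is not the paper's module.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_plane_attention(planes: np.ndarray, w_q: np.ndarray, w_k: np.ndarray,
                          w_v: np.ndarray) -> np.ndarray:
    """Single-head self-attention over the cells of all three planes jointly.

    planes: (3, C, H, W); w_q, w_k, w_v: (C, C). Returns an array of the same shape.
    """
    n, c, h, w = planes.shape
    # (3, C, H, W) -> (3, H*W, C) -> (3*H*W, C): one token per plane cell.
    tokens = planes.reshape(n, c, h * w).transpose(0, 2, 1).reshape(-1, c)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    # (3HW, 3HW) weights: every token attends to tokens from all three planes.
    attn = softmax(q @ k.T / np.sqrt(c), axis=-1)
    out = tokens + attn @ v  # residual connection
    # Invert the flattening back to (3, C, H, W).
    return out.reshape(n, h * w, c).transpose(0, 2, 1).reshape(n, c, h, w)


rng = np.random.default_rng(0)
C = 8
planes = rng.standard_normal((3, C, 16, 16))
w_q, w_k, w_v = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
mixed = cross_plane_attention(planes, w_q, w_k, w_v)
```

Full attention over 3HW tokens needs a (3HW)² weight matrix, so this naive form is only practical for small planes.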

Code & Models

Repositories

ziangcao0312/DiffTF (official, PyTorch)
