Structured 3D Latents for Scalable and Versatile 3D Generation
Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang, Dong Chen, Xin Tong, Jiaolong Yang

TL;DR
This paper presents a unified 3D generation framework using Structured LATent (SLAT) representations that enable high-quality, versatile 3D asset creation across multiple formats with flexible editing capabilities.
Contribution
Introduces SLAT, a unified 3D representation that decodes to multiple output formats, and pairs it with a large-scale transformer-based model for high-quality conditional 3D generation.
Findings
Generates higher-quality results than existing 3D generation methods, including recent ones at similar scales, under both text and image conditions.
Supports multiple output formats including radiance fields, 3D Gaussians, and meshes.
Enables local 3D editing not available in prior models.
Abstract
We introduce a novel 3D generation method for versatile and high-quality 3D asset creation. The cornerstone is a unified Structured LATent (SLAT) representation which allows decoding to different output formats, such as Radiance Fields, 3D Gaussians, and meshes. This is achieved by integrating a sparsely-populated 3D grid with dense multiview visual features extracted from a powerful vision foundation model, comprehensively capturing both structural (geometry) and textural (appearance) information while maintaining flexibility during decoding. We employ rectified flow transformers tailored for SLAT as our 3D generation models and train models with up to 2 billion parameters on a large 3D asset dataset of 500K diverse objects. Our model generates high-quality results with text or image conditions, significantly surpassing existing methods, including recent ones at similar scales. We…
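As a rough illustration of the two ideas named in the abstract, the sketch below builds a sparse set of occupied voxels, each carrying an averaged feature vector, and then forms a rectified-flow training pair for one of those vectors. It is a minimal sketch and not the authors' code: the function names `build_structured_latent` and `rectified_flow_pair` are hypothetical, the feature vectors are hand-written stand-ins for the features of a vision foundation model, and the linear interpolation shown is the commonly used rectified-flow formulation, assumed here rather than taken from the paper.

```python
def build_structured_latent(points, features, resolution):
    """Group points into a sparse voxel grid and average their features.

    points:   list of (x, y, z) tuples with coordinates in [0, 1)
    features: list of equal-length feature vectors, one per point
              (placeholders for foundation-model features; an assumption)
    Returns a dict {voxel_index: mean_feature}. Empty voxels are never
    stored, which is what makes the grid sparsely populated.
    """
    sums, counts = {}, {}
    for point, feat in zip(points, features):
        # Map each coordinate to an integer voxel index, clamped to the grid.
        idx = tuple(min(int(c * resolution), resolution - 1) for c in point)
        if idx not in sums:
            sums[idx] = [0.0] * len(feat)
            counts[idx] = 0
        for k, value in enumerate(feat):
            sums[idx][k] += value
        counts[idx] += 1
    return {idx: [s / counts[idx] for s in vec] for idx, vec in sums.items()}


def rectified_flow_pair(x0, noise, t):
    """Return (x_t, velocity_target) for a straight path from data to noise.

    x_t = (1 - t) * x0 + t * noise, and the regression target is the
    constant velocity along that path, noise - x0.
    """
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, noise)]
    velocity = [b - a for a, b in zip(x0, noise)]
    return x_t, velocity


# Two nearby points share a voxel; the third lands in a different one.
points = [(0.10, 0.10, 0.10), (0.12, 0.11, 0.10), (0.90, 0.90, 0.90)]
features = [[1.0, 0.0], [3.0, 2.0], [5.0, 5.0]]
slat = build_structured_latent(points, features, resolution=4)

# Halfway along the path between one voxel's latent and an all-zero "noise".
x_t, velocity = rectified_flow_pair(slat[(0, 0, 0)], [0.0, 0.0], t=0.5)
```

Only 2 of the 64 voxels in this toy grid are stored, which mirrors the sparsity the abstract describes. A real pipeline would additionally compress the per-voxel features with a learned encoder and sample actual Gaussian noise; both are omitted here for brevity.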
Code & Models
- 🤗 microsoft/TRELLIS-image-large (model) · 3.1M dl · ♡ 631
- 🤗 microsoft/TRELLIS-text-base (model) · 809 dl · ♡ 14
- 🤗 microsoft/TRELLIS-text-xlarge (model) · 60k dl · ♡ 59
- 🤗 kushbhargav/3d_image (model)
- 🤗 cavargas10/TRELLIS (model) · 72 dl · ♡ 4
- 🤗 rhinosaur0/rapid3dgs (model) · ♡ 1
- 🤗 microsoft/TRELLIS-text-large (model) · 7.0k dl · ♡ 13
- 🤗 cavargas10/TRELLIS-text-xlarge (model) · 65 dl
- 🤗 gqk/TRELLIS-image-large-fork (model) · 136k dl · ♡ 2
- 🤗 gqk/TRELLIS-text-base-fork (model) · 12 dl
Taxonomy
Topics: Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis · Image Processing and 3D Reconstruction
