Taming Feed-forward Reconstruction Models as Latent Encoders for 3D Generative Models
Suttisak Wizadwongsa, Jinfan Zhou, Edward Li, Jeong Joon Park

TL;DR
This paper introduces a method to use pre-trained 3D reconstruction models as latent encoders for 3D generative models, combining efficiency with high-quality, text-conditioned 3D content creation.
Contribution
It proposes a novel framework that reuses existing reconstruction models as latent encoders, with post-processing and a transformer-based architecture for scalable, high-quality 3D generation.
Findings
Enables efficient 3D generative modeling without training new encoders.
Achieves state-of-the-art text-to-3D generation quality.
Demonstrates high scalability and computational efficiency.
Abstract
Recent AI-based 3D content creation has largely evolved along two paths: feed-forward image-to-3D reconstruction approaches and 3D generative models trained with 2D or 3D supervision. In this work, we show that existing feed-forward reconstruction methods can serve as effective latent encoders for training 3D generative models, thereby bridging these two paradigms. By reusing powerful pre-trained reconstruction models, we avoid computationally expensive encoder network training and obtain rich 3D latent features for generative modeling for free. However, the latent spaces of reconstruction models are not well-suited for generative modeling due to their unstructured nature. To enable flow-based model training on these latent features, we develop post-processing pipelines, including protocols to standardize the features and spatial weighting to concentrate on important regions. We further…
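The abstract names two post-processing ideas, feature standardization and spatial weighting, without giving details in the text available here. As a rough illustration only, the sketch below assumes per-channel z-score standardization of latent tokens and a per-token weighted mean-squared error against a linear-interpolant flow-matching target (data minus noise). The function names, the token layout, and the weighting scheme are all hypothetical and are not taken from the paper.

```python
import math

# Hypothetical layout: a latent is a list of tokens, each token a list of C floats.

def channel_stats(latents):
    """Per-channel mean and std over every token of every latent in a dataset."""
    channels = len(latents[0][0])
    count = sum(len(sample) for sample in latents)
    mean = [sum(tok[c] for s in latents for tok in s) / count
            for c in range(channels)]
    var = [sum((tok[c] - mean[c]) ** 2 for s in latents for tok in s) / count
           for c in range(channels)]
    # Small epsilon keeps the division safe for constant channels.
    std = [math.sqrt(v) + 1e-6 for v in var]
    return mean, std

def standardize(sample, mean, std):
    """Z-score each channel so the flow model sees roughly unit-scale features."""
    return [[(tok[c] - mean[c]) / std[c] for c in range(len(tok))]
            for tok in sample]

def weighted_flow_loss(pred_velocity, noise, data, weights):
    """Spatially weighted MSE against the assumed target velocity (data - noise).

    weights holds one non-negative value per token; larger values make the
    loss concentrate on that token's region.
    """
    total = 0.0
    for pred_tok, n_tok, d_tok, w in zip(pred_velocity, noise, data, weights):
        err = sum((p - (d - n)) ** 2
                  for p, n, d in zip(pred_tok, n_tok, d_tok)) / len(pred_tok)
        total += w * err
    return total / sum(weights)
```

In this sketch the statistics would be computed once over latents produced by the frozen reconstruction model and reused at sampling time to map generated latents back to the original scale.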
Taxonomy
Topics: 3D Shape Modeling and Analysis · Image Processing and 3D Reconstruction · 3D Modeling in Geospatial Applications
