VideoMatGen: PBR Materials through Joint Generative Modeling
Jon Hasselgren, Zheng Zeng, Milos Hasan, Jacob Munkberg

TL;DR
VideoMatGen uses a video diffusion transformer to generate physically-based materials for 3D shapes, conditioned on input geometry and a text description. It models multiple material properties jointly so that the generated maps form a physically plausible material.
Contribution
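It introduces a joint generative model for multiple material properties (base color, roughness, metallicity, height map) and a custom variational auto-encoder that encodes these modalities into a compact latent space, so joint generation does not increase the number of tokens.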
Findings
Generates high-quality, physically plausible materials for 3D shapes.
Joint modeling of multiple material properties improves realism.
Generated materials are compatible with common content creation tools.
Abstract
We present a method for generating physically-based materials for 3D shapes based on a video diffusion transformer architecture. Our method is conditioned on input geometry and a text description, and jointly models multiple material properties (base color, roughness, metallicity, height map) to form physically plausible materials. We further introduce a custom variational auto-encoder that encodes multiple material modalities into a compact latent space, enabling joint generation of multiple modalities without increasing the number of tokens. Given a text prompt, our pipeline generates high-quality materials for 3D shapes that are compatible with common content creation tools.
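The token-count claim in the abstract can be illustrated with simple arithmetic. The sketch below is not the paper's implementation: the strides, patch size, clip dimensions, and per-map channel counts are all hypothetical values chosen for illustration. It only shows why stacking material maps along the channel axis before encoding leaves the latent grid, and hence the transformer's sequence length, unchanged, whereas encoding each map separately would multiply it.

```python
from dataclasses import dataclass

# The four material properties named in the abstract. The channel counts are
# conventional PBR assumptions (RGB base color, scalar maps), not paper values.
MATERIAL_CHANNELS = {"base_color": 3, "roughness": 1, "metallicity": 1, "height": 1}


@dataclass(frozen=True)
class LatentGrid:
    """Hypothetical compression settings; none of these come from the paper."""

    temporal_stride: int = 4  # assumed VAE temporal downsampling
    spatial_stride: int = 8   # assumed VAE spatial downsampling
    patch_size: int = 2       # assumed transformer patch size on the latent

    def tokens(self, frames: int, height: int, width: int) -> int:
        """Number of transformer tokens for one encoded clip."""
        t = frames // self.temporal_stride
        h = height // (self.spatial_stride * self.patch_size)
        w = width // (self.spatial_stride * self.patch_size)
        return t * h * w


def joint_token_count(grid: LatentGrid, frames: int, height: int, width: int) -> int:
    # All modalities are stacked along the channel axis before encoding, so the
    # channel count affects only the encoder's input width, not the latent grid.
    return grid.tokens(frames, height, width)


def separate_token_count(
    grid: LatentGrid, frames: int, height: int, width: int, num_modalities: int
) -> int:
    # Encoding each modality on its own yields one latent grid per modality.
    return num_modalities * grid.tokens(frames, height, width)


grid = LatentGrid()
frames, height, width = 16, 256, 256  # hypothetical clip size

stacked_channels = sum(MATERIAL_CHANNELS.values())
joint = joint_token_count(grid, frames, height, width)
separate = separate_token_count(grid, frames, height, width, len(MATERIAL_CHANNELS))

print(f"stacked input channels: {stacked_channels}")
print(f"tokens, joint encoding: {joint}")
print(f"tokens, one latent per modality: {separate}")
```

Under these assumed settings, the per-modality variant needs four times as many tokens as the joint one. Since self-attention cost grows with sequence length, this is the kind of overhead a shared latent space avoids.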
Taxonomy
Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques
