VideoMatGen: PBR Materials through Joint Generative Modeling

Jon Hasselgren; Zheng Zeng; Milos Hasan; Jacob Munkberg

arXiv:2603.16566 · cs.CV · March 18, 2026

TL;DR

VideoMatGen uses a video diffusion transformer to generate physically based materials for 3D shapes, conditioned on input geometry and a text prompt. It models multiple material properties (base color, roughness, metallicity, height) jointly to produce physically plausible results.

Contribution

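It introduces a joint generative model over multiple material properties, together with a custom variational auto-encoder that compresses these modalities into a single compact latent space so they can be generated together without increasing the number of tokens.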

Findings

1. Generates high-quality, physically plausible materials for 3D shapes.
2. Joint modeling of multiple material properties improves realism.
3. Compatible with standard content creation tools.
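As one illustration of what compatibility with standard content creation tools can involve, the sketch below packs single-channel roughness and metallic maps into the combined texture layout defined by the glTF 2.0 specification for `metallicRoughnessTexture` (roughness in green, metalness in blue). This is an assumed export step, not the authors' pipeline, and the function name `pack_gltf_metallic_roughness` is hypothetical.

```python
import numpy as np


def pack_gltf_metallic_roughness(roughness, metallic):
    """Hypothetical export helper: pack two HxW maps with values in [0, 1]
    into an 8-bit RGB texture using the glTF 2.0 metallicRoughnessTexture
    layout (G = roughness, B = metalness)."""
    roughness = np.asarray(roughness, dtype=np.float32)
    metallic = np.asarray(metallic, dtype=np.float32)
    if roughness.ndim != 2 or roughness.shape != metallic.shape:
        raise ValueError("expected two HxW maps of equal shape")
    # Red is ignored by metallicRoughnessTexture; 1.0 means "no occlusion"
    # if the same texture is also reused as an ORM occlusion map.
    packed = np.stack([np.ones_like(roughness), roughness, metallic], axis=-1)
    # Clamp to [0, 1] and round to the nearest 8-bit value.
    return (np.clip(packed, 0.0, 1.0) * 255.0 + 0.5).astype(np.uint8)


if __name__ == "__main__":
    tex = pack_gltf_metallic_roughness(np.full((2, 2), 0.5), np.zeros((2, 2)))
    print(tex.shape, tex[0, 0])
```

Base color would be written to a separate `baseColorTexture`, which glTF interprets as sRGB-encoded.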

Abstract

We present a method for generating physically-based materials for 3D shapes based on a video diffusion transformer architecture. Our method is conditioned on input geometry and a text description, and jointly models multiple material properties (base color, roughness, metallicity, height map) to form physically plausible materials. We further introduce a custom variational auto-encoder that encodes multiple material modalities into a compact latent space, enabling joint generation of multiple modalities without increasing the number of tokens. Given a text prompt, our pipeline generates high-quality materials for 3D shapes that are compatible with common content creation tools.
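The claim that joint generation does not increase the number of tokens can be made concrete with a little arithmetic. The sketch below assumes a VAE with 8x spatial downsampling, a transformer patch size of 2, a 512x512 material resolution, and three channels for base color plus one each for roughness, metallicity and height. None of these values come from the paper, and all names are hypothetical. It compares a naive scheme that encodes each modality into its own latent grid against a single latent grid holding all modalities.

```python
# Hypothetical channel counts for the four properties named in the abstract.
MATERIAL_CHANNELS = {"base_color": 3, "roughness": 1, "metallicity": 1, "height": 1}


def latent_tokens(height: int, width: int, downsample: int = 8, patch: int = 2) -> int:
    """Tokens for one latent grid: the VAE shrinks each spatial axis by
    `downsample` (assumed), and the transformer groups `patch` x `patch`
    latent cells into one token (assumed)."""
    step = downsample * patch
    if height % step or width % step:
        raise ValueError("resolution must be divisible by downsample * patch")
    return (height // step) * (width // step)


def separate_tokens(height: int, width: int,
                    num_modalities: int = len(MATERIAL_CHANNELS)) -> int:
    # One latent grid per modality, tokens concatenated: grows with modality count.
    return num_modalities * latent_tokens(height, width)


def joint_tokens(height: int, width: int) -> int:
    # All modalities stacked along the channel axis and encoded into one grid.
    return latent_tokens(height, width)


if __name__ == "__main__":
    print("input channels:", sum(MATERIAL_CHANNELS.values()))  # 6
    print("separate:", separate_tokens(512, 512))  # 4096
    print("joint:", joint_tokens(512, 512))  # 1024
```

Under these assumptions, per-modality encoding would quadruple the sequence length. Because self-attention cost grows quadratically with sequence length, keeping the token count fixed is one plausible reason a shared multi-modality latent is attractive.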

Taxonomy

Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques