Collaborative Multi-Modal Coding for High-Quality 3D Generation

Ziang Cao; Zhaoxi Chen; Liang Pan; Ziwei Liu

arXiv:2508.15228 · cs.CV · February 24, 2026

Open Access

TL;DR

TriMM is a 3D generative model that integrates multiple data modalities, such as RGB images and point clouds, to produce high-quality 3D assets from limited training data.

Contribution

The paper introduces TriMM, the first feed-forward 3D-native model that collaboratively encodes multi-modal data and employs a triplane diffusion approach for superior 3D generation.

Findings

01. Achieves competitive 3D generation quality with limited data.

02. Successfully incorporates diverse multi-modal datasets.

03. Demonstrates robustness across multiple datasets.

Abstract

3D content inherently encompasses multi-modal characteristics and can be projected into different modalities (e.g., RGB images, RGBD, and point clouds). Each modality exhibits distinct advantages in 3D asset modeling: RGB images contain vivid 3D textures, whereas point clouds define fine-grained 3D geometries. However, most existing 3D-native generative architectures either operate predominantly within single-modality paradigms, thus overlooking the complementary benefits of multi-modality data, or restrict themselves to 3D structures, thereby limiting the scope of available training datasets. To holistically harness multi-modalities for 3D modeling, we present TriMM, the first feed-forward 3D-native generative model that learns from basic multi-modalities (e.g., RGB, RGBD, and point cloud). Specifically, 1) TriMM first introduces collaborative multi-modal coding, which integrates…

Taxonomy

Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques