Diffuse Everything: Multimodal Diffusion Models on Arbitrary State Spaces
Kevin Rojas, Yuchen Zhu, Sichen Zhu, Felix X.-F. Ye, Molei Tao

TL;DR
This paper introduces a novel framework for multimodal diffusion models that operate on arbitrary state spaces, allowing native generation of diverse data types without relying on external preprocessing or encoders.
Contribution
The authors propose a new approach for building multimodal diffusion models on arbitrary state spaces with decoupled noise schedules, enabling native, flexible generation across modalities.
Findings
Effective text-image generation demonstrated
Mixed-type tabular data synthesis achieved
Competitive performance with existing methods
Abstract
Diffusion models have demonstrated remarkable performance in generating unimodal data across various tasks, including image, video, and text generation. On the contrary, the joint generation of multimodal data through diffusion models is still in the early stages of exploration. Existing approaches heavily rely on external preprocessing protocols, such as tokenizers and variational autoencoders, to harmonize varied data representations into a unified, unimodal format. This process heavily demands the high accuracy of encoders and decoders, which can be problematic for applications with limited data. To lift this restriction, we propose a novel framework for building multimodal diffusion models on arbitrary state spaces, enabling native generation of coupled data across different modalities. By introducing an innovative decoupled noise schedule for each modality, we enable both…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenerative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Computer Graphics and Visualization Techniques
MethodsDiffusion
