Diffusion Models For Multi-Modal Generative Modeling

Changyou Chen; Han Ding; Bunyamin Sisman; Yi Xu; Ouye Xie; Benjamin Z. Yao; Son Dinh Tran; Belinda Zeng

arXiv:2407.17571 · cs.CV · September 26, 2024

TL;DR

This paper introduces a unified multi-modal diffusion model that learns to generate several types of data jointly, such as images and class labels, within one shared diffusion space.

Contribution

The paper proposes a novel multi-modal diffusion framework with a shared backbone and modality-specific decoders, enabling multi-task learning and multi-modal data generation.

Findings

01 Effective in image transition and masked-image training

02 Supports joint image-label and image-representation modeling

03 Shows promising results on ImageNet

Abstract

Diffusion-based generative modeling has achieved state-of-the-art results on various generation tasks. Most diffusion models, however, are limited to modeling a single generation task. Can we generalize diffusion models to support multi-modal generative training, and thereby obtain more generalizable models? In this paper, we propose a principled way to define a diffusion model by constructing a unified multi-modal diffusion model in a common diffusion space. We define the forward diffusion process to be driven by an aggregation of information from multiple types of task data, e.g., images for a generation task and labels for a classification task. In the reverse process, we enforce information sharing by parameterizing a shared backbone denoising network with additional modality-specific decoder heads. Such a structure can simultaneously learn to generate different types of multi-modal data…
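The structure described in the abstract can be illustrated with a toy sketch: encode each modality into a common space, aggregate the encodings, add noise, then run one shared backbone followed by a separate decoder head per modality. Everything specific here is an assumption made for illustration and is not taken from the paper: the weighted-sum aggregation rule, the DDPM-style noising step, the direct reconstruction losses, and all names (`aggregate`, `forward_noise`, `ToyMultiModalDenoiser`). Weights are random and untrained, so the sketch only shows how data flows.

```python
import math
import random


def init(rows, cols, rng):
    """Random dense weight matrix stored as a list of rows."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(weights, vec):
    """Bias-free dense layer: one dot product per output row."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def aggregate(encodings, weights):
    """Assumed aggregation rule: weighted sum of per-modality encodings."""
    dim = len(encodings[0])
    return [sum(w * e[i] for w, e in zip(weights, encodings)) for i in range(dim)]


def forward_noise(z0, alpha_bar, rng):
    """DDPM-style noising (an assumption): sqrt(a) * z0 + sqrt(1 - a) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    scale, sigma = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [scale * z + sigma * e for z, e in zip(z0, eps)], eps


class ToyMultiModalDenoiser:
    """Shared backbone followed by one decoder head per modality."""

    def __init__(self, latent_dim, hidden_dim, image_dim, num_classes, seed=0):
        rng = random.Random(seed)
        self.backbone = init(hidden_dim, latent_dim + 1, rng)  # +1 input for the timestep
        self.image_head = init(image_dim, hidden_dim, rng)
        self.label_head = init(num_classes, hidden_dim, rng)

    def __call__(self, zt, t):
        hidden = [math.tanh(a) for a in linear(self.backbone, zt + [t])]
        return linear(self.image_head, hidden), linear(self.label_head, hidden)


def mse(pred, target):
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


def cross_entropy(logits, label):
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_norm - logits[label]


rng = random.Random(0)
image_dim, num_classes, latent_dim = 8, 4, 6
image = [rng.random() for _ in range(image_dim)]  # stand-in for a flattened image
label = 2
one_hot = [1.0 if k == label else 0.0 for k in range(num_classes)]

# Hypothetical per-modality encoders that map into the common diffusion space.
image_encoder = init(latent_dim, image_dim, rng)
label_encoder = init(latent_dim, num_classes, rng)
z0 = aggregate([linear(image_encoder, image), linear(label_encoder, one_hot)], [0.5, 0.5])

zt, _ = forward_noise(z0, alpha_bar=0.7, rng=rng)
model = ToyMultiModalDenoiser(latent_dim, 16, image_dim, num_classes)
image_out, logits = model(zt, t=0.3)

# Multi-task objective: one loss term per decoder head, summed.
loss = mse(image_out, image) + cross_entropy(logits, label)
print(len(image_out), len(logits), math.isfinite(loss))
```

Because both heads read the same hidden vector, gradients from either loss term would update the shared backbone during training, which is one plausible reading of the "information sharing" the abstract mentions.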

Taxonomy

Topics: Simulation Techniques and Applications · Reinforcement Learning in Robotics

Methods: Diffusion