Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models

Nithin Gopalakrishnan Nair, Wele Gedara Chaminda Bandara, Vishal M. Patel

arXiv:2212.00793·cs.CV·April 21, 2023

TL;DR

This paper introduces a flexible, plug-and-play diffusion model framework that synthesizes images satisfying multiple constraints without retraining, leveraging closed-form solutions and a novel reliability parameter for multi-modal content creation.

Contribution

It proposes a novel sampling strategy for combining multiple diffusion models trained on different tasks, enabling multi-modal synthesis without retraining or paired data.

Findings

1. Effective multi-constraint image generation demonstrated
2. Outperforms existing methods on standard multimodal tasks
3. Flexible use of off-the-shelf diffusion models during sampling

Abstract

Generating photos satisfying multiple constraints finds broad utility in the content creation industry. A key hurdle to accomplishing this task is the need for paired data consisting of all modalities (i.e., constraints) and their corresponding output. Moreover, existing methods need retraining using paired data across all modalities to introduce a new condition. This paper proposes a solution to this problem based on denoising diffusion probabilistic models (DDPMs). Our motivation for choosing diffusion models over other generative models comes from the flexible internal structure of diffusion models. Since each sampling step in the DDPM follows a Gaussian distribution, we show that there exists a closed-form solution for generating an image given various constraints. Our method can unite multiple diffusion models trained on multiple sub-tasks and conquer the combined task through our…
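The abstract's remark that Gaussian sampling steps admit a closed-form combination can be illustrated with the standard identity for a weighted product of Gaussians. The sketch below is not the authors' sampling rule; it only shows the general fact that multiplying Gaussian densities (each raised to a weight) yields another Gaussian whose mean is a precision-weighted average. The function name `fuse_gaussians` and the use of per-model `weights` as a stand-in for a reliability parameter are assumptions made for illustration.

```python
def fuse_gaussians(means, variances, weights):
    """Mean and variance of the normalized product prod_i N(x; m_i, v_i)**w_i.

    Hypothetical helper: `weights` loosely plays the role of a per-model
    reliability factor; this is an assumption, not the paper's formula.
    """
    if not (len(means) == len(variances) == len(weights)):
        raise ValueError("means, variances and weights must have equal length")
    # Each factor contributes precision w_i / v_i to the product.
    precision = sum(w / v for w, v in zip(weights, variances))
    if precision <= 0:
        raise ValueError("total precision must be positive")
    # The fused mean is the precision-weighted average of the input means.
    mean = sum(w * m / v for w, m, v in zip(weights, means, variances)) / precision
    return mean, 1.0 / precision


# Two hypothetical per-step predictions from models trained on different conditions.
fused_mean, fused_var = fuse_gaussians([0.0, 2.0], [1.0, 1.0], [0.5, 0.5])
```

With equal variances and weights that sum to one, the fused variance is unchanged and the fused mean is a plain weighted average of the individual means, which is why such a combination can be applied at sampling time without retraining any model.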

Code & Models

Repositories

Nithin-GK/UniteandConquer (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis

Methods: Diffusion