Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models
Nithin Gopalakrishnan Nair, Wele Gedara Chaminda Bandara, Vishal M. Patel

TL;DR
This paper introduces a flexible, plug-and-play diffusion model framework that synthesizes images satisfying multiple constraints without retraining, leveraging closed-form solutions and a novel reliability parameter for multi-modal content creation.
Contribution
It proposes a novel sampling strategy for combining multiple diffusion models trained on different tasks, enabling multi-modal synthesis without retraining or paired data.
Findings
Effective multi-constraint image generation demonstrated
Outperforms existing methods on standard multimodal tasks
Flexible use of off-the-shelf diffusion models during sampling
Abstract
Generating photos satisfying multiple constraints finds broad utility in the content creation industry. A key hurdle to accomplishing this task is the need for paired data consisting of all modalities (i.e., constraints) and their corresponding output. Moreover, existing methods need retraining using paired data across all modalities to introduce a new condition. This paper proposes a solution to this problem based on denoising diffusion probabilistic models (DDPMs). Our motivation for choosing diffusion models over other generative models comes from the flexible internal structure of diffusion models. Since each sampling step in the DDPM follows a Gaussian distribution, we show that there exists a closed-form solution for generating an image given various constraints. Our method can unite multiple diffusion models trained on multiple sub-tasks and conquer the combined task through our…
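The closed-form claim rests on a standard property of Gaussians: a weighted product of Gaussian densities that share a variance is again Gaussian, with a mean equal to the weighted average of the individual means. The sketch below illustrates only that property. It is not the paper's sampling rule, which the truncated abstract does not spell out. The function name `combine_gaussian_means`, the shared `sigma`, and the use of `weights` as a stand-in for the reliability parameter are all assumptions made for illustration.

```python
import numpy as np

def combine_gaussian_means(means, weights, sigma):
    """Fuse k Gaussian predictions N(mu_i, sigma^2 I), each raised to weight w_i.

    The normalized product of N(mu_i, sigma^2 I)^{w_i} is Gaussian with
    mean sum(w_i * mu_i) / sum(w_i) and std sigma / sqrt(sum(w_i)).
    Hypothetical helper: the weights play the role of per-model reliability.
    """
    means = np.asarray(means, dtype=float)      # shape (k, ...)
    weights = np.asarray(weights, dtype=float)  # shape (k,)
    total = weights.sum()
    # Reshape weights so they broadcast over the trailing dimensions of means.
    w = weights.reshape((-1,) + (1,) * (means.ndim - 1))
    fused_mean = (w * means).sum(axis=0) / total
    fused_sigma = sigma / np.sqrt(total)
    return fused_mean, fused_sigma

# Two hypothetical per-model step means for the same noisy sample.
mu_a = np.array([0.0, 2.0])
mu_b = np.array([4.0, 2.0])
fused_mean, fused_sigma = combine_gaussian_means(
    [mu_a, mu_b], weights=[1.0, 3.0], sigma=1.0
)
print(fused_mean, fused_sigma)
```

With weights 1 and 3, the fused mean sits three quarters of the way from `mu_a` to `mu_b`, which is the sense in which a larger weight lets one model's constraint dominate the step.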
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis
Methods: Diffusion
