Towards Practical Plug-and-Play Diffusion Models
Hyojun Go, Yunsung Lee, Jin-Young Kim, Seunghyun Lee, Myeongho Jeong, Hyun Seung Lee, and Seungtaek Choi

TL;DR
This paper introduces PPAP, a practical framework for guiding diffusion models using multiple specialized experts and parameter-efficient fine-tuning, enabling plug-and-play control without labeled data.
Contribution
It proposes a novel multi-expert guidance strategy combined with a data-free fine-tuning framework for diffusion models, addressing noise robustness and scalability issues.
Findings
Successful ImageNet class conditional generation with minimal trainable parameters
Guidance of various models like classifiers and segmentation tools in a plug-and-play manner
No labeled data required for effective guidance
Abstract
Diffusion-based generative models have achieved remarkable success in image generation. Their guidance formulation allows an external model to plug-and-play control the generation process for various tasks without finetuning the diffusion model. However, the direct use of publicly available off-the-shelf models for guidance fails due to their poor performance on noisy inputs. For that, the existing practice is to fine-tune the guidance models with labeled data corrupted with noises. In this paper, we argue that this practice has limitations in two aspects: (1) performing on inputs with extremely various noises is too hard for a single guidance model; (2) collecting labeled datasets hinders scaling up for various tasks. To tackle the limitations, we propose a novel strategy that leverages multiple experts where each expert is specialized in a particular noise range and guides the reverse…
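The multi-expert idea in the abstract, where each expert handles one noise range, can be sketched as a routing function from diffusion timestep to expert. This is a minimal illustration rather than the authors' implementation: the equal-width split of timesteps, the names expert_index and select_expert, and the convention that larger t means more noise are all assumptions made for the example.

```python
from typing import Sequence, TypeVar

E = TypeVar("E")


def expert_index(t: int, num_timesteps: int, num_experts: int) -> int:
    """Map timestep t in [0, num_timesteps) to an expert id in [0, num_experts).

    Assumption: the timesteps are split into equal-width contiguous ranges,
    one per expert. The source only says each expert covers a noise range.
    """
    if num_experts < 1:
        raise ValueError("num_experts must be at least 1")
    if not 0 <= t < num_timesteps:
        raise ValueError("t must lie in [0, num_timesteps)")
    # Integer arithmetic avoids floating-point edge cases at range boundaries.
    return t * num_experts // num_timesteps


def select_expert(experts: Sequence[E], t: int, num_timesteps: int) -> E:
    """Return the (hypothetical) guidance model responsible for timestep t."""
    return experts[expert_index(t, num_timesteps, len(experts))]


if __name__ == "__main__":
    # Placeholder strings stand in for guidance models in this sketch.
    experts = ["expert_0", "expert_1", "expert_2", "expert_3", "expert_4"]
    for t in (0, 199, 200, 999):
        print(t, select_expert(experts, t, num_timesteps=1000))
```

During the reverse process, a sampler would call select_expert at each step and take the guidance gradient from the returned model only, so no single model has to cope with the full span of noise levels.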
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: classifier-guidance · Guided Language to Image Diffusion for Generation and Editing · Diffusion
