Censored Sampling of Diffusion Models Using 3 Minutes of Human Feedback
TaeHo Yoon, Kibeom Myoung, Keon Lee, Jaewoong Cho, Albert No, Ernest K. Ryu

TL;DR
This paper introduces a method for censoring undesirable outputs of pre-trained diffusion models using a reward model trained on minimal human feedback, preventing bad generations with very little labeling effort.
Contribution
The authors propose a censoring technique for pre-trained diffusion models that relies on a reward model trained on only a few minutes of human feedback, significantly reducing labeling effort.
Findings
Censored diffusion models prevent undesirable image outputs.
A few minutes of human feedback suffice for effective censoring.
The method makes highly efficient use of human feedback.
Abstract
Diffusion models have recently shown remarkable success in high-quality image generation. Sometimes, however, a pre-trained diffusion model exhibits partial misalignment in the sense that the model can generate good images, but it sometimes outputs undesirable images. If so, we simply need to prevent the generation of the bad images, and we call this task censoring. In this work, we present censored generation with a pre-trained diffusion model using a reward model trained on minimal human feedback. We show that censoring can be accomplished with extreme human feedback efficiency and that labels generated with a mere few minutes of human feedback are sufficient. Code available at: https://github.com/tetrzim/diffusion-human-feedback.
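The abstract describes the pipeline only at a high level: train a reward model on a small set of human labels, then use it to censor a pre-trained model's outputs. The following toy sketch illustrates that idea under stated assumptions. It is not the authors' implementation. A 2-D Gaussian mixture stands in for the pre-trained diffusion model, simulated binary labels stand in for human feedback, and logistic regression stands in for the reward model. Plain rejection sampling stands in for the paper's censoring procedure. All function names (`pretrained_sampler`, `train_reward_model`, `reward`, `censored_sample`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def pretrained_sampler(n, rng):
    """Hypothetical stand-in for a pre-trained generator: 2-D points from two
    Gaussian modes. The mode at (+3, 0) plays the role of "good" images and the
    mode at (-3, 0) the role of "undesirable" ones (about 20% of samples)."""
    bad = rng.random(n) < 0.2
    centers = np.where(bad[:, None], np.array([-3.0, 0.0]), np.array([3.0, 0.0]))
    return centers + rng.normal(size=(n, 2)), bad


def train_reward_model(x, y, steps=2000, lr=0.1):
    """Logistic-regression reward model r(x) = sigmoid(w.x + b) ~ P(benign | x),
    fit by full-batch gradient descent on binary cross-entropy.
    Labels: y = 1 for benign, y = 0 for malign."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        grad = p - y  # gradient of the BCE loss w.r.t. the logits
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def reward(x, w, b):
    """Reward model output: estimated probability that each sample is benign."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))


def censored_sample(n, w, b, rng, threshold=0.5, batch=256):
    """Rejection sampling: draw batches from the pre-trained sampler and keep
    only the samples the reward model scores above `threshold`."""
    kept, total = [], 0
    while total < n:
        x, _ = pretrained_sampler(batch, rng)
        x = x[reward(x, w, b) > threshold]
        kept.append(x)
        total += len(x)
    return np.concatenate(kept)[:n]


# A small labeled set, standing in for "a few minutes" of human labels.
x_lab, bad_lab = pretrained_sampler(60, rng)
y_lab = (~bad_lab).astype(float)
w, b = train_reward_model(x_lab, y_lab)

samples = censored_sample(1000, w, b, rng)
print(samples.shape)  # (1000, 2)
print("fraction from the undesirable mode:", (samples[:, 0] < 0).mean())
```

Rejection sampling is the simplest way to use a reward model for censoring. Its cost grows as undesirable outputs become more common, because every rejected sample wastes a full generation. Approaches that steer the sampling trajectory itself with the reward model aim to avoid that cost.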
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Domain Adaptation and Few-Shot Learning
