Censored Sampling of Diffusion Models Using 3 Minutes of Human Feedback

TaeHo Yoon, Kibeom Myoung, Keon Lee, Jaewoong Cho, Albert No, Ernest K. Ryu

arXiv:2307.02770 · cs.CV · November 1, 2023

TL;DR

This paper introduces a method for censoring undesirable outputs of pre-trained diffusion models using a reward model trained on minimal human feedback, giving effective control over which images are generated at a very small labeling cost.

Contribution

The authors propose a censoring technique for pre-trained diffusion models that relies on a reward model trained with only a few minutes of human feedback, significantly reducing labeling effort.
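The feedback step in the contribution above amounts to turning a small set of human good/bad judgments into a reward model. The sketch below illustrates that idea under loudly stated assumptions: each generated sample is (hypothetically) summarized as a 2-D feature vector, the reward model is plain logistic regression, and `train_reward_model` and the toy data are illustrative inventions. It is not the reward model from the paper or its repository.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping a logit to a probability in [0, 1]."""
    return 1.0 / (1.0 + np.exp(-z))


def train_reward_model(features, labels, lr=0.5, steps=2000):
    """Fit a logistic-regression reward model r(x) = sigmoid(w . x + b).

    Hypothetical helper: minimizes mean binary cross-entropy by plain
    gradient descent. labels[i] is 1 if a human marked sample i as
    acceptable and 0 if it should be censored.
    """
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        p = sigmoid(features @ w + b)
        err = p - labels  # d(BCE)/d(logit) for each sample
        w -= lr * features.T @ err / n
        b -= lr * err.mean()
    return w, b


# Toy stand-in for "a few minutes of feedback": 20 generated samples,
# each summarized as a 2-D feature vector and labeled good (1) or bad (0).
rng = np.random.default_rng(0)
good = rng.normal(loc=[2.0, 2.0], scale=0.7, size=(10, 2))
bad = rng.normal(loc=[-2.0, -2.0], scale=0.7, size=(10, 2))
X = np.vstack([good, bad])
y = np.concatenate([np.ones(10), np.zeros(10)])

w, b = train_reward_model(X, y)
reward = sigmoid(X @ w + b)  # estimated probability that each sample is acceptable
```

Samples whose estimated reward falls below a chosen threshold are the ones a censoring procedure would suppress. A handful of labels suffices in this toy only because the two classes are well separated; real image data need not be.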

Findings

1. Censored sampling prevents a pre-trained diffusion model from producing undesirable images.
2. A few minutes of human feedback suffice for effective censoring.
3. The method makes highly efficient use of human feedback.

Abstract

Diffusion models have recently shown remarkable success in high-quality image generation. Sometimes, however, a pre-trained diffusion model exhibits partial misalignment: the model can generate good images, but it also sometimes outputs undesirable ones. In that case, we simply need to prevent the generation of the bad images; we call this task censoring. In this work, we present censored generation with a pre-trained diffusion model using a reward model trained on minimal human feedback. We show that censoring can be accomplished with extreme human feedback efficiency: labels generated with a mere few minutes of human feedback are sufficient. Code available at: https://github.com/tetrzim/diffusion-human-feedback.
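One way to picture "censored generation with a pre-trained diffusion model using a reward model" is reward guidance in the style of classifier guidance: the sampler follows the tilted score ∇ log p(x) + ω ∇ log r(x), whose target density is proportional to p(x) r(x)^ω. The sketch below is a toy under stated assumptions, not the authors' implementation (see the linked repository for that): a 1-D two-component Gaussian mixture stands in for the pre-trained model, a fixed sigmoid stands in for the trained reward model, and unadjusted Langevin dynamics replaces the reverse diffusion process. The names `base_score`, `log_reward_grad`, `langevin_sample`, and `omega` are hypothetical.

```python
import numpy as np

# Hypothetical toy "pre-trained model": an equal-weight mixture of two 1-D
# Gaussians; the mode at -2 plays the role of undesirable outputs.
MEANS = np.array([-2.0, 2.0])
STD = 0.5


def base_score(x):
    """Score d/dx log p(x) of the Gaussian mixture, for an array of samples."""
    d = x[:, None] - MEANS[None, :]  # shape (n, 2)
    logw = -0.5 * (d / STD) ** 2
    logw -= logw.max(axis=1, keepdims=True)  # stabilize before exponentiating
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)  # posterior component responsibilities
    # Mixture score = responsibility-weighted sum of component scores.
    return (w * (-d / STD**2)).sum(axis=1)


def log_reward_grad(x, k=2.0):
    """d/dx log r(x) for a hypothetical reward r(x) = sigmoid(k * x)."""
    return k * (1.0 - 1.0 / (1.0 + np.exp(-k * x)))


def langevin_sample(n, omega, steps=4000, eps=1e-3, seed=0):
    """Unadjusted Langevin dynamics on the reward-tilted score.

    The stationary density is approximately proportional to
    p(x) * r(x)**omega (up to step-size bias); omega = 0 gives plain,
    uncensored Langevin sampling of p.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 3.0, size=n)
    for _ in range(steps):
        score = base_score(x) + omega * log_reward_grad(x)
        x = x + 0.5 * eps * score + np.sqrt(eps) * rng.normal(size=n)
    return x


unguided = langevin_sample(1000, omega=0.0)
guided = langevin_sample(1000, omega=10.0)
print("fraction in desirable mode, unguided:", (unguided > 0).mean())
print("fraction in desirable mode, guided:  ", (guided > 0).mean())
```

With `omega=0.0` the toy sampler splits roughly evenly between the two modes, by symmetry of the mixture and of the starting distribution; raising `omega` moves mass toward the region the reward favors. A larger weight censors more aggressively but also tilts the samples that are kept, which is the trade-off a guidance weight controls.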

Code & Models

Repositories

tetrzim/diffusion-human-feedback
PyTorch · Official

Videos

Censored Sampling of Diffusion Models Using 3 Minutes of Human Feedback · SlidesLive

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Domain Adaptation and Few-Shot Learning