Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion

Sanghyun Kim; Seohyeon Jung; Balhae Kim; Moonseok Choi; Jinwoo Shin; Juho Lee

arXiv:2407.21032 · cs.CV · August 1, 2024

Open Access

TL;DR

This paper introduces Human Feedback Inversion (HFI), a framework that uses human feedback to guide the removal of harmful or copyrighted content in text-to-image diffusion models, improving ethical safety.

Contribution

The paper presents a novel HFI framework that leverages human feedback to better align model outputs with human judgments and effectively mitigate problematic content.

Findings

01. Significantly reduces objectionable content generation
02. Preserves image quality while removing harmful concepts
03. Provides a strong baseline for concept removal in diffusion models

Abstract

This paper addresses the societal concerns arising from large-scale text-to-image diffusion models for generating potentially harmful or copyrighted content. Existing models rely heavily on internet-crawled data, wherein problematic concepts persist due to incomplete filtration processes. While previous approaches somewhat alleviate the issue, they often rely on text-specified concepts, introducing challenges in accurately capturing nuanced concepts and aligning model knowledge with human understandings. In response, we propose a framework named Human Feedback Inversion (HFI), where human feedback on model-generated images is condensed into textual tokens guiding the mitigation or removal of problematic images. The proposed framework can be built upon existing techniques for the same purpose, enhancing their alignment with human judgment. By doing so, we simplify the training objective…
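The abstract describes condensing human feedback on generated images into textual tokens, but the excerpt above does not give the training objective. The following is only a toy sketch of that general idea, not the paper's method: it assumes each image is already represented by a feature vector (random stand-ins here), that feedback is a binary "problematic" label, and that the "token" is a single vector fitted by logistic regression. The function name `invert_feedback`, the dimensions, and the synthetic data are all hypothetical.

```python
import math
import random


def invert_feedback(features, labels, steps=300, lr=0.5):
    """Fit one embedding vector from binary feedback (toy logistic regression).

    features: list of image feature vectors (stand-ins for real embeddings).
    labels:   1.0 if a human flagged the image as problematic, else 0.0.
    """
    dim = len(features[0])
    token = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for x, y in zip(features, labels):
            score = sum(t * v for t, v in zip(token, x))
            prob = 1.0 / (1.0 + math.exp(-score))
            for i in range(dim):
                grad[i] += (prob - y) * x[i] / len(features)
        # Gradient descent step on the mean logistic loss.
        token = [t - lr * g for t, g in zip(token, grad)]
    return token


# Synthetic data (assumption): flagged images are shifted along a hidden
# "concept" axis, unflagged images are shifted the opposite way.
rng = random.Random(0)
dim = 8
concept = [1.0] + [0.0] * (dim - 1)


def sample(shift):
    return [rng.gauss(0.0, 1.0) + shift * c for c in concept]


features = [sample(3.0) for _ in range(40)] + [sample(-3.0) for _ in range(40)]
labels = [1.0] * 40 + [0.0] * 40

token = invert_feedback(features, labels)

# How well the learned vector lines up with the hidden concept axis.
norm = math.sqrt(sum(t * t for t in token))
cosine = sum(t * c for t, c in zip(token, concept)) / norm

# Fraction of feedback labels the learned vector reproduces.
predictions = [1.0 if sum(t * v for t, v in zip(token, x)) > 0 else 0.0 for x in features]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
```

In this toy setting the learned vector should point mostly along the hidden concept axis. A real pipeline would replace the random features with embeddings from the diffusion model's own encoders and would hand the resulting token to an existing concept-removal technique, as the abstract suggests.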

Taxonomy

Topics: Computer vision (cs.CV)

Methods: Diffusion