Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation

Hang Li; Chengzhi Shen; Philip Torr; Volker Tresp; Jindong Gu

arXiv:2311.17216 · cs.CV · March 29, 2024 · 1 citation

TL;DR

This paper introduces a self-supervised method to discover interpretable latent directions in diffusion models, enabling better understanding and mitigation of biased or harmful content in text-to-image generation.

Contribution

It proposes a novel self-supervised approach to identify latent directions for arbitrary concepts, including inappropriate ones, and uses the discovered directions to mitigate inappropriate content.

Findings

01. Effective in fair and safe image generation

02. Able to discover latent directions for arbitrary concepts

03. Improves responsible text-to-image synthesis

Abstract

Diffusion-based models have gained significant popularity for text-to-image generation due to their exceptional image-generation capabilities. A risk with these models is the potential generation of inappropriate content, such as biased or harmful images. However, the underlying reasons for generating such undesired content from the perspective of the diffusion model's internal representation remain unclear. Previous work interprets vectors in an interpretable latent space of diffusion models as semantic concepts. However, existing approaches cannot discover directions for arbitrary concepts, such as those related to inappropriate concepts. In this work, we propose a novel self-supervised approach to find interpretable latent directions for a given concept. With the discovered vectors, we further propose a simple approach to mitigate inappropriate generation. Extensive experiments have…
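
The abstract's two steps (find a direction for a concept, then use it to steer generation) can be illustrated with a toy sketch. This is not the paper's implementation: the fixed matrix `A` is a stand-in assumed here for a frozen diffusion model, the latent vectors are plain lists, and every name (`matvec`, `steer`, `learn_direction`) is hypothetical. The only idea it shows is that a direction `c` can be fitted by gradient descent while the model's weights stay frozen, and that the fitted direction can then be added to or subtracted from a latent activation.

```python
import random


def matvec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def steer(h, c, scale=1.0):
    """Shift a latent activation h along direction c by the given scale."""
    return [hi + scale * ci for hi, ci in zip(h, c)]


def learn_direction(A, inputs, targets, lr=0.1, steps=500):
    """Fit a direction c so that A(h + c) matches the targets.

    A plays the role of a frozen model (an assumption of this toy):
    only c is updated, never A.
    """
    d = len(inputs[0])
    n = len(inputs)
    At = [list(col) for col in zip(*A)]  # transpose of A
    c = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for h, y in zip(inputs, targets):
            pred = matvec(A, steer(h, c))
            resid = [p - t for p, t in zip(pred, y)]
            g = matvec(At, resid)  # gradient of 0.5 * ||A(h + c) - y||^2 w.r.t. c
            grad = [gi + gj / n for gi, gj in zip(grad, g)]
        c = [ci - lr * gi for ci, gi in zip(c, grad)]
    return c


if __name__ == "__main__":
    rng = random.Random(0)
    A = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.5], [0.0, 0.0, 1.0]]  # frozen toy "model"
    c_true = [1.0, -0.5, 0.25]  # hidden concept direction (toy ground truth)
    # Inputs lack the concept; targets were produced with the concept present.
    hs = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(8)]
    ys = [matvec(A, steer(h, c_true)) for h in hs]
    c = learn_direction(A, hs, ys)
    print("learned direction:", [round(x, 4) for x in c])
    # Steering: move a latent away from the concept by subtracting the direction.
    print("steered latent:", steer(hs[0], c, scale=-1.0))
```

In this toy the targets are generated with the hidden direction already applied, so the fit recovers it; in a real diffusion model the loss, the latent space, and the steering rule would all come from the paper and its repository rather than from this sketch.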

Code & Models

Repositories

hangligit/InterpretDiffusion
jax

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis

Methods: Diffusion