Schrödinger's Bat: Diffusion Models Sometimes Generate Polysemous Words in Superposition

Jennifer C. White; Ryan Cotterell

arXiv:2211.13095·cs.CL·November 24, 2022·1 cites

TL;DR

This paper investigates why diffusion models sometimes generate images depicting several meanings of a single word. It finds that the encodings of polysemous words are superpositions of their senses, which can lead to images that represent multiple senses simultaneously.

Contribution

It demonstrates that polysemous words are encoded as superpositions in CLIP, and that diffusion models produce images reflecting these superpositions, explaining the homonym duplication phenomenon.

Findings

01. Diffusion models can generate images with multiple word senses from summed encodings.

02. CLIP encodes polysemous words as superpositions of meanings.

03. Linear algebra techniques can manipulate these superpositions to influence generated images.
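The idea behind these findings can be illustrated with a toy sketch. It is not the paper's procedure and does not call CLIP or Stable Diffusion: the three-dimensional vectors, the sense names `animal` and `baseball`, the mixing weights, and the helper `project_out` are all hypothetical stand-ins chosen for illustration. The sketch builds an encoding as a weighted sum of two sense directions, then uses an orthogonal projection to remove one of them.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def project_out(x, direction):
    """Remove the component of x that lies along `direction`."""
    coeff = dot(x, direction) / dot(direction, direction)
    return [a - coeff * d for a, d in zip(x, direction)]

# Hypothetical sense directions (assumption: real encodings are
# high-dimensional and the senses need not be orthogonal).
animal = [1.0, 0.0, 0.0]
baseball = [0.0, 1.0, 0.0]

# A toy "superposed" encoding: a weighted sum of both senses.
bat = [0.6 * a + 0.8 * b for a, b in zip(animal, baseball)]

# The sum stays similar to both senses at once.
sim_animal = cosine(bat, animal)
sim_baseball = cosine(bat, baseball)

# Editing the encoding: project out the baseball direction.
bat_animal_only = project_out(bat, baseball)

print("similarity to animal sense:  ", sim_animal)
print("similarity to baseball sense:", sim_baseball)
print("edited encoding:             ", bat_animal_only)
```

In this toy setting the edited vector points purely along the remaining sense. Whether a comparable edit changes what an actual diffusion model draws is the empirical question the paper addresses.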

Abstract

Recent work has shown that despite their impressive capabilities, text-to-image diffusion models such as DALL-E 2 (Ramesh et al., 2022) can display strange behaviours when a prompt contains a word with multiple possible meanings, often generating images containing both senses of the word (Rassin et al., 2022). In this work we seek to put forward a possible explanation of this phenomenon. Using the similar Stable Diffusion model (Rombach et al., 2022), we first show that when given an input that is the sum of encodings of two distinct words, the model can produce an image containing both concepts represented in the sum. We then demonstrate that the CLIP encoder used to encode prompts (Radford et al., 2021) encodes polysemous words as a superposition of meanings, and that using linear algebraic techniques we can edit these representations to influence the senses represented in the…

Code & Models

Repositories

rycolab/diffusion-polysemy (official)

Taxonomy

Topics: Authorship Attribution and Profiling

Methods: Diffusion · Contrastive Language-Image Pre-training