Discovering the Hidden Vocabulary of DALLE-2

Giannis Daras; Alexandros G. Dimakis

arXiv:2206.00169 · cs.LG · June 2, 2022 · 22 citations

TL;DR

This paper reports that DALLE-2 appears to have a hidden vocabulary: seemingly absurd prompts can generate images of specific visual concepts, which raises security and interpretability concerns.

Contribution

It introduces a black-box method to identify seemingly random prompts that correspond to visual concepts in DALLE-2.

Findings

1. DALLE-2 has a consistent hidden vocabulary for certain prompts

2. Some absurd prompts reliably generate specific visual concepts

3. The discovery raises security and interpretability concerns

Abstract

We discover that DALLE-2 seems to have a hidden vocabulary that can be used to generate images with absurd prompts. For example, it seems that "Apoploe vesrreaitais" means birds and "Contarra ccetnxniams luryca tanniounons" (sometimes) means bugs or pests. We find that these prompts are often consistent in isolation but also sometimes in combinations. We present our black-box method to discover words that seem random but have some correspondence to visual concepts. This creates important security and interpretability challenges.
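
One way to make "consistent in isolation" concrete is to score how often images generated from a single prompt share the same concept. The sketch below is an illustration only, not the authors' method: it assumes the images were already generated and labeled elsewhere, the function name `concept_consistency` is hypothetical, and the labels are made-up placeholder values rather than results from the paper.

```python
from collections import Counter


def concept_consistency(labels):
    """Return (most common label, fraction of images that carry it).

    `labels` holds one concept label per generated image. How the labels
    are obtained (human annotation, a classifier) is outside this sketch.
    """
    if not labels:
        raise ValueError("need at least one label")
    concept, count = Counter(labels).most_common(1)[0]
    return concept, count / len(labels)


# Made-up labels for ten images from one prompt (not data from the paper).
example_labels = ["bird"] * 8 + ["insect", "other"]
concept, score = concept_consistency(example_labels)
print(f"dominant concept: {concept}, consistency: {score:.2f}")
```

A score near 1.0 would suggest the prompt maps to one concept; a prompt that only "sometimes" yields a concept would score lower.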

Videos

DALLE-2 has a secret language!? | Theories and explanations (YouTube)

Taxonomy

Topics: Digital Media Forensic Detection