Discovering the Hidden Vocabulary of DALLE-2
Giannis Daras, Alexandros G. Dimakis

TL;DR
This paper reports that DALLE-2 appears to have a hidden vocabulary: some absurd, seemingly random prompts consistently generate specific visual concepts, which raises security and interpretability concerns.
Contribution
It introduces a black-box method to identify seemingly random prompts that correspond to visual concepts in DALLE-2.
Findings
DALLE-2 appears to have a hidden vocabulary of seemingly random words
Some absurd prompts often generate the same visual concept, both in isolation and sometimes in combination
The discovery raises security and interpretability concerns
Abstract
We discover that DALLE-2 seems to have a hidden vocabulary that can be used to generate images with absurd prompts. For example, it seems that "Apoploe vesrreaitais" means birds and "Contarra ccetnxniams luryca tanniounons" (sometimes) means bugs or pests. We find that these prompts are often consistent in isolation but also sometimes in combinations. We present our black-box method to discover words that seem random but have some correspondence to visual concepts. This creates important security and interpretability challenges.
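The abstract does not say how "consistent" is measured. One simple way to quantify whether a nonsense prompt "means" something treats the model purely as a black box. This is offered as an assumption, not as the authors' procedure. The idea is to sample several images for the same prompt, label each one with an independent classifier, and report how often the majority concept appears. The helper names `concept_consistency` and `toy_generate` are hypothetical. The toy generator is a made-up stand-in, not DALLE-2, and its output is not data from the paper.

```python
from collections import Counter
from typing import Callable, List, Tuple, TypeVar

Img = TypeVar("Img")  # opaque image type returned by the black-box generator


def concept_consistency(
    generate: Callable[[str, int], List[Img]],
    classify: Callable[[Img], str],
    prompt: str,
    n: int = 16,
) -> Tuple[str, float]:
    """Hypothetical probe: request n samples for `prompt` from a black-box
    generator, label each sample with an independent classifier, and return
    the majority label with the fraction of samples carrying it
    (1.0 means every sample was assigned the same concept)."""
    if n <= 0:
        raise ValueError("n must be positive")
    labels = [classify(image) for image in generate(prompt, n)]
    if not labels:
        raise ValueError("generator returned no samples")
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def toy_generate(prompt: str, n: int) -> List[str]:
    """Toy stand-in for a text-to-image model (NOT DALLE-2). Each "image" is
    represented directly by the concept it depicts, and the prompt is ignored.
    The 3-out-of-4 pattern is invented purely for illustration."""
    return ["bird" if i % 4 != 3 else "tree" for i in range(n)]


if __name__ == "__main__":
    # The identity "classifier" works only because toy images are already labels.
    label, share = concept_consistency(toy_generate, lambda image: image, "gibberish-prompt")
    print(label, share)  # bird 0.75
```

To turn this into a real probe, replace `toy_generate` with a client for an actual model and replace the identity lambda with an image classifier. Whether a given share counts as "consistent" is a threshold the reader has to choose. Testing combinations would mean running the same check on a prompt that joins two such words.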
Taxonomy
Topics: Digital Media Forensic Detection
