Bob's Confetti: Phonetic Memorization Attacks in Music and Video Generation

Jaechul Roh; Zachary Novack; Yuefeng Peng; Niloofar Mireshghallah; Taylor Berg-Kirkpatrick; Amir Houmansadr

arXiv:2507.17937 · cs.SD · February 27, 2026

TL;DR

This paper uncovers a vulnerability in generative AI for music and video: lyrics rewritten to sound like the originals, but with unrelated meaning, can bypass text-based copyright filters and still trigger reproduction of memorized copyrighted content.

Contribution

The authors introduce Adversarial PhoneTic Prompting (APT), a novel attack exploiting phonetic memorization to evade content filters in music and video generation models.

Findings

01. APT achieves 91% average similarity to copyrighted originals, versus 13.7% for random lyrics and 42.2% for semantic paraphrases

02. Embedding analysis confirms phonetic structure as a retrieval key

03. APT can reconstruct visual scenes from music videos using altered lyrics

Abstract

Generative AI systems for music and video commonly use text-based filters to prevent regurgitation of copyrighted material. We expose a significant vulnerability in this approach by introducing Adversarial PhoneTic Prompting (APT), a novel attack that bypasses these safeguards by exploiting phonetic memorization: the tendency of models to bind sub-lexical acoustic patterns (phonemes, rhyme, stress, cadence) to memorized copyrighted content. APT replaces iconic lyrics with homophonic but semantically unrelated alternatives (e.g., "mom's spaghetti" becomes "Bob's confetti"), preserving phonetic structure while evading lexical filters. We evaluate APT on leading lyrics-to-song models (Suno, YuE) across English and Korean songs spanning rap, pop, and K-pop. APT achieves 91% average similarity to copyrighted originals, versus 13.7% for random lyrics and 42.2% for semantic paraphrases.…
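
To make the gap between lexical and phonetic matching concrete, here is a minimal Python sketch using the abstract's own example pair. It is not the paper's method or its similarity metric: `lexical_overlap` is a hypothetical stand-in for a word-level text filter, and `vowel_match` is a deliberately crude proxy for phonetic structure that compares sequences of vowel letters rather than real phonemes.

```python
import re


def lexical_overlap(a: str, b: str) -> float:
    """Jaccard overlap of word sets (assumed stand-in for a lexical filter)."""
    wa = set(re.findall(r"[a-z']+", a.lower()))
    wb = set(re.findall(r"[a-z']+", b.lower()))
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def vowel_skeleton(text: str) -> list[str]:
    """Sequence of vowel-letter groups; a rough proxy, not a phoneme transcription."""
    return re.findall(r"[aeiou]+", text.lower())


def vowel_match(a: str, b: str) -> float:
    """Fraction of aligned positions where the two vowel skeletons agree."""
    sa, sb = vowel_skeleton(a), vowel_skeleton(b)
    longest = max(len(sa), len(sb))
    if longest == 0:
        return 0.0
    return sum(x == y for x, y in zip(sa, sb)) / longest


original = "mom's spaghetti"
rewrite = "Bob's confetti"

print(lexical_overlap(original, rewrite))  # 0.0: no shared words
print(vowel_match(original, rewrite))      # 0.75: most vowel positions agree
```

A word-level comparison sees two unrelated phrases, while even this rough sound-based comparison shows that most of the structure is shared. A realistic analysis would work on phoneme transcriptions or audio rather than spelling.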

Taxonomy

Topics: Music and Audio Processing