Safeguarding Text-to-Image Generative Models Against Unauthorized Knowledge Distillation
Yilan Gao, Sida Huang, Hongyuan Zhang, and Xuelong Li

TL;DR
WaveGuard is a generator-based framework that applies imperceptible, frequency-aware perturbations to synthetic images, effectively preventing unauthorized knowledge distillation while maintaining visual quality and scalability.
Contribution
The paper introduces WaveGuard, a protection method that safeguards released synthetic images against model stealing while keeping perturbations imperceptible, their magnitude explicitly controllable, and protection efficient at scale.
Findings
WaveGuard effectively reduces the usefulness of images for unauthorized training.
It maintains high visual fidelity of protected images.
The method scales efficiently to large output volumes.
Abstract
Closed-weight generative services are increasingly deployed through query-based APIs, where users can obtain generated outputs while model parameters remain inaccessible. However, such deployment does not prevent model stealing: an attacker can repeatedly query the service, collect large volumes of released synthetic images, and use them as training data for a private substitute model. This query-output-driven process enables unauthorized knowledge distillation and capability replication without direct access to the original weights. To mitigate this threat, a practical defense should preserve the visual fidelity of released images, provide explicit control over perturbation magnitude, and scale efficiently to large-volume output release. We present WaveGuard, a single-pass, generator-based protection framework that safeguards released synthetic images under a user-specified…
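To make the idea of a frequency-aware perturbation under an explicit magnitude budget concrete, here is a minimal sketch. It is not WaveGuard itself: the abstract does not specify the generator or the frequency transform, so this example substitutes random noise filtered through an FFT high-pass mask for the learned generator. The function name `high_freq_perturbation` and the parameters `epsilon`, `cutoff`, and `seed` are assumptions made for illustration only.

```python
import numpy as np


def high_freq_perturbation(image, epsilon=4 / 255, cutoff=0.25, seed=0):
    """Add a random high-frequency perturbation bounded by epsilon (L-infinity).

    image: float array with values in [0, 1], shape (H, W) or (H, W, C).
    epsilon, cutoff, and seed are illustrative parameters, not from the paper.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]

    # Normalized frequency grids in cycles per pixel, range [-0.5, 0.5).
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]

    # High-pass mask: keep only frequencies at or beyond the cutoff radius.
    mask = np.sqrt(fy**2 + fx**2) >= cutoff
    mask = mask.reshape((h, w) + (1,) * (image.ndim - 2))

    # Random noise stands in for the output of a learned generator (assumption).
    noise = rng.standard_normal(image.shape)
    spectrum = np.fft.fft2(noise, axes=(0, 1)) * mask
    delta = np.fft.ifft2(spectrum, axes=(0, 1)).real

    # Rescale so the largest absolute entry equals epsilon.
    delta *= epsilon / (np.abs(delta).max() + 1e-12)

    # Clipping to the valid pixel range can only shrink the perturbation.
    return np.clip(image + delta, 0.0, 1.0)


if __name__ == "__main__":
    img = np.full((32, 32, 3), 0.5)
    protected = high_freq_perturbation(img, epsilon=4 / 255)
    print("max abs change:", np.abs(protected - img).max())
```

Restricting the perturbation to high spatial frequencies and capping its L-infinity norm is one common way to keep changes visually subtle; whether WaveGuard uses this particular band or norm is not stated in the text above.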
