Lossless Anti-Distillation Sampling
Zibo Diao, Jingchu Gai, Xinyue Ai, Zhang Zhang, Zhenyu He, Di He

TL;DR
LADS is a novel sampling scheme that counters multi-account distillation attacks on generative models by keeping responses statistically identical to standard sampling for benign users while degrading the quality of the data a distiller harvests.
Contribution
LADS introduces a sampling method driven by a private seed that preserves response quality for benign users and reduces distillation effectiveness by correlating the responses a distiller harvests.
Findings
LADS significantly reduces the performance of distilled models across tasks.
LADS leaves the response distribution seen by each individual user unchanged, so benign users experience no distortion.
Theoretical analysis shows that LADS slows the distiller's convergence rate.
Abstract
Frontier commercial generative models face a growing threat from distillation, whereby a distiller harvests generated responses and trains a competing model of its own at drastically lower cost. Existing defenses either rely on modifying the model's outputs, thereby sacrificing response quality for benign users, or on behavioral detection methods, which can be readily circumvented by distributing queries across multiple accounts. In this work, we propose Lossless Anti-Distillation Sampling (LADS), a novel sampling scheme specifically designed to counter multi-account distillation while maintaining a lossless experience for benign users. Concretely, LADS derives the randomness underlying each generation from a private seed determined by the semantic content of the query and the number of times the user has queried the model. By construction, every benign user receives a response…
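The abstract describes the seeding mechanism only at a high level. The Python sketch below illustrates the idea under loudly stated assumptions: the "semantic content" of a query is approximated by a hypothetical normalized string (`canonical_query`), the private seed is an HMAC-SHA256 of a server-side key over that string and the user's query count (`derive_seed`), and a full response is treated as one draw from a toy distribution (`sample_response`). None of these names or choices come from the paper, whose actual construction may differ; a real decoder would instead feed the seeded RNG into per-token sampling.

```python
import hashlib
import hmac
import random
import unicodedata


def canonical_query(query: str) -> str:
    # Hypothetical stand-in for "semantic content": Unicode-normalize,
    # lowercase, and collapse whitespace. The paper's notion is likely richer.
    text = unicodedata.normalize("NFKC", query).lower()
    return " ".join(text.split())


def derive_seed(secret_key: bytes, query: str, query_count: int) -> int:
    # Assumed construction: HMAC-SHA256(key, canonical query || count).
    # Without the key, the seed looks random to the user.
    msg = canonical_query(query).encode("utf-8") + b"\x00" + str(query_count).encode("ascii")
    digest = hmac.new(secret_key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")


def sample_response(probs: dict[str, float], seed: int) -> str:
    # Toy inverse-CDF sampling of a whole response from a fixed distribution,
    # driven entirely by the derived seed.
    u = random.Random(seed).random()
    items = sorted(probs.items())
    cumulative = 0.0
    for response, p in items:
        cumulative += p
        if u < cumulative:
            return response
    return items[-1][0]  # guard against floating-point rounding


if __name__ == "__main__":
    key = b"hypothetical server-side secret"
    probs = {"A": 0.5, "B": 0.3, "C": 0.2}
    # Two accounts, each on its first query, asking the same question:
    # identical seeds, hence identical (correlated) responses.
    r1 = sample_response(probs, derive_seed(key, "What is 2+2?", 1))
    r2 = sample_response(probs, derive_seed(key, "  what is 2+2? ", 1))
    print(r1 == r2)
```

In this sketch, a single user whose query count keeps advancing draws fresh pseudorandom seeds, so the responses they see follow `probs`; many accounts that each sit at the same small count and ask the same question collapse onto the same seed, which is one way the harvested data could become correlated.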
