Differentially Private Knowledge Distillation via Synthetic Text Generation
James Flemings, Murali Annavaram

TL;DR
This paper introduces DistilDP, a differentially private knowledge distillation method that uses synthetic data from a private teacher LLM to effectively compress models while maintaining privacy and utility.
Contribution
The paper proposes a novel DP knowledge distillation approach leveraging synthetic data and hidden representation alignment, improving utility over existing methods.
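As a rough illustration of how a student's training objective could combine these two signals, the sketch below adds a next-token cross-entropy term on a synthetic sequence to a mean-squared-error term between teacher and student hidden vectors. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the function names, the weighting factor `alpha`, the choice of MSE for alignment, and the requirement that teacher and student hidden sizes match are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Negative log-probability the student assigns to the synthetic token."""
    return -math.log(softmax(logits)[target_index])

def hidden_mse(student_hidden, teacher_hidden):
    """Mean squared error between two equally sized hidden vectors."""
    if len(student_hidden) != len(teacher_hidden):
        raise ValueError("hidden sizes must match (assumption of this sketch)")
    squared = [(s - t) ** 2 for s, t in zip(student_hidden, teacher_hidden)]
    return sum(squared) / len(squared)

def distillation_loss(student_logits, synthetic_tokens,
                      student_hidden, teacher_hidden, alpha=0.5):
    """Per-position average of token cross-entropy plus alpha-weighted alignment.

    alpha is a hypothetical weighting factor, not a value from the paper.
    """
    n = len(synthetic_tokens)
    ce = sum(cross_entropy(l, t)
             for l, t in zip(student_logits, synthetic_tokens)) / n
    align = sum(hidden_mse(s, t)
                for s, t in zip(student_hidden, teacher_hidden)) / n
    return ce + alpha * align
```

In this sketch, setting `alpha=0` reduces the objective to ordinary training on the synthetic text, which makes it easy to see what the alignment term contributes on its own.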
Findings
Significant utility improvement over baselines, reducing perplexity by at least 9.0 on the Big Patent dataset.
Effective privacy-utility trade-off demonstrated at epsilon=2.
Progress in privacy-preserving compression of autoregressive LLMs.
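For readers unfamiliar with the two quantities in these findings, here is a small sketch. Perplexity is the exponential of the average per-token negative log-likelihood, so lower is better. In (epsilon, delta)-DP, epsilon bounds, by a factor of e^epsilon, how much the probability of any output can differ between two datasets that differ in one record. The function names and sample numbers are illustrative and are not taken from the paper.

```python
import math

def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood (natural log) over tokens."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def dp_probability_ratio_bound(epsilon):
    """Multiplicative factor e^epsilon from the DP definition (delta term ignored)."""
    return math.exp(epsilon)

# Hypothetical model that assigns probability 0.25 to each of four tokens
print(perplexity([math.log(0.25)] * 4))

# Factor corresponding to the epsilon = 2 setting mentioned above
print(dp_probability_ratio_bound(2.0))
```

At epsilon = 2 the factor is roughly 7.39, which gives a sense of how strict that privacy budget is.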
Abstract
Large language models (LLMs) are achieving state-of-the-art performance in many different downstream tasks. However, the increasing urgency of data privacy puts pressure on practitioners to train LLMs with Differential Privacy (DP) on private data. Concurrently, the exponential growth in parameter size of LLMs necessitates model compression before deployment of LLMs on resource-constrained devices or latency-sensitive applications. Differential privacy and model compression generally must trade off utility loss to achieve their objectives. Moreover, simultaneously applying both schemes can compound the utility degradation. To this end, we propose DistilDP: a novel differentially private knowledge distillation algorithm that exploits synthetic data generated by a differentially private teacher LLM. The knowledge of a teacher LLM is transferred onto the student in two ways: one way from…
Taxonomy
Topics: Topic Modeling
Methods: Knowledge Distillation
