Towards Privacy-Preserving Code Generation: Differentially Private Code Language Models
Melih Catal, Pooja Rani, Harald C. Gall

TL;DR
This paper explores applying Differential Privacy to code language models to reduce memorization risks while maintaining their code generation performance, making privacy-preserving deployment feasible.
Contribution
To the authors' knowledge, this is the first comprehensive study evaluating how effectively Differential Privacy mitigates memorization in CodeLLMs without significant utility loss.
Findings
DP substantially reduces memorization across snippet types
DP slightly increases perplexity but preserves or enhances code generation
DP does not significantly impact training time or energy consumption
Abstract
Large language models specialized for code (CodeLLMs) have demonstrated remarkable capabilities in generating code snippets, documentation, and test cases. However, despite their promising capabilities, CodeLLMs can inadvertently memorize and reproduce snippets from their training data, which poses risks of privacy breaches and intellectual property violations. These risks restrict the deployment of CodeLLMs in sensitive domains and limit their training datasets to publicly available sources. To mitigate the memorization risk without compromising their task performance, we apply Differential Privacy (DP) to CodeLLMs. To the best of our knowledge, this is the first comprehensive study that systematically evaluates the effectiveness of DP in CodeLLMs. DP adds calibrated noise to the training process to protect individual data points while still allowing the model to learn useful patterns.…
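The abstract's description of DP as adding calibrated noise to the training process can be illustrated with a minimal sketch of one DP-SGD-style update, a common way to train with DP. This excerpt does not say which mechanism or hyperparameters the authors used, so the function name `dp_sgd_step` and the parameters `clip_norm` and `noise_multiplier` are assumptions made for illustration only. The sketch clips each example's gradient to bound its influence, then adds Gaussian noise scaled to that bound.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One illustrative DP-SGD-style update (hypothetical helper, not the paper's code).

    per_example_grads has shape (batch_size, num_params).
    """
    batch_size = per_example_grads.shape[0]
    # L2 norm of each example's gradient
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Scale down any gradient whose norm exceeds clip_norm; leave the rest unchanged
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Gaussian noise calibrated to the clipping bound
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    # Average the noisy sum and take a gradient step
    noisy_mean = (clipped.sum(axis=0) + noise) / batch_size
    return params - lr * noisy_mean

rng = np.random.default_rng(0)
params = np.zeros(2)
grads = np.array([[3.0, 4.0], [0.0, 0.0]])  # first gradient has norm 5, gets clipped to norm 1
new_params = dp_sgd_step(params, grads, lr=1.0, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```

A larger `noise_multiplier` generally gives a stronger privacy guarantee at the cost of noisier updates. Turning it into a concrete privacy budget requires a separate privacy accountant, which this sketch omits.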
Taxonomy
Topics: Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning · Software Engineering Research
