Towards Privacy-Preserving Code Generation: Differentially Private Code Language Models

Melih Catal; Pooja Rani; Harald C. Gall

arXiv:2512.11482·cs.SE·December 15, 2025

Open Access

TL;DR

This paper explores applying Differential Privacy to code language models to reduce memorization risks while maintaining their code generation performance, making privacy-preserving deployment feasible.

Contribution

To the authors' knowledge, this is the first comprehensive study to evaluate how effectively Differential Privacy mitigates memorization in CodeLLMs without significant utility loss.

Findings

01. DP substantially reduces memorization across snippet types

02. DP slightly increases perplexity but preserves or enhances code generation

03. DP does not significantly impact training time or energy consumption

Abstract

Large language models specialized for code (CodeLLMs) have demonstrated remarkable capabilities in generating code snippets, documentation, and test cases. However, despite their promising capabilities, CodeLLMs can inadvertently memorize and reproduce snippets from their training data, which poses risks of privacy breaches and intellectual property violations. These risks restrict the deployment of CodeLLMs in sensitive domains and limit their training datasets to publicly available sources. To mitigate the memorization risk without compromising their task performance, we apply Differential Privacy (DP) to CodeLLMs. To the best of our knowledge, this is the first comprehensive study that systematically evaluates the effectiveness of DP in CodeLLMs. DP adds calibrated noise to the training process to protect individual data points while still allowing the model to learn useful patterns.…
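The "calibrated noise" idea in the abstract can be illustrated with a toy update step. The sketch below assumes a DP-SGD-style recipe (clip each example's gradient, then add Gaussian noise scaled to the clipping bound); the abstract excerpt does not name the algorithm the authors use, so this is an illustration rather than their method. The function `dp_sgd_step` and its parameters `clip_norm` and `noise_multiplier` are hypothetical names chosen for this example.

```python
import math
import random


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD-style update (illustrative sketch, not the paper's code).

    params:            list of floats (model parameters)
    per_example_grads: list of gradients, one list of floats per training example
    clip_norm:         maximum L2 norm allowed for any single example's gradient
    noise_multiplier:  Gaussian noise std is noise_multiplier * clip_norm
    rng:               a random.Random instance, passed in for reproducibility
    """
    batch_size = len(per_example_grads)
    summed = [0.0] * len(params)

    for grad in per_example_grads:
        # Bound each example's influence by rescaling its gradient to clip_norm.
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, g in enumerate(grad):
            summed[i] += g * scale

    # Add noise calibrated to the clipping bound, then average over the batch.
    sigma = noise_multiplier * clip_norm
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]

    # Plain gradient-descent update using the privatized gradient.
    return [p - lr * g for p, g in zip(params, noisy_mean)]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.1, -0.2]]
    print(dp_sgd_step([0.0, 0.0], grads, lr=0.1, clip_norm=1.0,
                      noise_multiplier=1.0, rng=rng))
```

Clipping caps how much any single training example can move the parameters, and the noise masks whatever contribution remains; together these are what limit verbatim memorization in this family of methods. A larger `noise_multiplier` gives stronger privacy at the cost of a noisier gradient.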

Taxonomy

Topics: Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning · Software Engineering Research