Protecting Private Code in IDE Autocomplete using Differential Privacy
Evgeny Grigorenko, David Stanojević, David Ilić, Egor Bogomolov, Kostadin Cvejoski

TL;DR
This paper explores using Differential Privacy (DP) to train code autocomplete models for IDEs. DP substantially strengthens privacy protection while preserving high utility, which supports trustworthy AI-powered development tools.
Contribution
The paper demonstrates that Differential Privacy can effectively defend code models against membership inference attacks with minimal loss of utility.
Findings
DP reduces the attack success rate to close to random guessing
Model utility remains high under DP, even when training on 100x less data
The DP-trained model performs comparably to non-private models
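"Close to random guessing" is usually measured with the ROC AUC of the attack: 1.0 means the attacker perfectly separates training members from non-members, and 0.5 means it does no better than a coin flip. The summary does not say which membership inference attack the authors used, so the sketch below assumes a simple loss-threshold attack (lower loss on a snippet suggests it was in the training set). The function names `loss_attack_scores` and `roc_auc` are hypothetical helpers for illustration.

```python
from typing import List, Sequence


def loss_attack_scores(losses: Sequence[float]) -> List[float]:
    """Hypothetical loss-threshold attack: a lower loss means 'more likely a member',
    so the negated loss is used as the membership score."""
    return [-loss for loss in losses]


def roc_auc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """AUC as the probability that a random member outscores a random non-member.
    Ties count as half a win. O(n*m), fine for a small illustration."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


# A model that memorized its training data: members have clearly lower loss.
leaky = roc_auc(loss_attack_scores([0.2, 0.3, 0.4]), loss_attack_scores([0.9, 1.1, 1.3]))
# A model whose losses look the same on members and non-members.
private = roc_auc(loss_attack_scores([0.5, 0.7]), loss_attack_scores([0.5, 0.7]))
print(leaky, private)
```

The first case yields an AUC of 1.0 (every member outscores every non-member) and the second yields 0.5, the random-guess baseline that a well-protected model should approach.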
Abstract
Modern Integrated Development Environments (IDEs) increasingly leverage Large Language Models (LLMs) to provide advanced features like code autocomplete. While powerful, training these models on user-written code introduces significant privacy risks, making the models themselves a new type of data vulnerability. Malicious actors can exploit this by launching attacks to reconstruct sensitive training data or infer whether a specific code snippet was used for training. This paper investigates the use of Differential Privacy (DP) as a robust defense mechanism for training an LLM for Kotlin code completion. We fine-tune a Mellum model using DP and conduct a comprehensive evaluation of its privacy and utility. Our results demonstrate that DP provides a strong defense against Membership Inference Attacks (MIAs), reducing the attack's success rate close to a random guess (AUC from…
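The abstract does not specify the DP training algorithm, so the sketch below assumes DP-SGD, the standard way to fine-tune neural networks with differential privacy: each example's gradient is clipped to a maximum L2 norm, the clipped gradients are summed, Gaussian noise scaled to that norm is added, and the noisy average drives the update. Parameters are plain Python lists for self-containment; `clip_norm`, `noise_multiplier`, and the function names are illustrative assumptions, not the authors' code, and the privacy accounting that turns these settings into an (ε, δ) guarantee is omitted.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def clip_gradient(grad: Sequence[float], clip_norm: float) -> Vector:
    """Scale one example's gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(
    params: Sequence[float],
    per_example_grads: Sequence[Sequence[float]],
    clip_norm: float,
    noise_multiplier: float,
    lr: float,
    rng: random.Random,
) -> Vector:
    """One DP-SGD update (sketch): clip per example, sum, add noise, average, step."""
    batch_size = len(per_example_grads)
    summed = [0.0] * len(params)
    for grad in per_example_grads:
        # Clipping bounds how much any single training snippet can move the model.
        for i, g in enumerate(clip_gradient(grad, clip_norm)):
            summed[i] += g
    # Noise std is proportional to the clipping bound (the per-example sensitivity).
    sigma = noise_multiplier * clip_norm
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_mean)]


rng = random.Random(0)
new_params = dp_sgd_step(
    params=[0.0, 0.0],
    per_example_grads=[[3.0, 4.0], [0.3, 0.4]],
    clip_norm=1.0,
    noise_multiplier=1.1,
    lr=0.1,
    rng=rng,
)
print(new_params)
```

Setting `noise_multiplier=0.0` reduces this to SGD with per-example clipping, which makes the clipping behavior easy to check; a larger multiplier gives stronger privacy at some cost in utility, the trade-off the paper evaluates.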
