Protecting Private Code in IDE Autocomplete using Differential Privacy

Evgeny Grigorenko; David Stanojević; David Ilić; Egor Bogomolov; Kostadin Cvejoski

arXiv:2601.22935·cs.CR·February 2, 2026

TL;DR

This paper fine-tunes a code autocomplete model for IDEs with Differential Privacy (DP). DP training substantially strengthens protection against membership inference while keeping utility high, which supports more trustworthy AI-powered development tools.

Contribution

The paper demonstrates that Differential Privacy effectively defends code models against membership inference attacks with minimal utility loss.

Findings

01 DP reduces the attack success rate to close to random guessing

02 Model utility remains high with DP, even on 100x less data

03 The DP-trained model performs comparably to non-private models

Abstract

Modern Integrated Development Environments (IDEs) increasingly leverage Large Language Models (LLMs) to provide advanced features like code autocomplete. While powerful, training these models on user-written code introduces significant privacy risks, making the models themselves a new type of data vulnerability. Malicious actors can exploit this by launching attacks to reconstruct sensitive training data or infer whether a specific code snippet was used for training. This paper investigates the use of Differential Privacy (DP) as a robust defense mechanism for training an LLM for Kotlin code completion. We fine-tune a Mellum model using DP and conduct a comprehensive evaluation of its privacy and utility. Our results demonstrate that DP provides a strong defense against Membership Inference Attacks (MIAs), reducing the attack's success rate close to a random guess (AUC from…
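The abstract reports attack success as an AUC, where 0.5 corresponds to random guessing. The sketch below illustrates how such a number can be computed for a simple loss-based membership inference attack. It is not the paper's evaluation code: the attack (scoring each snippet by its negative loss), the helper names `auc` and `simulate_losses`, and all loss values are assumptions made for illustration, and the losses are synthetic rather than measured on any model.

```python
import random


def auc(member_scores, nonmember_scores):
    """Probability that a random member outscores a random non-member.

    Ties count as 0.5. A value of 0.5 means the attacker does no better
    than random guessing.
    """
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def simulate_losses(rng, count, mean, spread=0.3):
    """Hypothetical per-snippet losses drawn from a Gaussian (synthetic data)."""
    return [rng.gauss(mean, spread) for _ in range(count)]


rng = random.Random(0)

# Assumed scenario 1: a model that memorized its training data, so training
# snippets (members) have clearly lower loss than unseen snippets.
leaky_members = simulate_losses(rng, 500, mean=1.0)
leaky_nonmembers = simulate_losses(rng, 500, mean=2.0)

# Assumed scenario 2: a model whose loss does not separate members from
# non-members, which is the behavior a DP-trained model aims for.
private_members = simulate_losses(rng, 500, mean=2.0)
private_nonmembers = simulate_losses(rng, 500, mean=2.0)

# Attack score: negative loss, so that lower loss means "more likely a member".
leaky_auc = auc([-x for x in leaky_members], [-x for x in leaky_nonmembers])
private_auc = auc([-x for x in private_members], [-x for x in private_nonmembers])

print(f"AUC against the leaky model:   {leaky_auc:.3f}")
print(f"AUC against the private model: {private_auc:.3f}")
```

In this toy setup the first AUC is far above 0.5 and the second sits near it, which is the qualitative pattern the abstract describes for non-private versus DP training.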

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Software Engineering Research · Advanced Malware Detection Techniques