How Quantization Impacts Privacy Risk on LLMs for Code?
Md Nazmul Haque, Hua Yang, Zhou Yang, Bowen Xu

TL;DR
This study empirically examines how quantization affects privacy risks and task performance in large language models for code, revealing that quantization can reduce privacy risks and that a tradeoff exists between performance and privacy.
Contribution
First empirical analysis of the impact of quantization on privacy risk and task performance in LLMs for code across multiple architectures and sizes.
Findings
Quantization reduces privacy risks compared to full-precision models.
A positive correlation exists between task performance and privacy risk.
Quantizing larger models can offer a better privacy-performance balance.
Abstract
Large language models for code (LLMs4Code) rely heavily on massive training data, including sensitive data such as cloud service credentials of the projects and personally identifiable information of the developers, raising serious privacy concerns. Membership inference (MI) has recently emerged as an effective tool for assessing privacy risk by identifying whether specific data belong to a model's training set. In parallel, model compression techniques, especially quantization, have gained traction for reducing computational costs and enabling the deployment of large models. However, while quantized models still retain knowledge learned from the original training data, it remains unclear whether quantization affects their ability to retain and expose private information. Answering this question is of great importance to understanding privacy risks in real-world deployments. In this…
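The abstract names the two ingredients, quantization and membership inference, but does not say which quantization scheme or MI attack the study uses. The sketch below is a generic illustration under that caveat, not the authors' method. It shows (a) symmetric per-tensor int8 weight quantization and (b) a simple loss-threshold MI decision that flags a sample as a training member when the model's average negative log-likelihood on it is low. The function names, the int8 scheme, and the threshold value are hypothetical choices made for illustration only.

```python
import math
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Hypothetical symmetric per-tensor int8 quantization: w ~= scale * q, q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round each weight to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float weights."""
    return [scale * v for v in q]


def avg_nll(token_probs: List[float]) -> float:
    """Average negative log-likelihood the model assigns to a sample's tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def infer_membership(token_probs: List[float], threshold: float) -> bool:
    """Hypothetical loss-threshold MI: predict 'member' if the loss falls below the threshold."""
    return avg_nll(token_probs) < threshold


if __name__ == "__main__":
    q, scale = quantize_int8([0.5, -1.27, 0.01])
    print(q, scale)
    print(infer_membership([0.9, 0.8, 0.95], threshold=1.0))  # confident tokens, low loss
    print(infer_membership([0.1, 0.2, 0.05], threshold=1.0))  # unlikely tokens, high loss
```

Each dequantized weight differs from the original by at most half a quantization step (`scale / 2`), so quantization perturbs the model slightly. A loss-based attack like the one sketched here depends directly on per-sample losses, which is why comparing MI results on full-precision and quantized versions of the same model is a natural way to measure how quantization changes privacy risk.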
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Scientific Computing and Data Management
