Exposing and Defending Membership Leakage in Vulnerability Prediction Models

Yihan Liao; Jacky Keung; Xiaoxue Ma; Jingyu Zhang; Yicheng Sun

arXiv:2512.08291 · cs.CR · December 10, 2025

Open Access

TL;DR

This paper investigates the privacy risk that membership inference attacks pose to vulnerability prediction models for code. It shows that model outputs leak training-set membership and proposes a noise-based defense that reduces attack success while maintaining model performance.

Contribution

The paper provides the first comprehensive analysis of membership inference attacks (MIAs) on code vulnerability prediction models and introduces NMID, a lightweight noise-based defense mechanism.

Findings

01 Logits and loss are the outputs most vulnerable to MIA.

02 The proposed NMID reduces attack AUC from nearly 1.0 to below 0.65.

03 NMID maintains the predictive utility of the models.
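
To make the idea of a noise-based output defense concrete, here is a minimal sketch of generic logit perturbation. It is not the paper's NMID, whose details are not given on this page. The function names, the Gaussian noise model, the `sigma` parameter, and the swap rule that keeps the predicted label unchanged are all assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def noisy_release(logits, sigma, rng):
    """Hypothetical defense: add Gaussian noise to the logits before release.

    If the noise changes the predicted class, the two affected entries are
    swapped so the released label matches the original prediction. Only the
    confidence values an attacker could exploit are blurred.
    """
    top = max(range(len(logits)), key=lambda i: logits[i])
    noisy = [z + rng.gauss(0.0, sigma) for z in logits]
    new_top = max(range(len(noisy)), key=lambda i: noisy[i])
    if new_top != top:
        noisy[top], noisy[new_top] = noisy[new_top], noisy[top]
    return softmax(noisy)


# Toy binary example (vulnerable vs. not vulnerable); values are made up.
raw_logits = [4.0, -4.0]
released = noisy_release(raw_logits, sigma=1.0, rng=random.Random(0))
print(released)
```

In this sketch the predicted label never changes, so accuracy is unaffected by construction, while the released confidence carries less information about whether a sample was seen in training. How much noise is needed to reach a given attack AUC would have to be measured empirically.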

Abstract

Neural models for vulnerability prediction (VP) have achieved impressive performance by learning from large-scale code repositories. However, their susceptibility to Membership Inference Attacks (MIAs), where adversaries aim to infer whether a particular code sample was used during training, poses serious privacy concerns. While MIA has been widely investigated in NLP and vision domains, its effects on security-critical code analysis tasks remain underexplored. In this work, we conduct the first comprehensive analysis of MIA on VP models, evaluating the attack success across various architectures (LSTM, BiGRU, and CodeBERT) and feature combinations, including embeddings, logits, loss, and confidence. Our threat model aligns with black-box and gray-box settings where prediction outputs are observable, allowing adversaries to infer membership by analyzing output discrepancies between…
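
As a rough illustration of how an adversary might turn observable outputs into a membership guess, the sketch below scores each sample by its negative cross-entropy loss and measures how well that score separates training members from non-members. It is a generic loss-threshold attack under stated assumptions, not the paper's attack pipeline. The function names and the toy probabilities are hypothetical.

```python
import math


def cross_entropy(probs, label):
    """Per-sample cross-entropy loss for a predicted probability vector."""
    return -math.log(max(probs[label], 1e-12))


def attack_auc(member_scores, nonmember_scores):
    """AUC of a score-threshold attack.

    Equals the probability that a random member scores higher than a
    random non-member, with ties counted as one half.
    """
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


# Toy (probabilities, true label) pairs; the values are made up.
# Members tend to receive confident, low-loss predictions.
members = [([0.02, 0.98], 1), ([0.95, 0.05], 0), ([0.10, 0.90], 1)]
nonmembers = [([0.40, 0.60], 1), ([0.55, 0.45], 0), ([0.70, 0.30], 1)]

# Higher score means "more likely a training member".
member_scores = [-cross_entropy(p, y) for p, y in members]
nonmember_scores = [-cross_entropy(p, y) for p, y in nonmembers]

auc = attack_auc(member_scores, nonmember_scores)
print(auc)
```

An AUC near 0.5 means the score carries no membership signal, while an AUC near 1.0 means members and non-members are almost perfectly separable. In this toy data every member has a lower loss than every non-member.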

Taxonomy

Topics: Information and Cyber Security · Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques