When Less is Enough: Positive and Unlabeled Learning Model for Vulnerability Detection
Xin-Cheng Wen, Xinchen Wang, Cuiyun Gao, Shaohua Wang, Yang Liu, Zhaoquan Gu

TL;DR
This paper introduces PILOT, a positive and unlabeled learning model that improves vulnerability detection by effectively leveraging unlabeled data and positive labels, addressing label quality issues in automated datasets.
Contribution
The paper proposes a novel PU learning model, PILOT, specifically designed for vulnerability detection, which outperforms traditional methods by utilizing unlabeled data effectively.
Findings
PILOT achieves higher detection accuracy than the baseline models.
The label selection module improves pseudo-label quality for unlabeled data.
The mixed-supervision module enhances representation discrimination.
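To make the pseudo-labeling idea in the findings concrete, here is a minimal sketch of one generic positive-unlabeled heuristic: treat the unlabeled samples that lie farthest from the centroid of the known positives as reliable negatives. This is not PILOT's label selection module, whose details this summary does not give. The function names, the `fraction` parameter, and the toy two-dimensional feature vectors are all assumptions made for illustration.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def select_reliable_negatives(positives, unlabeled, fraction=0.5):
    """Return sorted indices of the unlabeled samples farthest from the positives.

    Hypothetical helper: a distance-to-centroid heuristic standing in for
    whatever selection rule a real PU learning model would use.
    """
    center = centroid(positives)
    distances = [math.dist(center, u) for u in unlabeled]
    # Keep at least one sample so the pseudo-negative set is never empty.
    k = max(1, int(len(unlabeled) * fraction))
    ranked = sorted(range(len(unlabeled)), key=lambda i: distances[i], reverse=True)
    return sorted(ranked[:k])


# Toy embeddings: two known-vulnerable samples and four unlabeled ones.
positives = [[1.0, 1.0], [1.2, 0.8]]
unlabeled = [[1.1, 0.9], [5.0, 5.0], [0.9, 1.1], [4.0, 6.0]]

reliable_negative_ids = select_reliable_negatives(positives, unlabeled, fraction=0.5)
print(reliable_negative_ids)
```

In a full pipeline the selected samples would be pseudo-labeled as non-vulnerable and combined with the positives to train a classifier, while the remaining unlabeled samples stay unlabeled until a later round.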
Abstract
Automated code vulnerability detection has gained increasing attention in recent years. The deep learning (DL)-based methods, which implicitly learn vulnerable code patterns, have proven effective in vulnerability detection. The performance of DL-based methods usually relies on the quantity and quality of labeled data. However, the current labeled data are generally automatically collected, such as crawled from human-generated commits, making it hard to ensure the quality of the labels. Prior studies have demonstrated that the non-vulnerable code (i.e., negative labels) tends to be unreliable in commonly-used datasets, while vulnerable code (i.e., positive labels) is more determined. Considering the large numbers of unlabeled data in practice, it is necessary and worth exploring to leverage the positive data and large numbers of unlabeled data for more accurate vulnerability detection.…
Taxonomy
Topics: Web Application Security Vulnerabilities · Advanced Malware Detection Techniques · Software Engineering Research
