Noisy Label Learning for Security Defects

Roland Croft; M. Ali Babar; Huaming Chen

arXiv:2203.04468 · cs.SE · April 4, 2022

TL;DR

This paper introduces robust noisy label learning methods for security defect prediction, addressing label noise issues in vulnerability datasets to improve predictive performance.

Contribution

It proposes a novel two-stage noise cleaning approach for vulnerability prediction, enhancing model accuracy despite noisy labels.

Findings

01

The proposed method improved AUC and recall over the baselines by up to 8.9% and 23.4%, respectively.

02

Demonstrated effectiveness of noisy label learning in security analytics.

03

Discussed challenges in achieving performance upper bounds with label noise.

Abstract

Data-driven software engineering processes, such as vulnerability prediction, rely heavily on the quality of the data used. In this paper, we observe that it is infeasible to obtain a noise-free security defect dataset in practice. Unlike the vulnerable class, non-vulnerable modules are difficult to verify and confirm as truly exploit-free given the limited manual effort available. This results in uncertainty, introduces label noise into the datasets, and affects conclusion validity. To address this issue, we propose novel learning methods that are robust to label impurities and can make the most of limited labeled data: noisy label learning. We investigate various noisy label learning methods applied to software vulnerability prediction. Specifically, we propose a two-stage learning method based on noise cleaning to identify and remediate the noisy samples, which…
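To make the idea of two-stage noise cleaning concrete, here is a minimal sketch of the general technique, not the authors' implementation. It assumes one-sided label noise as the abstract describes (vulnerable labels are trusted, non-vulnerable labels may hide vulnerabilities). Stage 1 flags non-vulnerable modules whose nearest neighbours are mostly vulnerable; stage 2 either relabels or removes them before training the final model. The k-nearest-neighbour detector, the function names, the parameters `k` and `min_votes`, and the toy data are all hypothetical choices for illustration; the paper's method may use a different noise detector and classifier.

```python
import math

# Hypothetical sketch: a k-NN stands in for both the noise detector and the final model.

def vulnerable_votes(x, points, labels, k, skip=None):
    """Count how many of the k nearest neighbours of x are labelled vulnerable (1).

    `skip` excludes one index so a training sample is not its own neighbour.
    """
    neighbours = sorted(
        (math.dist(x, p), y)
        for i, (p, y) in enumerate(zip(points, labels))
        if i != skip
    )
    return sum(y for _, y in neighbours[:k])


def stage1_flag_noise(points, labels, k=3, min_votes=2):
    """Stage 1: flag modules labelled non-vulnerable (0) whose neighbourhood is mostly vulnerable.

    Only the 0 class is checked, assuming vulnerable labels are verified
    while non-vulnerable labels are uncertain.
    """
    return [
        i for i, (p, y) in enumerate(zip(points, labels))
        if y == 0 and vulnerable_votes(p, points, labels, k, skip=i) >= min_votes
    ]


def stage2_remediate(points, labels, flagged, strategy="relabel"):
    """Stage 2: relabel flagged samples as vulnerable, or drop them from the training set."""
    flagged = set(flagged)
    if strategy == "relabel":
        return list(points), [1 if i in flagged else y for i, y in enumerate(labels)]
    if strategy == "remove":
        keep = [i for i in range(len(points)) if i not in flagged]
        return [points[i] for i in keep], [labels[i] for i in keep]
    raise ValueError(f"unknown strategy: {strategy!r}")


def predict(x, points, labels, k=3, min_votes=2):
    """Final model trained on the cleaned data: a plain k-NN vote."""
    return int(vulnerable_votes(x, points, labels, k) >= min_votes)


# Toy feature vectors for nine modules (hypothetical data).
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 0]  # the last module hides an undetected vulnerability

flagged = stage1_flag_noise(X, y)
X_clean, y_clean = stage2_remediate(X, y, flagged)
print(flagged, y_clean[-1])  # [8] 1
```

Checking only the non-vulnerable class mirrors the asymmetry the abstract describes; a symmetric cleaner would also risk discarding verified vulnerable samples, which are typically the scarce minority class.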

Taxonomy

Topics: Software Engineering Research · Software Reliability and Analysis Research · Advanced Malware Detection Techniques