Explaining the Contributing Factors for Vulnerability Detection in Machine Learning

Esma Mouine; Yan Liu; Lu Xiao; Rick Kazman; Xiao Wang

arXiv:2406.03577·cs.SE·June 7, 2024

TL;DR

This paper investigates how different features and machine learning models affect vulnerability detection accuracy across multiple software projects, highlighting effective combinations and transferability limitations.

Contribution

It systematically evaluates the impact of various vulnerability features and models, providing a baseline for future research and practical applications.

Findings

01

Bag-of-words with random forest improves detection accuracy by 4%.

02

Transferability of vulnerability signatures across projects is limited.

03

NLP-based code features enhance vulnerability detection.
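
To make the bag-of-words idea in the findings above concrete, here is a minimal sketch of how source code could be turned into token-count vectors. It is an illustration under stated assumptions, not the paper's pipeline: the regex tokenizer, the helper names (tokenize, build_vocabulary, bag_of_words), and the two C snippets are all hypothetical, and the classifier step is left out.

```python
import re
from collections import Counter
from typing import List

# Hypothetical, simplistic tokenizer: keeps identifier-like tokens only.
# The paper's actual tokenization is not described in this listing.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(code: str) -> List[str]:
    """Split source text into identifier-like tokens."""
    return TOKEN_RE.findall(code)


def build_vocabulary(snippets: List[str]) -> List[str]:
    """Collect every distinct token across the snippets, in sorted order."""
    vocab = set()
    for snippet in snippets:
        vocab.update(tokenize(snippet))
    return sorted(vocab)


def bag_of_words(code: str, vocabulary: List[str]) -> List[int]:
    """Count how often each vocabulary token occurs in one snippet."""
    counts = Counter(tokenize(code))
    return [counts[token] for token in vocabulary]


# Illustrative snippets (assumed for this example, not taken from the paper).
snippets = [
    "strcpy(buf, input);",
    "strncpy(buf, input, sizeof(buf));",
]
vocabulary = build_vocabulary(snippets)
vectors = [bag_of_words(s, vocabulary) for s in snippets]

print(vocabulary)
for vec in vectors:
    print(vec)
```

Vectors of this kind, paired with vulnerable/non-vulnerable labels, are the sort of input a random forest could be trained on (for instance scikit-learn's RandomForestClassifier); that step is omitted here to keep the sketch dependency-free.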

Abstract

There is an increasing trend to mine vulnerabilities from software repositories and to use machine learning techniques to detect software vulnerabilities automatically. A fundamental but unresolved research question is: how do different factors in the mining and learning process affect the accuracy of identifying vulnerabilities in software projects of varying characteristics? Substantial research has been dedicated to this area, including source code static analysis, software repository mining, and NLP-based machine learning. However, practitioners lack experience regarding the key factors for building a state-of-the-art baseline model. In addition, there is a lack of experience regarding the transferability of vulnerability signatures from project to project. This study investigates how the combination of different vulnerability features and three representative machine learning…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications

Methods: Sparse Evolutionary Training