Learning from what we know: How to perform vulnerability prediction using noisy historical data

Aayush Garg; Renzo Degiovanni; Matthieu Jimenez; Maxime Cordy; Mike Papadakis; Yves Le Traon

arXiv:2207.11018 · cs.SE · September 20, 2022

TL;DR

This paper introduces TROVON, a vulnerability prediction method that learns from known vulnerabilities and their fixes to improve accuracy despite noisy and imbalanced data, outperforming existing techniques.

Contribution

TROVON is a novel approach that learns from known vulnerable components and their fixes, rather than from vulnerable and non-vulnerable components. This mitigates the label noise and class imbalance that hamper vulnerability prediction models.

Findings

1. TROVON outperforms existing techniques by up to 40.84% in MCC (Matthews Correlation Coefficient).
2. It shows significant improvements on the Linux Kernel, OpenSSL, and Wireshark datasets.
3. It is effective both when trained on clean data and under realistic training-data conditions.
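The findings are reported in MCC, the Matthews Correlation Coefficient: MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The minimal sketch below shows why this metric suits an imbalanced problem like vulnerability prediction, where accuracy can reward a useless classifier. It is not taken from the paper, and the confusion-matrix counts are hypothetical. Returning 0 when the denominator is zero is a common convention, because the formula is undefined in that case.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal sum is zero (a common convention,
    since the formula is undefined in that case).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correctly classified components."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical imbalanced dataset: 50 vulnerable, 950 non-vulnerable components.
# A trivial classifier that labels every component "non-vulnerable":
trivial = dict(tp=0, tn=950, fp=0, fn=50)
# A classifier that finds 30 of the 50 vulnerable components with 20 false alarms:
useful = dict(tp=30, tn=930, fp=20, fn=20)

for name, cm in [("trivial", trivial), ("useful", useful)]:
    print(f"{name}: accuracy={accuracy(**cm):.3f}, MCC={mcc(**cm):.3f}")
# trivial: accuracy=0.950, MCC=0.000
# useful: accuracy=0.960, MCC=0.579
```

Accuracy barely separates the two classifiers (0.950 vs. 0.960). MCC scores the trivial one at 0 and clearly credits the one that actually finds vulnerable components.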

Abstract

Vulnerability prediction refers to the problem of identifying system components that are most likely to be vulnerable. Typically, this problem is tackled by training binary classifiers on historical data. Unfortunately, recent research has shown that such approaches underperform for two reasons: a) the imbalanced nature of the problem, and b) the inherently noisy historical data, i.e., most vulnerabilities are discovered much later than they are introduced. This misleads classifiers, which learn to recognize actually vulnerable components as non-vulnerable. To tackle these issues, we propose TROVON, a technique that learns from known vulnerable components rather than from vulnerable and non-vulnerable components, as is typically done. We do this by contrasting the known vulnerable components with their respective fixed versions. This way, TROVON manages to learn from the…
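The label-noise argument in the abstract can be made concrete with a toy model. Historical data marks a component version as vulnerable only once its vulnerability has been reported. Any version with a latent, not-yet-reported vulnerability therefore enters the training set as a "non-vulnerable" example. The sketch below is a hypothetical illustration and does not reproduce TROVON. The class `ComponentVersion`, the functions `historical_label`, `noisy_negatives`, and `known_vulnerable`, the labeling year, and all data are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ComponentVersion:
    name: str
    has_vuln: bool             # ground truth; unknown to anyone at labeling time
    discovered: Optional[int]  # year the vulnerability was reported (None = not yet)


def historical_label(v: ComponentVersion, labeling_year: int) -> bool:
    """Label a classifier would see: vulnerable only if already reported."""
    return v.has_vuln and v.discovered is not None and v.discovered <= labeling_year


def noisy_negatives(versions: List[ComponentVersion], labeling_year: int) -> List[str]:
    """Truly vulnerable versions that the historical data labels as non-vulnerable."""
    return [v.name for v in versions
            if v.has_vuln and not historical_label(v, labeling_year)]


def known_vulnerable(versions: List[ComponentVersion], labeling_year: int) -> List[str]:
    """Versions whose 'vulnerable' label is confirmed by a reported vulnerability."""
    return [v.name for v in versions if historical_label(v, labeling_year)]


# Hypothetical snapshot of five component versions (toy data).
versions = [
    ComponentVersion("comp_a", has_vuln=True, discovered=2012),   # reported before labeling
    ComponentVersion("comp_b", has_vuln=True, discovered=2018),   # reported after labeling
    ComponentVersion("comp_c", has_vuln=True, discovered=None),   # still latent
    ComponentVersion("comp_d", has_vuln=False, discovered=None),
    ComponentVersion("comp_e", has_vuln=False, discovered=None),
]

print(noisy_negatives(versions, labeling_year=2015))   # ['comp_b', 'comp_c']
print(known_vulnerable(versions, labeling_year=2015))  # ['comp_a']
```

In this toy snapshot, two of the four versions labeled non-vulnerable are actually vulnerable. Only the confirmed set is trustworthy. Per the abstract, TROVON learns from such known vulnerable components and their fixed counterparts instead of relying on the noisy negatives.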

Code & Models

Repositories

garghub/trovon (official implementation; TensorFlow)

Taxonomy

Topics: Information and Cyber Security · Software Engineering Research · Software Reliability and Analysis Research