Today's Cat Is Tomorrow's Dog: Accounting for Time-Based Changes in the Labels of ML Vulnerability Detection Approaches
Ranindya Paramitha, Yuan Feng, Fabio Massacci

TL;DR
This paper introduces a method to evaluate ML vulnerability detection models over time by restructuring datasets to reflect changing labels, revealing that most models do not improve with more data as expected.
Contribution
It proposes a novel dataset restructuring approach to account for temporal label changes and evaluates model performance trends over time.
Findings
Models do not consistently improve over time.
Most models are not learning as more data becomes available.
Performance trends vary unpredictably across years.
Abstract
Vulnerability datasets used for ML testing implicitly contain retrospective information. When tested on the field, one can only use the labels available at the time of training and testing (e.g. seen and assumed negatives). As vulnerabilities are discovered across calendar time, labels change and past performance is not necessarily aligned with future performance. Past works only considered the slices of the whole history (e.g. DiverseVUl) or individual differences between releases (e.g. Jimenez et al. ESEC/FSE 2019). Such approaches are either too optimistic in training (e.g. the whole history) or too conservative (e.g. consecutive releases). We propose a method to restructure a dataset into a series of datasets in which both training and testing labels change to account for the knowledge available at the time. If the model is actually learning, it should improve its performance over…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Malware Detection Techniques · Software Reliability and Analysis Research · Network Security and Intrusion Detection
