Are We Learning the Right Features? A Framework for Evaluating DL-Based Software Vulnerability Detection Solutions
Satyaki Das, Syeda Tasnim Fabiha, Saad Shafiq, Nenad Medvidovic

TL;DR
This paper introduces a novel framework for evaluating deep learning-based software vulnerability detectors. It identifies the code features that detectors rely on, perturbs those features, and measures how the perturbations affect predictions, exposing the detectors' reliance on spurious correlations.
Contribution
It proposes a uniform feature representation and a perturbation-based evaluation framework to assess vulnerability detectors' reliance on true versus spurious features.
Findings
Only ~2% of feature-preserving perturbations change predictions.
~84% of feature-eliminating perturbations retain vulnerability predictions.
Spurious features significantly impact recall in graph-based detectors.
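The first two findings are ratios over perturbed samples. The sketch below shows one way such rates could be tallied. It assumes each perturbation is logged with the detector's prediction before and after. The `PerturbationResult` record, the function names, and the choice to count FEP retention only over samples originally predicted vulnerable are our assumptions, not the paper's definitions. The toy results are invented for illustration and are not the paper's data.

```python
from dataclasses import dataclass

@dataclass
class PerturbationResult:
    kind: str             # "FPP" (feature-preserving) or "FEP" (feature-eliminating)
    original_pred: bool   # detector label on the unperturbed sample (True = vulnerable)
    perturbed_pred: bool  # detector label on the perturbed sample

def fpp_change_rate(results):
    """Share of FPPs whose prediction differs from the original.
    The vulnerability feature is kept, so a robust detector should score near 0."""
    fpps = [r for r in results if r.kind == "FPP"]
    if not fpps:
        return 0.0
    return sum(r.original_pred != r.perturbed_pred for r in fpps) / len(fpps)

def fep_retention_rate(results):
    """Share of FEPs (on samples first predicted vulnerable) still predicted vulnerable.
    Assumption: the denominator is restricted to originally-vulnerable predictions."""
    feps = [r for r in results if r.kind == "FEP" and r.original_pred]
    if not feps:
        return 0.0
    return sum(r.perturbed_pred for r in feps) / len(feps)

# Hypothetical toy log, not data from the paper.
results = [
    PerturbationResult("FPP", True, True),
    PerturbationResult("FPP", True, False),   # prediction flipped despite feature kept
    PerturbationResult("FPP", False, False),
    PerturbationResult("FPP", True, True),
    PerturbationResult("FEP", True, True),    # still "vulnerable" after feature removed
    PerturbationResult("FEP", True, True),
    PerturbationResult("FEP", True, False),
    PerturbationResult("FEP", False, False),  # excluded: not originally predicted vulnerable
]

print(f"{fpp_change_rate(results):.2f}")     # 0.25
print(f"{fep_retention_rate(results):.2f}")  # 0.67
```

Under this reading, a detector that has learned the true vulnerability feature would show a low FPP change rate and a low FEP retention rate. A low change rate alongside a high retention rate, as in the findings above, suggests the prediction is driven by something other than the vulnerability itself.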
Abstract
Recent research has revealed that the reported results of an emerging body of DL-based techniques for detecting software vulnerabilities are not reproducible, either across different datasets or on unseen samples. This paper aims to provide the foundation for properly evaluating the research in this domain. We do so by analyzing prior work and existing vulnerability datasets for the syntactic and semantic features of code that contribute to vulnerability, as well as features that falsely correlate with vulnerability. We provide a novel, uniform representation to capture both sets of features, and use this representation to detect the presence of both vulnerability and spurious features in code. To this end, we design two types of code perturbations: feature-preserving perturbations (FPP) ensure that the vulnerability feature remains in a given code sample, while feature-eliminating…
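The two perturbation types can be illustrated on a toy C function held as a Python string. This is a minimal sketch under our own assumptions. The snippet, the identifier-renaming FPP, the bounded-copy FEP, and the two keyword-matching "detectors" are hypothetical stand-ins invented for illustration. They are not the authors' perturbation operators or any real DL model. Each detector deliberately keys on a spurious feature so the two failure modes become visible.

```python
import re

# Toy sample: unbounded strcpy into a fixed-size stack buffer (overflow risk).
VULNERABLE = """void copy(char *src) {
    char buf[16];
    strcpy(buf, src);
}"""

def rename_identifier(code, old, new):
    """Hypothetical FPP: rename a local variable (whole-word match only).
    The unbounded copy into a fixed-size buffer is left intact."""
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

def bound_the_copy(code):
    """Hypothetical FEP: replace the unbounded strcpy with a bounded copy
    plus explicit NUL termination, removing the overflow."""
    return code.replace(
        "strcpy(buf, src);",
        "strncpy(buf, src, sizeof(buf) - 1);\n    buf[sizeof(buf) - 1] = '\\0';",
    )

def name_based_detector(code):
    """Toy detector with a spurious feature: flags any code using the name `buf`."""
    return re.search(r"\bbuf\b", code) is not None

def decl_based_detector(code):
    """Toy detector with a spurious feature: flags any fixed-size char array."""
    return re.search(r"\bchar\s+\w+\[\d+\]", code) is not None

fpp = rename_identifier(VULNERABLE, "buf", "v1")
fep = bound_the_copy(VULNERABLE)

# FPP flips the name-based detector although the vulnerability is still there.
print(name_based_detector(VULNERABLE), name_based_detector(fpp))  # True False
# FEP leaves the declaration-based detector predicting "vulnerable" after the fix.
print(decl_based_detector(VULNERABLE), decl_based_detector(fep))  # True True
```

The first toy detector fails the FPP test, and the second fails the FEP test. These are the two symptoms of relying on spurious rather than vulnerability features that the framework is designed to surface.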
Taxonomy
Topics: Software Reliability and Analysis Research · Software Engineering Research · Software System Performance and Reliability
