Deep Learning based Vulnerability Detection: Are We There Yet?

Saikat Chakraborty; Rahul Krishna; Yangruibo Ding; Baishakhi Ray

arXiv:2009.07235·cs.SE·September 16, 2020

TL;DR

This paper critically evaluates deep learning models for vulnerability detection in real-world scenarios, finds that their performance drops by more than 50%, and proposes improved data collection and modeling strategies that raise detection precision and recall.

Contribution

The paper identifies key issues in existing DL-based vulnerability prediction approaches and demonstrates how principled data and model design can substantially improve results.

Findings

01. Performance drops by more than 50% in real-world scenarios.

02. Data issues such as duplication and unrealistic class distribution affect accuracy.

03. The proposed methods improve precision by up to 33.57% and recall by up to 128.38%.
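The two data issues named in finding 02 can be checked on any labeled corpus before training. The following is a minimal sketch, not the authors' pipeline: the `normalize` and `audit` helpers, the comment-stripping rules, and the `(code, label)` sample layout are all assumptions made for illustration.

```python
import hashlib
import re
from collections import Counter


def normalize(code: str) -> str:
    """Strip C-style comments and collapse whitespace (hypothetical rule set).

    Limitation: the line-comment regex would also cut a `//` that appears
    inside a string literal; a real tool would use a lexer instead.
    """
    code = re.sub(r"/\*.*?\*/", " ", code, flags=re.DOTALL)  # block comments
    code = re.sub(r"//[^\n]*", " ", code)                     # line comments
    return " ".join(code.split())


def audit(samples):
    """Deduplicate samples and report duplicate rate and class balance.

    samples: list of (code, label) pairs, where label 1 means vulnerable
    and label 0 means not vulnerable (an assumed encoding).
    Returns (unique_samples, duplicate_rate, vulnerable_share).
    """
    seen = set()
    unique = []
    for code, label in samples:
        digest = hashlib.sha256(normalize(code).encode("utf-8")).hexdigest()
        if digest not in seen:  # keep only the first copy of each function
            seen.add(digest)
            unique.append((code, label))
    duplicate_rate = 1 - len(unique) / len(samples) if samples else 0.0
    labels = Counter(label for _, label in unique)
    vulnerable_share = labels[1] / len(unique) if unique else 0.0
    return unique, duplicate_rate, vulnerable_share


if __name__ == "__main__":
    # Toy samples invented for this sketch; they are not from any dataset.
    toy = [
        ("int f(){return 0;}", 0),
        ("int  f(){return 0;} // copy", 0),
        ("void g(char*s){char b[4];strcpy(b,s);}", 1),
        ("int h(){return 1;}", 0),
    ]
    unique, dup_rate, vuln_share = audit(toy)
    print(len(unique), dup_rate, round(vuln_share, 3))
```

A high duplicate rate suggests that test samples may also appear in the training split, and a vulnerable share far above what real projects exhibit suggests that reported scores will not transfer to deployment.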

Abstract

Automated detection of software vulnerabilities is a fundamental problem in software security. Existing program analysis techniques either suffer from high false positives or false negatives. Recent progress in Deep Learning (DL) has resulted in a surge of interest in applying DL for automated vulnerability detection. Several recent studies have demonstrated promising results achieving an accuracy of up to 95% at detecting vulnerabilities. In this paper, we ask, "how well do the state-of-the-art DL-based techniques perform in a real-world vulnerability prediction scenario?". To our surprise, we find that their performance drops by more than 50%. A systematic investigation of what causes such precipitous performance drop reveals that existing DL-based vulnerability prediction approaches suffer from challenges with the training data (e.g., data duplication, unrealistic distribution of…

Code & Models

Repositories

CGCL-codes/VulDeePecker
