Shallow or Deep? An Empirical Study on Detecting Vulnerabilities using Deep Learning

Alejandro Mazuera-Rozo; Anamaria Mojica-Hanke; Mario Linares-Vásquez; Gabriele Bavota

arXiv:2103.11940 · cs.SE · March 23, 2021

TL;DR

This study empirically compares deep learning and shallow machine learning models for software vulnerability detection across multiple datasets and code representations, revealing that shallow models remain competitive and current DL models lack reliability.

Contribution

It provides a large-scale empirical evaluation of DL versus shallow models for vulnerability detection, highlighting the limited effectiveness of DL and the competitiveness of shallow classifiers.

Findings

01. Shallow classifiers perform competitively with DL models.

02. DL models are not yet reliably effective for vulnerability detection.

03. Current models still have significant room for improvement.

Abstract

Deep learning (DL) techniques are on the rise in the software engineering research community. More and more approaches have been developed on top of DL models, also due to the unprecedented amount of software-related data that can be used to train these models. One of the recent applications of DL in the software engineering domain concerns the automatic detection of software vulnerabilities. While several DL models have been developed to approach this problem, there is still limited empirical evidence concerning their actual effectiveness especially when compared with shallow machine learning techniques. In this paper, we partially fill this gap by presenting a large-scale empirical study using three vulnerability datasets and five different source code representations (i.e., the format in which the code is provided to the classifiers to assess whether it is vulnerable or not) to…
