Automated Vulnerability Detection in Source Code Using Deep Representation Learning
C. Seas, G. Fitzpatrick, J. A. Hamilton, M. C. Carlisle

TL;DR
This paper introduces a CNN-based model for detecting bugs in C source code, trained on diverse datasets, achieving high recall and low false positives, thus improving automated vulnerability detection.
Contribution
The work develops a specialized CNN model for C code vulnerability detection, utilizing dual datasets and optimized tokenization, with improved recall and low false positives compared to prior methods.
Findings
Higher recall than previous work on benchmark datasets
Effective detection of real vulnerabilities in complex code
Low false-positive rate in vulnerability identification
Abstract
Each year, software vulnerabilities are discovered, which pose significant risks of exploitation and system compromise. We present a convolutional neural network model that can successfully identify bugs in C code. We trained our model using two complementary datasets: a machine-labeled dataset created by Draper Labs using three static analyzers and the NIST SATE Juliet human-labeled dataset designed for testing static analyzers. In contrast with the work of Russell et al. on these datasets, we focus on C programs, enabling us to specialize and optimize our detection techniques for this language. After removing duplicates from the dataset, we tokenize the input into 91 token categories. The category values are converted to a binary vector to save memory. Our first convolution layer is chosen so that the entire encoding of the token is presented to the filter. We use two convolution and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSoftware Testing and Debugging Techniques · Software Engineering Research · Advanced Malware Detection Techniques
