Automatic feature learning for vulnerability prediction

Hoa Khanh Dam; Truyen Tran; Trang Pham; Shien Wee Ng; John Grundy; Aditya Ghose

arXiv:1708.02368·cs.SE·August 9, 2017·90 cites

Open Access

TL;DR

This paper introduces a deep learning approach using LSTM models to automatically learn semantic and syntactic features in source code for vulnerability prediction, outperforming existing methods.

Contribution

It presents a novel deep learning-based feature learning method that captures code semantics and syntax for improved vulnerability prediction accuracy.

Findings

01. Achieves 3%-58% improvement in within-project prediction

02. Achieves 85% improvement in cross-project prediction

03. Outperforms state-of-the-art vulnerability prediction models

Abstract

Code flaws or vulnerabilities are prevalent in software systems and can potentially cause a variety of problems including deadlock, information loss, or system failure. A variety of approaches have been developed to try to detect the most likely locations of such code vulnerabilities in large code bases. Most of them rely on manually designed features (e.g. complexity metrics or frequencies of code tokens) that represent the characteristics of the code. However, all suffer from challenges in sufficiently capturing both the semantic and syntactic representation of source code, an important capability for building accurate prediction models. In this paper, we describe a new approach, built upon the powerful deep learning Long Short-Term Memory model, to automatically learn both semantic and syntactic features in code. Our evaluation on 18 Android applications demonstrates that the…
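To make the contrast in the abstract concrete, here is a minimal sketch (not the authors' implementation) of the two input representations it mentions: hand-designed token-frequency features, and the ordered sequence of token ids that a sequence model such as an LSTM would consume. The regex tokenizer, the `<unk>` entry, and all function names are assumptions introduced for illustration only.

```python
import re
from collections import Counter

# Hypothetical, simplified tokenizer: identifiers/keywords, integers, or any
# single non-space symbol. A real pipeline would use a proper Java lexer.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source):
    """Split a source snippet into a flat list of code tokens."""
    return TOKEN_PATTERN.findall(source)


def token_frequencies(tokens):
    """Hand-designed feature: how often each token occurs (order is lost)."""
    return Counter(tokens)


def build_vocabulary(token_lists):
    """Map each distinct token to an integer id; id 0 is reserved for unknowns."""
    vocab = {"<unk>": 0}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab):
    """Sequence input: token ids in source order (order is preserved)."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokens]


snippet = "int x = read(); if (x > 0) { use(x); }"
tokens = tokenize(snippet)
freqs = token_frequencies(tokens)
vocab = build_vocabulary([tokens])
ids = encode(tokens, vocab)
```

The frequency table discards token order, whereas the id sequence keeps it; a learned model would then map each id to an embedding vector and read the sequence step by step.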

Taxonomy

Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software Reliability and Analysis Research