Automated Code-centric Software Vulnerability Assessment: How Far Are We? An Empirical Study in C/C++

Anh The Nguyen; Triet Huynh Minh Le; M. Ali Babar

arXiv:2407.17053·cs.SE·August 6, 2024

TL;DR

This empirical study evaluates ML and DL models for function-level software vulnerability (SV) assessment in C/C++. It finds that ML models match or outperform DL models while requiring less training time, and that multi-task learning improves DL performance by 8-22% in MCC.

Contribution

First comprehensive empirical comparison of ML and DL models for function-level SV assessment in C/C++, highlighting multi-task DL's effectiveness.

Findings

01. ML matches or outperforms DL in performance while requiring less training time

02. Multi-task DL improves assessment performance by 8-22% in Matthews Correlation Coefficient (MCC)

03. Data-driven practices can effectively guide function-level SV assessment
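The findings above are reported in MCC (Matthews Correlation Coefficient). For readers unfamiliar with the metric, here is a minimal sketch of the standard multi-class MCC formula in plain Python. The function name `multiclass_mcc` and the severity labels in the usage example are hypothetical illustrations. They are not taken from the paper or its repository, and the paper's own evaluation code may compute the metric differently (for example, through a library).

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews Correlation Coefficient.

    Returns a value in [-1, 1]; 1 is perfect agreement and 0 is
    chance-level agreement. Returns 0.0 when the denominator is zero
    (e.g. every prediction is the same class).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    total = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))

    true_counts = Counter(y_true)   # how often each class truly occurs
    pred_counts = Counter(y_pred)   # how often each class is predicted
    labels = set(true_counts) | set(pred_counts)

    # Covariance-style terms of the multi-class MCC definition.
    cov_true_pred = correct * total - sum(
        true_counts[k] * pred_counts[k] for k in labels
    )
    cov_true_true = total * total - sum(v * v for v in true_counts.values())
    cov_pred_pred = total * total - sum(v * v for v in pred_counts.values())

    denom = sqrt(cov_true_true * cov_pred_pred)
    return cov_true_pred / denom if denom else 0.0


# Hypothetical severity labels for four vulnerable functions.
truth = ["LOW", "HIGH", "HIGH", "MEDIUM"]
preds = ["LOW", "HIGH", "MEDIUM", "MEDIUM"]
score = multiclass_mcc(truth, preds)
```

Unlike plain accuracy, MCC accounts for how predictions are distributed across all classes, so a model that always predicts the majority class scores 0 rather than a misleadingly high value. That property is useful when assessment labels are imbalanced.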

Abstract

Background: The C and C++ languages hold significant importance in Software Engineering research because of their widespread use in practice. Numerous studies have utilized Machine Learning (ML) and Deep Learning (DL) techniques to detect software vulnerabilities (SVs) in the source code written in these languages. However, the application of these techniques in function-level SV assessment has been largely unexplored. SV assessment is increasingly crucial as it provides detailed information on the exploitability, impacts, and severity of security defects, thereby aiding in their prioritization and remediation. Aims: We conduct the first empirical study to investigate and compare the performance of ML and DL models, many of which have been used for SV detection, for function-level SV assessment in C/C++. Method: Using 9,993 vulnerable C/C++ functions, we evaluated the performance of six…

Code & Models

Repositories

anh56/dl4sa (PyTorch, official)
