Distinguishing Look-Alike Innocent and Vulnerable Code by Subtle Semantic Representation Learning and Explanation

Chao Ni; Xin Yin; Kaiwen Yang; Dehai Zhao; Zhenchang Xing; Xin Xia

arXiv:2308.11237·cs.SE·August 23, 2023

TL;DR

This paper introduces SVulD, a novel semantic embedding approach for vulnerability detection that improves accuracy and provides developer-friendly explanations, addressing limitations of existing deep learning methods.

Contribution

SVulD is the first method to learn subtle semantic representations for vulnerability detection and generate natural language explanations for developers.

Findings

01. SVulD outperforms state-of-the-art (SOTA) approaches with up to 68% higher F1-score.

02. SVulD achieves significant improvements in PR-AUC and accuracy metrics.

03. A user study shows that SVulD helps developers better understand vulnerabilities.

Abstract

Although many deep learning (DL)-based vulnerability detection approaches have been proposed and have achieved remarkable performance, they remain limited in both generalization and practical use. More precisely, existing DL-based approaches (1) perform poorly on functions that are lexically similar but have contrary semantics, and (2) provide no intuitive, developer-oriented explanations of the detected results. In this paper, we propose SVulD, a function-level Subtle semantic embedding for Vulnerability Detection with intuitive explanations, to alleviate these limitations. Specifically, SVulD first trains a model to learn distinguishing semantic representations of functions regardless of their lexical similarity. Then, for the detected vulnerable functions, SVulD provides natural language explanations (e.g.,…
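To make the "lexically similar but contrary semantics" problem concrete, the following minimal sketch measures the token overlap between a vulnerable function and its patched counterpart. It illustrates the problem only and is not SVulD itself. The two C snippets, the simple regex tokenizer, and the choice of Jaccard similarity are all assumptions made for this example and do not come from the paper.

```python
import re

# Hypothetical C snippets written for this illustration (not from the paper's data).
# The vulnerable version copies without checking the destination size.
VULNERABLE = """
void copy_name(char *dst, const char *src, size_t n) {
    size_t len = strlen(src);
    memcpy(dst, src, len);
    dst[len] = '\\0';
}
"""

# The patched version differs by a single bounds check.
PATCHED = """
void copy_name(char *dst, const char *src, size_t n) {
    size_t len = strlen(src);
    if (len >= n) return;
    memcpy(dst, src, len);
    dst[len] = '\\0';
}
"""


def tokenize(code: str) -> list[str]:
    """Split code into identifiers, numbers, and single punctuation characters."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def jaccard(a: list[str], b: list[str]) -> float:
    """Jaccard similarity between the sets of distinct tokens in a and b."""
    set_a, set_b = set(a), set(b)
    return len(set_a & set_b) / len(set_a | set_b)


similarity = jaccard(tokenize(VULNERABLE), tokenize(PATCHED))
print(f"token-set Jaccard similarity: {similarity:.2f}")
```

The two functions share almost all of their tokens, yet only one of them is safe. A detector that relies mostly on surface tokens would tend to assign both the same label, which is the failure mode the abstract describes.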

Code & Models

Repositories

jacknichao/svuld (Official)

Taxonomy

Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software Reliability and Analysis Research