DFEPT: Data Flow Embedding for Enhancing Pre-Trained Model Based Vulnerability Detection

Zhonghao Jiang; Weifeng Sun; Xiaoyan Gu; Jiaxin Wu; Tao Wen; Haibo Hu; Meng Yan

arXiv:2410.18479·cs.SE·October 25, 2024

TL;DR

This paper introduces DFEPT, a data flow embedding method that enhances pre-trained code models for vulnerability detection by capturing structural code relationships, leading to improved accuracy and F1 scores.

Contribution

The paper proposes a novel data flow embedding technique that integrates graph learning with pre-trained models to better detect code vulnerabilities.
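The contribution pairs graph learning over data flow with a pre-trained model. Because the abstract does not spell out the architecture, the following is only a minimal, hypothetical sketch of the general idea. It assumes one round of mean-aggregation message passing over data flow edges, mean pooling into a graph embedding, and fusion by simple concatenation with a token-level embedding. The names `propagate` and `fuse`, the toy features, and the concatenation choice are illustrative assumptions, not DFEPT's published design. A real system would use learned weights (for example a GNN layer in PyTorch) rather than plain averages.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Edge = Tuple[str, str]  # (source node, destination node) of a data flow edge


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def propagate(feats: Dict[str, Vector], edges: List[Edge], rounds: int = 1) -> Dict[str, Vector]:
    """Hypothetical message passing: each node averages its own feature with
    the features of the nodes that flow into it (its data flow predecessors)."""
    for _ in range(rounds):
        prev = feats  # read from the previous round so updates are synchronous
        feats = {
            node: mean([vec] + [prev[src] for src, dst in edges if dst == node])
            for node, vec in prev.items()
        }
    return feats


def fuse(token_emb: Vector, node_feats: Dict[str, Vector], edges: List[Edge], rounds: int = 1) -> Vector:
    """Concatenate a sequence-level embedding (standing in for a pre-trained
    code model's output) with a mean-pooled graph embedding. Assumed fusion."""
    graph_feats = propagate(node_feats, edges, rounds)
    graph_emb = mean(list(graph_feats.values()))
    return token_emb + graph_emb


# Toy data flow graph x -> y -> z with 2-d node features (illustrative values).
node_feats = {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [0.0, 0.0]}
edges = [("x", "y"), ("y", "z")]
token_emb = [0.2, 0.4, 0.6]  # stand-in for a pre-trained model's embedding
fused = fuse(token_emb, node_feats, edges)
print(fused)  # [0.2, 0.4, 0.6, 0.5, 0.3333333333333333]
```

Concatenation keeps the token-level view and the structural view side by side, so a downstream classifier can learn how much to rely on each.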

Findings

1. Achieves 64.97% accuracy on the Devign dataset.
2. Attains a 47.9% F1-score on the Reveal dataset.
3. Improves the vulnerability detection performance of pre-trained code models.
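The findings report accuracy on Devign and F1-score on Reveal. To show how these standard metrics differ, here is a small pure-Python sketch that assumes binary labels with 1 meaning "vulnerable"; it is not taken from the DFEPT repository, and the label lists are made up for illustration. F1 ignores true negatives, so it is the more informative metric when vulnerable functions are a minority and a model could reach high accuracy by predicting "not vulnerable" everywhere.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive ("vulnerable") class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only (1 = vulnerable); not results from the paper.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred))

# Imbalanced toy labels: a model that never flags anything.
y_rare = [0, 0, 0, 0, 0, 0, 0, 1]
never_flag = [0] * len(y_rare)
print(accuracy(y_rare, never_flag), f1_score(y_rare, never_flag))  # 0.875 0.0
```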

Abstract

Software vulnerabilities represent one of the most pressing threats to computing systems. Identifying vulnerabilities in source code is crucial for protecting user privacy and reducing economic losses. Traditional static analysis tools rely on security experts to manually build detection rules, a process that demands substantial time and manpower and struggles to adapt to new vulnerabilities. The emergence of pre-trained code language models has provided a new approach to automated vulnerability detection. However, these models are typically pre-trained at the token level on large-scale data, which limits their ability to capture the structural and dependency relationships among code segments. In the context of software vulnerabilities, certain types of vulnerabilities are related to the dependency relationships within the…
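The abstract's point about dependency relationships can be made concrete with data flow (def-use) edges: an edge links the statement that last assigned a variable to a later statement that reads it. The toy sketch below is hypothetical and is not DFEPT's extraction pipeline. It assumes straight-line `var = expr;` statements only (no branches, loops, or pointers), and the helper `data_flow_edges` and the C-like snippet are illustrative. It shows how a length value flows into a `malloc` size, the kind of dependency a purely token-level view may not capture.

```python
import re
from typing import Dict, List, Tuple

# Toy grammar (assumption): one `target = expression;` statement per line.
ASSIGN = re.compile(r"^\s*(\w+)\s*=\s*(.+?);?\s*$")
IDENT = re.compile(r"[A-Za-z_]\w*")


def data_flow_edges(lines: List[str]) -> List[Tuple[int, int, str]]:
    """Return (def_line, use_line, variable) edges for straight-line code."""
    last_def: Dict[str, int] = {}  # variable -> line of its most recent assignment
    edges: List[Tuple[int, int, str]] = []
    for i, line in enumerate(lines):
        m = ASSIGN.match(line)
        if not m:
            continue  # skip anything that is not a simple assignment
        target, expr = m.group(1), m.group(2)
        # Record uses before updating the definition, so `a = a + 1` links
        # to the previous definition of `a`.
        for name in IDENT.findall(expr):
            if name in last_def:
                edges.append((last_def[name], i, name))
        last_def[target] = i
    return edges


snippet = [
    "len = read_len();",
    "size = len + 1;",
    "buf = malloc(size);",
    "n = len;",
]
edges = data_flow_edges(snippet)
print(edges)  # [(0, 1, 'len'), (1, 2, 'size'), (0, 3, 'len')]
```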

Code & Models

Repositories

gcvulnerability/dfept (PyTorch, official)