TL;DR
This paper introduces DFEPT, a data flow embedding method that enhances pre-trained code models for vulnerability detection by capturing structural code relationships, leading to improved accuracy and F1 scores.
Contribution
The paper proposes a novel data flow embedding technique that integrates graph learning with pre-trained models to better detect code vulnerabilities.
Findings
Achieves 64.97% accuracy on the Devign dataset.
Attains a 47.9% F1-score on the Reveal dataset.
Improves the vulnerability detection performance of the pre-trained code models it is applied to.
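For readers less familiar with the two metrics reported above, the following is a minimal sketch of how accuracy and F1-score are conventionally computed for binary vulnerability detection. It is not taken from the paper: the function names and the confusion-matrix counts are hypothetical and chosen only for illustration.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all samples that were classified correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the 'vulnerable' class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, NOT results from the paper:
# 30 true positives, 20 false positives, 130 true negatives, 20 false negatives.
acc = accuracy(tp=30, fp=20, tn=130, fn=20)
f1 = f1_score(tp=30, fp=20, fn=20)
print(f"accuracy={acc:.2%}, F1={f1:.2%}")
```

Accuracy counts correct predictions on both classes, whereas F1 ignores true negatives, so the two numbers can differ substantially when vulnerable samples are rare.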
Abstract
Software vulnerabilities represent one of the most pressing threats to computing systems. Identifying vulnerabilities in source code is crucial for protecting user privacy and reducing economic losses. Traditional static analysis tools rely on experts with knowledge in security to manually build rules for operation, a process that requires substantial time and manpower costs and also faces challenges in adapting to new vulnerabilities. The emergence of pre-trained code language models has provided a new solution for automated vulnerability detection. However, code pre-training models are typically based on token-level large-scale pre-training, which hampers their ability to effectively capture the structural and dependency relationships among code segments. In the context of software vulnerabilities, certain types of vulnerabilities are related to the dependency relationships within the…
