Revisiting Pre-trained Language Models for Vulnerability Detection
Youpeng Li, Weiliang Qi, Xuyu Wang, Fuxun Yu, Xinda Wang

TL;DR
This paper critically evaluates 18 pre-trained language models for vulnerability detection in code, highlighting their strengths, limitations, and challenges in real-world scenarios, and emphasizes the need for thorough practical assessments.
Contribution
The paper provides an extensive evaluation of pre-trained language models (PLMs) on high-quality datasets, compares fine-tuning with prompt engineering, and analyzes the models' robustness and generalizability in vulnerability detection.
Findings
Code-specific pre-training tasks improve PLM performance
PLMs struggle with complex dependencies and perturbations
Limited context windows cause labeling errors
Abstract
The rapid advancement of pre-trained language models (PLMs) has demonstrated promising results for various code-related tasks. However, their effectiveness in detecting real-world vulnerabilities remains a critical challenge. While existing empirical studies evaluate PLMs for vulnerability detection (VD), they suffer from data leakage, limited scope, and superficial analysis, hindering the accuracy and comprehensiveness of evaluations. This paper begins by revisiting the common issues in existing research on PLMs for VD through the evaluation pipeline. It then proceeds with an accurate and extensive evaluation of 18 PLMs on high-quality datasets that feature accurate labeling, diverse vulnerability types, and various projects. Specifically, we compare the performance of PLMs under both fine-tuning and prompt engineering, assess their effectiveness and generalizability across various…
Taxonomy
Topics: Network Security and Intrusion Detection
