From Lab to Reality: A Practical Evaluation of Deep Learning Models and LLMs for Vulnerability Detection
Chaomeng Lu, Bert Lagaisse

TL;DR
This paper critically assesses the real-world effectiveness of deep learning and large language models in vulnerability detection, revealing significant performance gaps and emphasizing the need for more robust models and datasets.
Contribution
It provides a systematic evaluation of DL models and LLMs on real-world vulnerability datasets, highlighting their limitations outside benchmark environments.
Findings
Models struggle to differentiate vulnerable from non-vulnerable code in representation space.
Performance drops significantly on out-of-distribution real-world datasets.
Current models have limited generalization across datasets with different distributions.
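The first finding, weak separation of vulnerable and non-vulnerable code in representation space, can also be checked numerically. The abstract describes inspecting representations with t-SNE, which is a visualisation technique; the sketch below swaps in a mean silhouette coefficient as a rough numerical proxy, and this is not the paper's method. The helper names (make_embeddings, mean_silhouette) and the synthetic Gaussian "embeddings" are hypothetical stand-ins for real model outputs.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_silhouette(points, labels):
    """Mean silhouette coefficient of `points` under the given class labels.

    Values near 1 mean the classes form tight, well-separated groups;
    values near 0 mean the classes overlap.
    """
    if len(set(labels)) < 2:
        raise ValueError("need at least two classes")
    scores = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, grouped by label.
        by_label = {}
        for j, q in enumerate(points):
            if j != i:
                by_label.setdefault(labels[j], []).append(euclidean(p, q))
        own = by_label.pop(labels[i], [])
        if not own:
            scores.append(0.0)  # convention for a class with a single member
            continue
        a = sum(own) / len(own)  # mean distance within the point's own class
        b = min(sum(d) / len(d) for d in by_label.values())  # nearest other class
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def make_embeddings(gap, n_per_class=40, dim=8, seed=0):
    """Hypothetical stand-in for model embeddings: two Gaussian clouds.

    Class 0 is centred at the origin and class 1 at `gap` on every axis.
    """
    rng = random.Random(seed)
    points, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            points.append([rng.gauss(label * gap, 1.0) for _ in range(dim)])
            labels.append(label)
    return points, labels


if __name__ == "__main__":
    for name, gap in (("separated", 5.0), ("overlapping", 0.0)):
        pts, labs = make_embeddings(gap)
        print(name, round(mean_silhouette(pts, labs), 3))
```

With real data, the points would be the embeddings a trained model produces for each function and the labels would be the vulnerable / non-vulnerable annotations. A score close to zero would be consistent with the overlap described in the first finding.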
Abstract
Vulnerability detection methods based on deep learning (DL) have shown strong performance on benchmark datasets, yet their real-world effectiveness remains underexplored. Recent work suggests that both graph neural network (GNN)-based and transformer-based models, including large language models (LLMs), yield promising results when evaluated on curated benchmark datasets. These datasets are typically characterized by consistent data distributions and heuristic or partially noisy labels. In this study, we systematically evaluate two representative DL models, ReVeal and LineVul, across four representative datasets: Juliet, Devign, BigVul, and ICVul. Each model is trained independently on each dataset, and its code representations are analyzed using t-SNE to uncover vulnerability-related patterns. To assess realistic applicability, we deploy these models along with four…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Information and Cyber Security
