From Lab to Reality: A Practical Evaluation of Deep Learning Models and LLMs for Vulnerability Detection

Chaomeng Lu; Bert Lagaisse

arXiv:2512.10485·cs.CR·December 12, 2025



Open Access

TL;DR

This paper critically assesses the real-world effectiveness of deep learning and large language models in vulnerability detection, revealing significant performance gaps and emphasizing the need for more robust models and datasets.

Contribution

It provides a systematic evaluation of DL models and LLMs on real-world vulnerability datasets, highlighting their limitations outside benchmark environments.

Findings

01. Models struggle to differentiate vulnerable from non-vulnerable code in representation space.

02. Performance drops significantly on out-of-distribution real-world datasets.

03. Current models have limited generalization across datasets with different distributions.

Abstract

Vulnerability detection methods based on deep learning (DL) have shown strong performance on benchmark datasets, yet their real-world effectiveness remains underexplored. Recent work suggests that both graph neural network (GNN)-based and transformer-based models, including large language models (LLMs), yield promising results when evaluated on curated benchmark datasets. These datasets are typically characterized by consistent data distributions and heuristic or partially noisy labels. In this study, we systematically evaluate two representative DL models, ReVeal and LineVul, across four representative datasets: Juliet, Devign, BigVul, and ICVul. Each model is trained independently on each dataset, and the resulting code representations are analyzed using t-SNE to uncover vulnerability-related patterns. To assess realistic applicability, we deploy these models along with four…
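The abstract describes inspecting learned code representations to see whether vulnerable and non-vulnerable samples form distinguishable groups. The sketch below is not the paper's t-SNE analysis; it is a simpler numeric stand-in that scores how far apart two classes sit in an embedding space. The function names, the score definition, and the synthetic Gaussian vectors are all assumptions made for illustration, and no real model outputs or datasets are involved.

```python
import math
import random


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def mean_distance_to(vectors, point):
    """Average Euclidean distance from each vector to `point`."""
    return sum(math.dist(v, point) for v in vectors) / len(vectors)


def separation_score(vulnerable, benign):
    """Hypothetical separability measure (an assumption, not from the paper).

    Distance between the two class centroids divided by the average
    within-class spread. Values near 0 suggest the classes overlap;
    larger values suggest they are easier to tell apart.
    """
    c_vuln = centroid(vulnerable)
    c_benign = centroid(benign)
    spread = (mean_distance_to(vulnerable, c_vuln)
              + mean_distance_to(benign, c_benign)) / 2
    return math.dist(c_vuln, c_benign) / spread


def synthetic_embeddings(n, dim, shift, rng):
    """Stand-in for model embeddings: Gaussian vectors centered at `shift`."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Two classes drawn from nearly the same distribution (hard to separate).
overlapping = separation_score(
    synthetic_embeddings(200, 16, 0.0, rng),
    synthetic_embeddings(200, 16, 0.1, rng),
)

# Two classes drawn from clearly shifted distributions (easy to separate).
separated = separation_score(
    synthetic_embeddings(200, 16, 0.0, rng),
    synthetic_embeddings(200, 16, 3.0, rng),
)

print(f"overlapping classes: {overlapping:.2f}")
print(f"separated classes:   {separated:.2f}")
```

A low score on real embeddings would be consistent with the first finding listed above, but a single scalar like this is only a rough complement to a two-dimensional projection, not a replacement for it.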


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Information and Cyber Security