Large Language Models Versus Static Code Analysis Tools: A Systematic Benchmark for Vulnerability Detection

Damian Gnieciak; Tomasz Szandala

arXiv:2508.04448·cs.SE·August 7, 2025

TL;DR

This paper compares large language models with static code analysis tools for vulnerability detection. The LLMs achieve higher recall but produce more false positives, which suggests a hybrid approach to software security testing.

Contribution

The paper provides the first systematic benchmark comparing LLMs and static analyzers on real-world vulnerabilities, highlighting the strengths and limitations of each.

Findings

01

LLMs achieve higher F-1 scores than static tools.

02

The higher recall of LLMs lets them detect a broader range of vulnerabilities.

03

Static tools have fewer false positives and better localization accuracy.

Abstract

Modern software relies on a multitude of automated testing and quality assurance tools to prevent errors, bugs and potential vulnerabilities. This study sets out to provide a head-to-head, quantitative and qualitative evaluation of six automated approaches: three industry-standard rule-based static code-analysis tools (SonarQube, CodeQL and Snyk Code) and three state-of-the-art large language models hosted on the GitHub Models platform (GPT-4.1, Mistral Large and DeepSeek V3). Using a curated suite of ten real-world C# projects that embed 63 vulnerabilities across common categories such as SQL injection, hard-coded secrets and outdated dependencies, we measure classical detection accuracy (precision, recall, F-score), analysis latency, and the developer effort required to vet true positives. The language-based scanners achieve higher mean F-1 scores, 0.797, 0.753 and 0.750, than their…
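The detection metrics named in the abstract (precision, recall, F-score) can be illustrated with a minimal sketch. The class name, function names, and all counts below are hypothetical and chosen only for illustration; they are not the paper's data or its evaluation code. The sketch assumes the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 as their harmonic mean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionCounts:
    """Confusion counts for one scanner on one benchmark (hypothetical)."""
    true_positives: int   # real vulnerabilities that were reported
    false_positives: int  # reports that were not real vulnerabilities
    false_negatives: int  # real vulnerabilities that were missed


def precision(c: DetectionCounts) -> float:
    """Fraction of reported findings that are real vulnerabilities."""
    reported = c.true_positives + c.false_positives
    return c.true_positives / reported if reported else 0.0


def recall(c: DetectionCounts) -> float:
    """Fraction of real vulnerabilities that were reported."""
    actual = c.true_positives + c.false_negatives
    return c.true_positives / actual if actual else 0.0


def f1(c: DetectionCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Illustrative counts only, NOT taken from the paper.
high_recall_scanner = DetectionCounts(true_positives=9, false_positives=6, false_negatives=1)
high_precision_scanner = DetectionCounts(true_positives=5, false_positives=0, false_negatives=5)

for name, counts in [("high recall", high_recall_scanner),
                     ("high precision", high_precision_scanner)]:
    print(f"{name}: P={precision(counts):.3f} R={recall(counts):.3f} F1={f1(counts):.3f}")
```

With these made-up counts, the high-recall scanner scores a higher F1 (0.720 versus 0.667) despite reporting six false positives. This mirrors the kind of trade-off the findings describe, in which a tool that misses less can lead on F1 while still costing reviewers more triage effort.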
