Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study

Karl Tamberg; Hayretdin Bahsi

arXiv:2405.15614·cs.CR·February 14, 2025·1 cites

TL;DR

This study evaluates the effectiveness of large language models in detecting software vulnerabilities, demonstrating they outperform traditional static analysis tools in recall and F1 scores, thus offering a promising new approach.

Contribution

The study provides a comprehensive benchmark of multiple LLMs for vulnerability detection, identifies the most effective prompting strategies, and compares LLM performance with that of traditional static analysis tools.

Findings

01 LLMs detect more vulnerabilities than static analysis tools

02 LLMs outperform static analysis tools in recall and F1 scores

03 Benchmarking identifies the best prompting strategies
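
The findings above are stated in terms of recall and F1. As a reminder of how these metrics follow from detection counts, here is a minimal Python sketch. The function name `precision_recall_f1` and the counts in the usage lines are illustrative assumptions and are not taken from the study.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from detection counts.

    tp: vulnerable samples correctly flagged
    fp: non-vulnerable samples incorrectly flagged
    fn: vulnerable samples that were missed
    """
    # Guard each division so an empty denominator yields 0.0 instead of an error.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only (not results from the paper).
p, r, f1 = precision_recall_f1(tp=6, fp=2, fn=6)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

A detector with higher recall misses fewer real vulnerabilities, while F1 also penalizes false alarms through precision, which is why the two are usually reported together.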

Abstract

Despite various approaches being employed to detect vulnerabilities, the number of reported vulnerabilities shows an upward trend over the years. This suggests the problems are not caught before the code is released, which could be caused by many factors, like lack of awareness, limited efficacy of the existing vulnerability detection tools or the tools not being user-friendly. To help combat some issues with traditional vulnerability detection tools, we propose using large language models (LLMs) to assist in finding vulnerabilities in source code. LLMs have shown a remarkable ability to understand and generate code, underlining their potential in code-related tasks. The aim is to test multiple state-of-the-art LLMs and identify the best prompting strategies, allowing extraction of the best value from the LLMs. We provide an overview of the strengths and weaknesses of the LLM-based…

Taxonomy

Topics: Software Reliability and Analysis Research