VulDetectBench: Evaluating the Deep Capability of Vulnerability Detection with Large Language Models

Yu Liu; Lang Gao; Mingxin Yang; Yu Xie; Ping Chen; Xiaojin Zhang; Wei Chen

arXiv:2406.07595 · cs.CR · August 22, 2024 · 1 citation

TL;DR

VulDetectBench is a new benchmark designed to evaluate the vulnerability detection capabilities of large language models across multiple tasks, revealing strengths in basic detection but weaknesses in detailed vulnerability analysis.

Contribution

The paper introduces VulDetectBench, a comprehensive benchmark for assessing LLMs' ability to detect, classify, and locate code vulnerabilities, filling a gap in specialized vulnerability research.

Findings

01. Models achieve over 80% accuracy in vulnerability identification and classification.

02. Models perform poorly (<30%) on detailed vulnerability analysis tasks.

03. VulDetectBench provides a standardized evaluation framework for future improvements.

Abstract

Large Language Models (LLMs) are trained on corpora containing large amounts of program code, which greatly improves their code comprehension and generation capabilities. However, sound, comprehensive research on detecting program vulnerabilities, a more specialized code-related task, and on evaluating the performance of LLMs in this scenario is still lacking. To address common challenges in vulnerability analysis, our study introduces a new benchmark, VulDetectBench, specifically designed to assess the vulnerability detection capabilities of LLMs. The benchmark comprehensively evaluates LLMs' ability to identify, classify, and locate vulnerabilities through five tasks of increasing difficulty. We evaluate the performance of 17 models (both open- and closed-source) and find that while existing models can achieve over 80% accuracy on tasks related to vulnerability…
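To make the accuracy figures above concrete, here is a minimal sketch of how a yes/no vulnerability identification task could be scored. It is an illustration under stated assumptions, not the benchmark's actual harness: the `Sample` type, the `parse_answer` rule (a reply counts as "vulnerable" if it starts with "yes"), and the example data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Sample:
    """Hypothetical benchmark item: a code snippet plus its ground-truth label."""
    code: str
    vulnerable: bool


def parse_answer(reply: str) -> Optional[bool]:
    """Map a free-text model reply to True/False, or None if it is unparseable.

    Assumption: the prompt asks the model to begin its reply with YES or NO.
    """
    text = reply.strip().lower()
    if text.startswith("yes"):
        return True
    if text.startswith("no"):
        return False
    return None


def accuracy(samples: Sequence[Sample], replies: Sequence[str]) -> float:
    """Fraction of samples whose parsed reply matches the label.

    Unparseable replies are counted as wrong.
    """
    if len(samples) != len(replies):
        raise ValueError("samples and replies must have the same length")
    if not samples:
        return 0.0
    correct = sum(parse_answer(r) == s.vulnerable for s, r in zip(samples, replies))
    return correct / len(samples)


# Made-up example data, for illustration only.
samples = [
    Sample("strcpy(buf, user_input);", True),
    Sample("strncpy(buf, src, sizeof(buf) - 1);", False),
    Sample("free(p); free(p);", True),
    Sample("system(user_cmd);", True),
]
replies = ["YES", "no.", "maybe", "No"]
score = accuracy(samples, replies)
```

Counting unparseable replies as incorrect is one possible design choice; a harness could instead report them separately, which would change the headline number.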

Code & Models

Repositories

sweetaroo/vuldetectbench (official)

Taxonomy

Topics: Web Application Security Vulnerabilities