eyeballvul: a future-proof benchmark for vulnerability detection in the wild

Timothee Chauvin

arXiv:2407.08708 · cs.CR · July 16, 2024

TL;DR

eyeballvul is a benchmark, updated weekly, for evaluating how well language models detect security vulnerabilities in entire real-world codebases.

Contribution

The paper introduces eyeballvul, a large-scale, dynamic benchmark for vulnerability detection in code, enabling robust evaluation of LLMs in practical security tasks.

Findings

01

Contains over 24,000 vulnerabilities across more than 6,000 revisions and 5,000 repositories (as of July 2024)

02

Updated weekly with new vulnerabilities from open-source repositories

03

Supports large-scale evaluation of language models' security detection capabilities

Abstract

Long contexts of recent LLMs have enabled a new use case: asking models to find security vulnerabilities in entire codebases. To evaluate model performance on this task, we introduce eyeballvul: a benchmark designed to test the vulnerability detection capabilities of language models at scale, that is sourced and updated weekly from the stream of published vulnerabilities in open-source repositories. The benchmark consists of a list of revisions in different repositories, each associated with the list of known vulnerabilities present at that revision. An LLM-based scorer is used to compare the list of possible vulnerabilities returned by a model to the list of known vulnerabilities for each revision. As of July 2024, eyeballvul contains 24,000+ vulnerabilities across 6,000+ revisions and 5,000+ repositories, and is around 55GB in size.
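The structure described in the abstract (revisions paired with known vulnerabilities, and a scorer that compares a model's candidate list against them) can be sketched as follows. This is a minimal illustration and not the benchmark's actual code: the `Revision` class, its field names, the `score_revision` function, the precision/recall summary, and the example repository URL are all assumptions made here. In particular, `keyword_match` is a toy stand-in for the LLM-based scorer, which the abstract does not specify in detail.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Revision:
    """Hypothetical record: one repository revision and its known vulnerabilities."""
    repo_url: str
    commit: str
    known_vulnerabilities: List[str] = field(default_factory=list)


def keyword_match(candidate: str, known: str) -> bool:
    """Toy stand-in for the LLM-based scorer (assumption, not the real scorer).

    Returns True when the two descriptions share at least two words
    longer than three characters.
    """
    def words(text: str) -> set:
        return {w for w in text.lower().split() if len(w) > 3}

    return len(words(candidate) & words(known)) >= 2


def score_revision(
    candidates: List[str],
    revision: Revision,
    is_match: Callable[[str, str], bool] = keyword_match,
) -> Dict[str, float]:
    """Compare a model's candidate vulnerabilities to the known ones.

    Each known vulnerability can be matched by at most one candidate.
    """
    matched = set()  # indices of known vulnerabilities already matched
    true_positives = 0
    for cand in candidates:
        for i, known in enumerate(revision.known_vulnerabilities):
            if i not in matched and is_match(cand, known):
                matched.add(i)
                true_positives += 1
                break
    n_known = len(revision.known_vulnerabilities)
    return {
        "true_positives": true_positives,
        "false_positives": len(candidates) - true_positives,
        "false_negatives": n_known - len(matched),
        "precision": true_positives / len(candidates) if candidates else 0.0,
        "recall": true_positives / n_known if n_known else 0.0,
    }


# Illustrative data only; not taken from the benchmark.
revision = Revision(
    repo_url="https://example.com/org/project",
    commit="abc123",
    known_vulnerabilities=[
        "SQL injection in login query handler",
        "Path traversal in file download endpoint",
    ],
)
candidates = [
    "Possible SQL injection in the login handler",
    "Hard-coded credentials in config loader",
]
result = score_revision(candidates, revision)
print(result)
```

Passing the matcher as a parameter keeps the comparison logic separate from the judgment of whether two descriptions refer to the same vulnerability, so an LLM-backed function could replace `keyword_match` without changing the rest of the sketch.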

Code & Models

Repositories

timothee-chauvin/eyeballvul_experiments
Official

Taxonomy

Topics: Advanced Malware Detection Techniques