TL;DR
CVEfixes is an automated tool and dataset that collects, organizes, and enriches real-world vulnerability and fix data from open-source software, supporting security research and automated repair.
Contribution
The paper introduces CVEfixes, a fully automated system for collecting and curating a comprehensive vulnerability dataset from CVE records and open-source repositories.
Findings
The initial release includes 5365 CVEs and 5495 fixing commits.
The dataset is enriched with code, security metrics, and meta-data.
It supports security research tasks such as automated vulnerability discovery and repair.
Abstract
Data-driven research on the automated discovery and repair of security vulnerabilities in source code requires comprehensive datasets of real-life vulnerable code and their fixes. To assist in such research, we propose a method to automatically collect and curate a comprehensive vulnerability dataset from Common Vulnerabilities and Exposures (CVE) records in the public National Vulnerability Database (NVD). We implement our approach in a fully automated dataset collection tool and share an initial release of the resulting vulnerability dataset named CVEfixes. The CVEfixes collection tool automatically fetches all available CVE records from the NVD, gathers the vulnerable code and corresponding fixes from associated open-source repositories, and organizes the collected information in a relational database. Moreover, the dataset is enriched with meta-data such as programming language,…
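The idea of organizing CVE records and their fixing commits in a relational database can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not the schema shipped with CVEfixes: the table names (cve, fix, commit_info), the column names, the helper functions, and all sample rows (including the CVE IDs and the example.org repository URL) are hypothetical and chosen only to show how one CVE can link to several fixing commits.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical three-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- One row per CVE record (hypothetical columns).
        CREATE TABLE cve (
            cve_id      TEXT PRIMARY KEY,
            description TEXT
        );
        -- One row per fixing commit in a repository (hypothetical columns).
        CREATE TABLE commit_info (
            commit_hash TEXT,
            repo_url    TEXT,
            message     TEXT,
            PRIMARY KEY (commit_hash, repo_url)
        );
        -- Link table: a CVE may be fixed by more than one commit.
        CREATE TABLE fix (
            cve_id      TEXT REFERENCES cve(cve_id),
            commit_hash TEXT,
            repo_url    TEXT,
            PRIMARY KEY (cve_id, commit_hash, repo_url)
        );
        """
    )
    repo = "https://example.org/demo/repo"  # placeholder URL, not a real project
    conn.execute(
        "INSERT INTO cve VALUES (?, ?)",
        ("CVE-0000-0001", "Made-up out-of-bounds read"),
    )
    conn.executemany(
        "INSERT INTO commit_info VALUES (?, ?, ?)",
        [
            ("abc123", repo, "Fix bounds check"),
            ("def456", repo, "Add regression test for bounds check"),
        ],
    )
    conn.executemany(
        "INSERT INTO fix VALUES (?, ?, ?)",
        [
            ("CVE-0000-0001", "abc123", repo),
            ("CVE-0000-0001", "def456", repo),
        ],
    )
    conn.commit()
    return conn


def fixes_for(conn, cve_id):
    """Return (commit_hash, message) pairs linked to a CVE, sorted by hash."""
    rows = conn.execute(
        """
        SELECT c.commit_hash, c.message
        FROM fix AS f
        JOIN commit_info AS c
          ON c.commit_hash = f.commit_hash AND c.repo_url = f.repo_url
        WHERE f.cve_id = ?
        ORDER BY c.commit_hash
        """,
        (cve_id,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    for commit_hash, message in fixes_for(demo, "CVE-0000-0001"):
        print(commit_hash, message)
```

Keeping the CVE-to-commit link in its own table is one way to represent the fact that the dataset contains more fixing commits than CVEs; the real tool may model this differently.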
Taxonomy
Methods, Repair
