ARVO: Atlas of Reproducible Vulnerabilities for Open Source Software
Xiang Mei, Pulkit Singh Singaria, Jordi Del Castillo, Haoran Xi, Abdelouahab (Habs) Benchikh, Tiffany Bao, Ruoyu Wang, Yan Shoshitaishvili, Adam Doupé, Hammond Pearce, Brendan Dolan-Gavitt

TL;DR
ARVO is a comprehensive, automatically updated dataset of over 5,000 reproducible memory vulnerabilities in open-source C/C++ projects, built to support security research and vulnerability analysis.
Contribution
The paper introduces ARVO, a large-scale, reproducible vulnerability dataset with automated updates, improving over prior datasets in size, accuracy, and features for security research.
Findings
ARVO reproduces over 5,000 vulnerabilities across 250+ projects.
It locates vulnerability fixes more accurately than Google's OSV.
Case studies demonstrate its usefulness for vulnerability repair and zero-day detection.
Abstract
High-quality datasets of real-world vulnerabilities are enormously valuable for downstream research in software security, but existing datasets are typically small, require extensive manual effort to update, and are missing crucial features that such research needs. In this paper, we introduce ARVO: an Atlas of Reproducible Vulnerabilities in Open-source software. By sourcing vulnerabilities from C/C++ projects that Google's OSS-Fuzz discovered and implementing a reliable re-compilation system, we successfully reproduce more than 5,000 memory vulnerabilities across over 250 projects, each with a triggering input, the canonical developer-written patch for fixing the vulnerability, and the ability to automatically rebuild the project from source and run it at its vulnerable and patched revisions. Moreover, our dataset can be automatically updated as OSS-Fuzz finds new vulnerabilities,…
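The reproduction criterion implied by the abstract (the triggering input crashes the vulnerable revision but not the patched one) can be sketched as below. This is a minimal illustration under stated assumptions, not ARVO's actual schema or tooling: `VulnRecord`, its field names, `is_reproduced`, and `fake_runner` are all hypothetical, and the build-and-run step is replaced by a stand-in function.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class VulnRecord:
    """Hypothetical record shape; field names are illustrative, not ARVO's schema."""
    project: str
    vulnerable_rev: str   # revision at which the bug is still present
    patched_rev: str      # revision containing the developer-written fix
    poc_input: bytes      # triggering input


# A runner builds `project` at `rev`, feeds it `data`, and returns True
# if the run crashed (for example, with a sanitizer report).
Runner = Callable[[str, str, bytes], bool]


def is_reproduced(record: VulnRecord, run: Runner) -> bool:
    """True when the input crashes the vulnerable build but not the patched one."""
    crashes_before = run(record.project, record.vulnerable_rev, record.poc_input)
    crashes_after = run(record.project, record.patched_rev, record.poc_input)
    return crashes_before and not crashes_after


def fake_runner(project: str, rev: str, data: bytes) -> bool:
    # Stand-in for a real build-and-run step: only revision "aaa111" crashes.
    return rev == "aaa111"


record = VulnRecord("demo", "aaa111", "bbb222", b"\x00\xff")
print(is_reproduced(record, fake_runner))
```

In a real setting the runner would rebuild the project from source at the given revision and execute the fuzz target on the input; here it is deliberately faked so the check itself stays self-contained.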
Taxonomy
Topics: Scientific Computing and Data Management · Security and Verification in Computing · Advanced Malware Detection Techniques
