REEF: A Framework for Collecting Real-World Vulnerabilities and Fixes
Chaozheng Wang, Zongjie Li, Yun Peng, Shuzheng Gao, Sirong Chen, Shuai Wang, Cuiyun Gao, Michael R. Lyu

TL;DR
This paper introduces REEF, a comprehensive framework for collecting and analyzing real-world software vulnerabilities and fixes across multiple languages, improving dataset quality and explanation clarity for automated repair research.
Contribution
The paper presents REEF, a multi-language crawler paired with quality-filtering metrics for building high-quality vulnerability-fix datasets, along with a neural model that generates detailed vulnerability explanations.
Findings
Collected 4,466 CVEs and 30,987 patches across 7 languages.
Produced high-quality explanations validated by human experts.
Dataset surpasses existing benchmarks in scale, coverage, and quality.
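The findings above are counts of CVEs, patches, and languages. As a minimal sketch of how such counts could be derived from linked CVE-to-patch records, the Python below groups hypothetical patch records by CVE ID, drops malformed IDs, and skips duplicate links. The `Patch` record, `group_patches`, `languages_covered`, and the repository names and commit hashes are all illustrative assumptions; this is not REEF's crawler or its quality-filtering metrics. The only grounded rule is the CVE ID syntax: `CVE-`, a four-digit year, `-`, then four or more digits.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# CVE ID syntax: "CVE-", a four-digit year, "-", then four or more digits.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}")


@dataclass(frozen=True)
class Patch:
    """Hypothetical record linking one fixing commit to one CVE."""
    cve_id: str
    repo: str
    commit_sha: str
    language: str


def group_patches(patches):
    """Group patches by normalized CVE ID (toy filter, not REEF's metrics).

    Records with a malformed CVE ID are dropped, and repeated
    (CVE, repo, commit) links are kept only once.
    """
    seen = set()
    by_cve = defaultdict(list)
    for p in patches:
        cve = p.cve_id.strip().upper()
        if not CVE_PATTERN.fullmatch(cve):
            continue  # malformed CVE ID: skip the record
        key = (cve, p.repo, p.commit_sha)
        if key in seen:
            continue  # exact duplicate link: keep only the first
        seen.add(key)
        by_cve[cve].append(p)
    return dict(by_cve)


def languages_covered(by_cve):
    """Return the set of languages represented in the grouped patches."""
    return {p.language for plist in by_cve.values() for p in plist}


# Usage with placeholder identifiers (not real CVE fixes).
records = [
    Patch("CVE-2023-12345", "example/lib", "a1b2c3", "C"),
    Patch("cve-2023-12345", "example/lib", "a1b2c3", "C"),  # duplicate after normalization
    Patch("CVE-23-1", "example/app", "d4e5f6", "Go"),       # malformed year, dropped
    Patch("CVE-2023-0002", "example/app", "0f0f0f", "Python"),
]
grouped = group_patches(records)
print(len(grouped), "CVEs,", sum(len(v) for v in grouped.values()), "patches,",
      len(languages_covered(grouped)), "languages")
```

Keying duplicates on the (CVE, repo, commit) triple rather than the commit alone keeps a single commit that fixes several CVEs counted once per CVE, which is one reasonable choice among several.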
Abstract
Software plays a crucial role in our daily lives, and therefore the quality and security of software systems have become increasingly important. However, vulnerabilities in software still pose a significant threat, as they can have serious consequences. Recent advances in automated program repair have sought to automatically detect and fix bugs using data-driven techniques. Sophisticated deep learning methods have been applied to this area and have achieved promising results. However, existing benchmarks for training and evaluating these techniques remain limited, as they tend to focus on a single programming language and have relatively small datasets. Moreover, many benchmarks tend to be outdated and lack diversity, focusing on a specific codebase. Worse still, the quality of bug explanations in existing datasets is low, as they typically use imprecise and uninformative commit…
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Advanced Malware Detection Techniques
