TL;DR
This paper evaluates how feasible it is to map NVD vulnerability records to their fixing commits. Its automated approach achieves high precision (73% to 88.4%, depending on the source) but covers only 11.3% of NVD records, because explicit links to fixing commits are sparse in NVD references.
Contribution
The study presents an empirical analysis and an automated pipeline for mapping NVD records to vulnerability-fixing commits (VFCs). To improve coverage beyond NVD references, it also mines six external security databases and GitHub repositories.
Findings
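The automated pipeline extracted 31,942 VFCs from NVD references, mainly Git references, with 87% precision.
Six external security databases contributed VFCs at 88.4% precision, and GitHub repositories contributed VFCs at 73% precision.
Combined, the sources mapped only 26,710 unique NVD records (11.3% of 235,341), which underlines how sparse explicit links to fixing commits are.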
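The coverage percentages follow directly from the record counts reported in the abstract. As a quick check, the short sketch below recomputes them; the variable and function names are illustrative and not taken from the paper.

```python
# Record counts as reported in the abstract; names here are illustrative only.
TOTAL_NVD_RECORDS = 235_341

mapped_records = {
    "NVD references": 20_360,
    "external security databases": 18_985,
    "GitHub repositories": 2_795,
    "combined (unique records)": 26_710,
}


def coverage_pct(mapped, total=TOTAL_NVD_RECORDS):
    """Share of all NVD records that were mapped, in percent, to one decimal."""
    return round(100 * mapped / total, 1)


coverage = {source: coverage_pct(n) for source, n in mapped_records.items()}

# Sum of the three per-source counts; it exceeds the unique total because
# some records are mapped by more than one source.
per_source_sum = sum(
    n for source, n in mapped_records.items() if not source.startswith("combined")
)

for source, pct in coverage.items():
    print(f"{source}: {pct}%")
print(f"per-source sum: {per_source_sum}")
```

The three per-source counts add up to 42,140 records, well above the 26,710 unique records, which is consistent with the overlap between sources that the abstract mentions.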
Abstract
Mapping National Vulnerability Database (NVD) records to vulnerability-fixing commits (VFCs) is crucial for vulnerability analysis but challenging due to sparse explicit links in NVD references. This study explores this mapping's feasibility through an empirical approach. Manual analysis of NVD references showed Git references enable over 86% success, while non-Git references achieve under 14%. Using these findings, we built an automated pipeline extracting 31,942 VFCs from 20,360 NVD records (8.7% of 235,341) with 87% precision, mainly from Git references. To fill gaps, we mined six external security databases, yielding 29,254 VFCs for 18,985 records (8.1%) at 88.4% precision, and GitHub repositories, adding 3,686 VFCs for 2,795 records (1.2%) at 73% precision. Combining these, we mapped 26,710 unique records (11.3% coverage) from 7,634 projects, with overlap between NVD and external…
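The abstract does not describe how the pipeline recognizes Git references, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a Git reference is a GitHub-style commit URL of the form `https://github.com/<owner>/<repo>/commit/<sha>`; the regular expression, the function name `extract_vfc_candidates`, and the example URLs are all hypothetical.

```python
import re

# Assumption: a "Git reference" is a GitHub-style commit URL.
# Real NVD references use many more hosts and URL shapes than this.
COMMIT_URL = re.compile(
    r"https?://github\.com/"
    r"(?P<owner>[^/\s]+)/(?P<repo>[^/\s]+)"
    r"/commit/(?P<sha>[0-9a-fA-F]{7,40})"
)


def extract_vfc_candidates(reference_urls):
    """Return (repository, commit hash) pairs found in a list of reference URLs.

    URLs that do not look like commit links are skipped, mirroring the
    observation that non-Git references rarely lead to a fixing commit.
    """
    candidates = []
    for url in reference_urls:
        match = COMMIT_URL.search(url)
        if match:
            repo = f"{match.group('owner')}/{match.group('repo')}"
            candidates.append((repo, match.group("sha").lower()))
    return candidates


# Hypothetical reference list for a single NVD record.
example_refs = [
    "https://github.com/example-org/example-repo/commit/"
    "0123456789abcdef0123456789abcdef01234567",
    "https://example.com/advisories/2024-001",  # non-Git reference, ignored
]
candidates = extract_vfc_candidates(example_refs)
print(candidates)
```

A pattern match only yields candidate commits. The precision figures reported above suggest that some candidates are not actual fixes, so any real pipeline would still need a validation step.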
Taxonomy
Topics: Information and Cyber Security · Web Application Security Vulnerabilities · Cybercrime and Law Enforcement Studies
