Mapping NVD Records to Their Vulnerability-fixing Commits: How Hard is It?

Huu Hung Nguyen; Ting Zhang; Duc Manh Tran; Yiran Cheng; Thanh Le-Cong; Hong Jin Kang; Ratnadira Widyasari; Shar Lwin Khin; Ouh Eng Lieh; David Lo

arXiv:2506.09702·cs.SE·May 19, 2026

TL;DR

This paper evaluates how hard it is to map NVD vulnerability records to their fixing commits. Its automated approach achieves high precision but limited coverage, because explicit links to commits are sparse in NVD references.

Contribution

The study presents an empirical analysis and an automated pipeline for mapping NVD records to vulnerability-fixing commits, incorporating external databases and GitHub data to improve coverage.

Findings

01

The automated pipeline extracted VFCs from NVD references, mainly Git references, with 87% precision.

02

Six external security databases and GitHub repositories contributed additional VFCs at 88.4% and 73% precision, respectively.

03

Across all sources, only 11.3% of NVD records (26,710 of 235,341) were mapped to a VFC, which shows how hard the task is.
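The coverage figures can be checked with simple arithmetic. The sketch below recomputes them from the record counts given in the abstract; the names `TOTAL_NVD_RECORDS`, `mapped_records`, and `coverage_percent` are illustrative and do not come from the paper or its repository.

```python
# Record counts as reported in the abstract; all names here are illustrative.
TOTAL_NVD_RECORDS = 235_341

mapped_records = {
    "NVD references": 20_360,
    "external security databases": 18_985,
    "GitHub repositories": 2_795,
    "combined (unique records)": 26_710,
}


def coverage_percent(mapped: int, total: int = TOTAL_NVD_RECORDS) -> float:
    """Share of NVD records mapped to at least one VFC, in percent (1 decimal)."""
    return round(100 * mapped / total, 1)


for source, count in mapped_records.items():
    print(f"{source}: {coverage_percent(count)}%")
```

The three per-source counts sum to 42,140 records, more than the 26,710 unique records in the combined result. This is consistent with the overlap between sources that the abstract mentions.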

Abstract

Mapping National Vulnerability Database (NVD) records to vulnerability-fixing commits (VFCs) is crucial for vulnerability analysis but challenging due to sparse explicit links in NVD references. This study explores this mapping's feasibility through an empirical approach. Manual analysis of NVD references showed Git references enable over 86% success, while non-Git references achieve under 14%. Using these findings, we built an automated pipeline extracting 31,942 VFCs from 20,360 NVD records (8.7% of 235,341) with 87% precision, mainly from Git references. To fill gaps, we mined six external security databases, yielding 29,254 VFCs for 18,985 records (8.1%) at 88.4% precision, and GitHub repositories, adding 3,686 VFCs for 2,795 records (1.2%) at 73% precision. Combining these, we mapped 26,710 unique records (11.3% coverage) from 7,634 projects, with overlap between NVD and external…
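To make the idea of a "Git reference" concrete, here is a minimal sketch of how commit links might be pulled out of a record's reference URLs. It is not the authors' pipeline: the regular expression, the restriction to GitHub commit URLs, and the function names `extract_commit` and `candidate_vfcs` are all assumptions made for illustration.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Assumed pattern: GitHub commit URLs of the form
# https://github.com/<owner>/<repo>/commit/<sha>. Other hosts are ignored here.
COMMIT_URL = re.compile(
    r"^https?://github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+)"
    r"/commit/(?P<sha>[0-9a-fA-F]{7,40})"
)


def extract_commit(url: str) -> Optional[Tuple[str, str]]:
    """Return (owner/repo, sha) if the URL looks like a commit link, else None."""
    match = COMMIT_URL.match(url.strip())
    if match is None:
        return None
    return f"{match['owner']}/{match['repo']}", match["sha"].lower()


def candidate_vfcs(reference_urls: Iterable[str]) -> List[Tuple[str, str]]:
    """Collect unique commit candidates from a record's references, in order."""
    seen = set()
    candidates = []
    for url in reference_urls:
        hit = extract_commit(url)
        if hit is not None and hit not in seen:
            seen.add(hit)
            candidates.append(hit)
    return candidates
```

A commit link found this way is only a candidate: the reported precision figures below 100% indicate that extracted commits still need validation before they are treated as vulnerability-fixing commits.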

Code & Models

Repositories

hungkien05/vfc-from-nvd-study
Official

Taxonomy

Topics: Information and Cyber Security · Web Application Security Vulnerabilities · Cybercrime and Law Enforcement Studies