Revisiting Vulnerability Patch Identification on Data in the Wild

Ivana Clairine Irsan; Ratnadira Widyasari; Ting Zhang; Huihui Huang; Ferdian Thung; Yikun Li; Lwin Khin Shar; Eng Lieh Ouh; Hong Jin Kang; David Lo

arXiv:2603.17266 · cs.SE · March 19, 2026

Open Access

TL;DR

This paper evaluates security patch detection models trained on NVD data when they are applied to in-the-wild commits. It finds F1-score drops of up to 90% and proposes combining NVD data with manually identified patches to improve robustness.

Contribution

It demonstrates the limitations of NVD-based training data for in-the-wild patch detection and suggests a hybrid dataset method to enhance model robustness.

Findings

01

Models trained on NVD data perform poorly on in-the-wild patches.

02

NVD-linked patches differ significantly from in-the-wild patches in message and content.

03

Combining NVD data with manually identified patches improves detection robustness.

Abstract

Attacks can exploit zero-day or one-day vulnerabilities that are not publicly disclosed. To detect these vulnerabilities, security researchers monitor development activities in open-source repositories to identify unreported security patches. The sheer volume of commits makes this task infeasible to accomplish manually. Consequently, security patch detectors are commonly trained and evaluated on security patches linked from vulnerability reports in the National Vulnerability Database (NVD). In this study, we assess the effectiveness of these detectors when applied in the wild. Our results show that models trained on NVD-derived data perform substantially worse, with decreases in F1-score of up to 90% when tested on in-the-wild security patches, rendering them impractical for real-world use. An analysis comparing security patches identified in-the-wild and commits linked from…
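
To make the headline metric concrete, the sketch below shows how an F1-score and a decrease between two evaluation settings could be computed. It is a minimal illustration, not the authors' evaluation code. The confusion-matrix counts are hypothetical and the helper names `f1_score` and `relative_drop` are made up for this example. The abstract does not say whether the 90% figure is a relative or an absolute decrease; the sketch assumes a relative one.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def relative_drop(before: float, after: float) -> float:
    """Fraction of the original score that was lost (assumed interpretation)."""
    if before <= 0:
        raise ValueError("baseline score must be positive")
    return (before - after) / before


# Hypothetical counts, NOT results from the paper.
f1_nvd = f1_score(tp=80, fp=20, fn=20)   # detector evaluated on NVD-linked patches
f1_wild = f1_score(tp=5, fp=20, fn=95)   # same detector on in-the-wild patches

drop = relative_drop(f1_nvd, f1_wild)
print(f"F1 (NVD test set):     {f1_nvd:.2f}")
print(f"F1 (in-the-wild set):  {f1_wild:.2f}")
print(f"Relative F1 decrease:  {drop:.0%}")
```

With these made-up counts, most in-the-wild security patches are missed (high false negatives), which is the kind of failure that drives F1 down sharply even when the number of false positives stays the same.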

Taxonomy

Topics: Information and Cyber Security · Web Application Security Vulnerabilities · Cybercrime and Law Enforcement Studies