PatchFinder: A Two-Phase Approach to Security Patch Tracing for Disclosed Vulnerabilities in Open-Source Software
Kaixuan Li, Jian Zhang, Sen Chen, Han Liu, Yang Liu, Yixiang Chen

TL;DR
PatchFinder is a two-phase framework for tracing security patches of disclosed open-source software vulnerabilities. It combines lexical and semantic matching with end-to-end correlation learning, improving tracing accuracy and reducing manual effort.
Contribution
The paper introduces PatchFinder, a two-phase end-to-end framework that combines hybrid retrieval and supervised re-ranking for more effective security patch tracing in OSS.
Findings
Achieves 80.63% Recall@10 and 0.7951 MRR on 4,789 CVEs.
Reduces the manual effort required by a factor of 1.94 compared with existing methods.
Successfully identified and confirmed 482 patches in practice.
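For readers unfamiliar with the two ranking metrics quoted above, the following is a minimal sketch of how Recall@K and MRR are commonly computed. It assumes, for simplicity, one ground-truth patch per CVE and represents each CVE by the 1-based rank of that patch (or `None` if it was not retrieved); the function names and the sample ranks are hypothetical and are not taken from the paper.

```python
def recall_at_k(ranks, k):
    """Fraction of CVEs whose true patch appears within the top k results."""
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all CVEs; a missing patch contributes 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


# Hypothetical ranks of the true patch for four CVEs.
ranks = [1, 3, None, 12]
print(recall_at_k(ranks, 10))
print(mean_reciprocal_rank(ranks))
```

Under this simplification, a Recall@10 of 80.63% would mean that for roughly four out of five CVEs the correct patch is among the first ten commits shown to the analyst.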
Abstract
Open-source software (OSS) vulnerabilities are increasingly prevalent, emphasizing the importance of security patches. However, in widely used security platforms like NVD, a substantial number of CVE records still lack trace links to patches. Although rank-based approaches have been proposed for security patch tracing, they rely heavily on handcrafted features in a single-step framework, which limits their effectiveness. In this paper, we propose PatchFinder, a two-phase framework with end-to-end correlation learning to better trace security patches. In the **initial retrieval** phase, we employ a hybrid patch retriever that accounts for both lexical and semantic matching between the code changes and the description of a CVE. This narrows the search space by selecting as candidates those commits that are similar to the CVE description. Afterwards, in the **re-ranking** phase, we…
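The initial retrieval phase described above can be illustrated with a small sketch. This is not the authors' implementation: Jaccard token overlap stands in for the lexical matcher, the embedding vectors are assumed to come from some pretrained encoder that is not shown, and the weighted-sum fusion with `alpha` is an assumption made for illustration. All names (`hybrid_top_k`, `commits`, the commit ids) are hypothetical.

```python
import math


def jaccard(a, b):
    """Lexical similarity: token-set overlap (a stand-in for TF-IDF or BM25)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def cosine(u, v):
    """Semantic similarity between two precomputed embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hybrid_top_k(cve_tokens, cve_vec, commits, k=2, alpha=0.5):
    """Score each commit by a weighted sum of lexical and semantic similarity
    (the fusion rule is an assumption) and return the ids of the top k."""
    scored = []
    for commit in commits:
        lexical = jaccard(cve_tokens, commit["tokens"])
        semantic = cosine(cve_vec, commit["vec"])
        scored.append((alpha * lexical + (1 - alpha) * semantic, commit["id"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [commit_id for _, commit_id in scored[:k]]


# Toy data: tokens from commit messages or diffs, plus made-up embeddings.
commits = [
    {"id": "a1", "tokens": ["fix", "buffer", "overflow", "parser"], "vec": [0.9, 0.1, 0.0]},
    {"id": "b2", "tokens": ["update", "readme"], "vec": [0.0, 0.1, 0.9]},
    {"id": "c3", "tokens": ["refactor", "parser", "tests"], "vec": [0.5, 0.5, 0.1]},
]
candidates = hybrid_top_k(["buffer", "overflow", "in", "parser"], [1.0, 0.0, 0.0], commits, k=2)
print(candidates)
```

In a two-phase design of this kind, the shortlist returned here would then be handed to the re-ranking phase, so the retriever only needs to be cheap and high-recall rather than precise.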
