PatchFinder: A Two-Phase Approach to Security Patch Tracing for Disclosed Vulnerabilities in Open-Source Software

Kaixuan Li; Jian Zhang; Sen Chen; Han Liu; Yang Liu; Yixiang Chen

arXiv:2407.17065·cs.SE·July 25, 2024

TL;DR

PatchFinder is a novel two-phase framework that improves security patch tracing for open-source software vulnerabilities by combining lexical and semantic matching with end-to-end correlation learning, significantly enhancing accuracy and reducing manual effort.

Contribution

The paper introduces PatchFinder, a two-phase end-to-end framework that combines hybrid retrieval and supervised re-ranking for more effective security patch tracing in OSS.

Findings

01 Achieves 80.63% Recall@10 and an MRR of 0.7951 on 4,789 CVEs.

02 Reduces manual effort by a factor of 1.94 compared to existing methods.

03 Successfully identified and confirmed 482 patches in practice.
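
For readers unfamiliar with the two ranking metrics quoted above, the following is a minimal sketch of how Recall@K and MRR are commonly computed for patch tracing. It assumes Recall@K counts a CVE as a hit when at least one of its true patch commits appears in the top K, which may differ in detail from the paper's definition. The CVE and commit identifiers are hypothetical toy data, not results from the paper.

```python
from typing import Dict, List, Set


def recall_at_k(rankings: Dict[str, List[str]],
                truth: Dict[str, Set[str]],
                k: int = 10) -> float:
    """Fraction of CVEs with at least one true patch commit in the top k."""
    hits = sum(1 for cve, ranked in rankings.items()
               if set(ranked[:k]) & truth[cve])
    return hits / len(rankings)


def mean_reciprocal_rank(rankings: Dict[str, List[str]],
                         truth: Dict[str, Set[str]]) -> float:
    """Average of 1/rank of the first true patch; 0 if none is ranked."""
    total = 0.0
    for cve, ranked in rankings.items():
        for position, commit in enumerate(ranked, start=1):
            if commit in truth[cve]:
                total += 1.0 / position
                break
    return total / len(rankings)


# Hypothetical toy data: ranked candidate commits per CVE and the true patches.
rankings = {
    "CVE-A": ["c1", "c2", "c3"],
    "CVE-B": ["c9", "c7", "c8"],
    "CVE-C": ["c4", "c5", "c6"],
}
truth = {"CVE-A": {"c1"}, "CVE-B": {"c8"}, "CVE-C": {"cX"}}

print(recall_at_k(rankings, truth, k=2))
print(mean_reciprocal_rank(rankings, truth))
```

In this toy example only CVE-A has its patch within the top 2, and the patch for CVE-B sits at rank 3, so a larger K raises recall while MRR stays fixed.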

Abstract

Open-source software (OSS) vulnerabilities are increasingly prevalent, emphasizing the importance of security patches. However, in widely used security platforms like NVD, a substantial number of CVE records still lack trace links to patches. Although rank-based approaches have been proposed for security patch tracing, they rely heavily on handcrafted features in a single-step framework, which limits their effectiveness. In this paper, we propose PatchFinder, a two-phase framework with end-to-end correlation learning for better tracing of security patches. In the **initial retrieval** phase, we employ a hybrid patch retriever that accounts for both lexical and semantic matching between the code changes and the description of a CVE, narrowing the search space by extracting as candidates the commits that are similar to the CVE description. Afterwards, in the **re-ranking** phase, we…
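
The initial retrieval phase described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a TF-IDF cosine score for the lexical side, a character-trigram vector as a stand-in for a learned semantic encoder, and a weighted sum (`alpha`) to fuse the two scores. The function names, the `alpha` parameter, and the commit texts are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9_]+", text.lower())


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Sparse TF-IDF vectors with smoothed IDF over the given documents."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_embed(text: str) -> Dict[str, float]:
    """Assumption: character trigram counts stand in for a learned encoder."""
    s = text.lower()
    return dict(Counter(s[i:i + 3] for i in range(len(s) - 2)))


def hybrid_retrieve(cve_description: str,
                    commits: Dict[str, str],
                    top_k: int = 2,
                    alpha: float = 0.5,
                    embed: Callable[[str], Dict[str, float]] = toy_embed
                    ) -> List[Tuple[str, float]]:
    """Rank commits by alpha * lexical + (1 - alpha) * semantic similarity."""
    ids = list(commits)
    vecs = tfidf_vectors([tokenize(cve_description)] +
                         [tokenize(commits[c]) for c in ids])
    query_vec, commit_vecs = vecs[0], vecs[1:]
    query_emb = embed(cve_description)
    scored = []
    for cid, vec in zip(ids, commit_vecs):
        lexical = cosine(query_vec, vec)
        semantic = cosine(query_emb, embed(commits[cid]))
        scored.append((cid, alpha * lexical + (1 - alpha) * semantic))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical commit texts (message plus diff would be concatenated here).
commits = {
    "c1": "Fix heap buffer overflow in png chunk parser by checking length",
    "c2": "Update README with build instructions",
    "c3": "Refactor logging module and bump version",
}
description = ("Heap buffer overflow in the PNG chunk parser allows remote "
               "attackers to crash the application")
candidates = hybrid_retrieve(description, commits, top_k=2)
print(candidates)
```

The candidates returned here would then be passed to a separate re-ranking step, which this sketch does not cover. In a realistic setting, `embed` would be replaced by a pretrained code or text encoder.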
