VulCoCo: A Simple Yet Effective Method for Detecting Vulnerable Code Clones
Tan Bui, Yan Naing Tun, Thanh Phuc Nguyen, Yindu Su, Ferdian Thung, Yikun Li, Han Wei Ang, Yide Yin, Frank Liauw, Lwin Khin Shar, Eng Lieh Ouh, Ting Zhang, David Lo

TL;DR
VulCoCo is a scalable method combining embedding retrieval and LLM validation to detect vulnerable code clones, outperforming existing tools and contributing to real-world vulnerability discovery.
Contribution
It introduces VulCoCo, a novel approach that integrates embedding-based retrieval with LLM validation for effective vulnerable code clone detection.
Findings
VulCoCo outperforms prior methods in Precision@k and MAP.
The authors submitted 400 pull requests, of which 75 were merged and 15 led to CVEs.
The authors also constructed a synthetic benchmark for reproducible evaluation.
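For readers unfamiliar with the ranking metrics named in the findings, the following is a minimal sketch of Precision@k and MAP under their common textbook definitions. The function names are illustrative, and two details are assumptions rather than the paper's stated protocol: Precision@k divides by k even when fewer than k candidates are returned, and average precision is normalized by the number of true clones that appear in the ranking.

```python
from typing import Sequence


def precision_at_k(relevance: Sequence[bool], k: int) -> float:
    """Fraction of the top-k ranked candidates that are true clones.

    Assumption: the denominator is k, even if the ranking is shorter than k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevance[:k]) / k


def average_precision(relevance: Sequence[bool]) -> float:
    """Mean of the precision values taken at each rank holding a true clone.

    Assumption: normalized by the number of true clones in the ranking.
    """
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """Average precision averaged over all queries (seed functions)."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

Each ranking here is a list of booleans in ranked order, one list per known vulnerable function, where True marks a candidate judged to be a real vulnerable clone.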
Abstract
Code reuse is common in modern software development, but it can also spread vulnerabilities when developers unknowingly copy risky code. The code fragments that preserve the logic of known vulnerabilities are known as vulnerable code clones (VCCs). Detecting those VCCs is a critical but challenging task. Existing VCC detection tools often rely on syntactic similarity or produce coarse vulnerability predictions without clear explanations, limiting their practical utility. In this paper, we propose VulCoCo, a lightweight and scalable approach that combines embedding-based retrieval with large language model (LLM) validation. Starting from a set of known vulnerable functions, we retrieve syntactically or semantically similar candidate functions from a large corpus and use an LLM to assess whether the candidates retain the vulnerability. Given that there is a lack of reproducible vulnerable…
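The retrieve-then-validate flow described in the abstract can be sketched roughly as below. This is not the authors' implementation: the bag-of-tokens `embed` function is a toy stand-in for a real code embedding model, `stub_validator` is a hypothetical placeholder for the LLM call, and the 0.5 similarity threshold and all function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple


def embed(code: str) -> Counter:
    """Toy stand-in for an embedding model: counts of identifiers and symbols."""
    return Counter(re.findall(r"[A-Za-z_]\w*|\S", code))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(seed: str, corpus: Dict[str, str],
             threshold: float) -> List[Tuple[str, float]]:
    """Stage 1: rank corpus functions by similarity to the known vulnerable seed."""
    seed_vec = embed(seed)
    scored = [(name, cosine(seed_vec, embed(body))) for name, body in corpus.items()]
    kept = [item for item in scored if item[1] >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def detect_clones(seed: str, corpus: Dict[str, str],
                  validate: Callable[[str, str], bool],
                  threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Stage 2: keep only candidates the validator says retain the vulnerability."""
    return [(name, score) for name, score in retrieve(seed, corpus, threshold)
            if validate(seed, corpus[name])]


def stub_validator(seed: str, candidate: str) -> bool:
    """Hypothetical placeholder for the LLM check; a real system would prompt a model."""
    return "strcpy(" in candidate


seed = "void f(char *s) { char buf[8]; strcpy(buf, s); }"
corpus = {
    "clone": "void g(char *p) { char tmp[8]; strcpy(tmp, p); }",
    "patched": "void h(char *p) { char tmp[8]; strncpy(tmp, p, 7); tmp[7] = 0; }",
    "unrelated": "int add(int a, int b) { return a + b; }",
}
candidates = retrieve(seed, corpus, threshold=0.5)
confirmed = detect_clones(seed, corpus, stub_validator)
```

In this toy corpus the patched function is similar enough to pass the retrieval stage but is rejected by the validator, which illustrates why a second validation step is useful on top of similarity alone.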
Taxonomy
Topics: Software Engineering Research · Web Application Security Vulnerabilities · Advanced Malware Detection Techniques
