CleanVul: Automatic Function-Level Vulnerability Detection in Code Commits Using LLM Heuristics

Yikun Li; Ting Zhang; Ratnadira Widyasari; Yan Naing Tun; Huu Hung Nguyen; Tan Bui; Ivana Clairine Irsan; Yiran Cheng; Xiang Lan; Han Wei Ang; Frank Liauw; Martin Weyssow; Hong Jin Kang; Eng Lieh Ouh; Lwin Khin Shar; David Lo

arXiv:2411.17274 · cs.SE · September 12, 2025 · 2 cites



TL;DR

This paper presents CleanVul, an approach that combines Large Language Models with heuristics to automatically identify vulnerability-fixing code changes, reducing label noise in datasets used for vulnerability detection.

Contribution

It introduces an LLM-based method, enhanced with heuristics, for filtering the changes inside vulnerability-fixing commits, producing a higher-quality dataset that improves model training and generalization.

Findings

01

Achieved an F1-score of 0.82 in identifying vulnerability-fixing changes.

02

Created a dataset of 8,198 functions with 90.6% label correctness.

03

Fine-tuning LLMs on CleanVul improves accuracy and generalization.
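For reference, the F1-score cited above is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below computes all three from binary labels (1 = vulnerability-fixing change, 0 = unrelated change). The function name `precision_recall_f1` and the toy labels are illustrative assumptions, not data or code from the paper.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for the positive class (label 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean; defined as 0 when both precision and recall are 0.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels (hypothetical): 1 = change really fixes the vulnerability.
gold = [1, 1, 0, 1, 0, 0]
pred = [1, 0, 0, 1, 1, 0]
p, r, f = precision_recall_f1(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")  # precision=0.667 recall=0.667 f1=0.667
```

Because F1 is a harmonic mean, a filter cannot score well by maximizing only one side: keeping every change (high recall) or almost none (high precision) both drag it down.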

Abstract

Accurate identification of software vulnerabilities is crucial for system integrity. Vulnerability datasets, often derived from the National Vulnerability Database (NVD) or directly from GitHub, are essential for training machine learning models to detect these security flaws. However, these datasets frequently suffer from significant label noise, typically between 40% and 75%, due primarily to the automatic and indiscriminate labeling of all changes in vulnerability-fixing commits (VFCs) as vulnerability-related. This misclassification occurs because not every change in a commit aimed at fixing a vulnerability pertains to a security threat; many are routine updates such as bug fixes or test improvements. This paper introduces the first methodology that uses a Large Language Model (LLM) with heuristic enhancement to automatically identify vulnerability-fixing changes from VFCs, achieving an F1-score…
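The core idea of the abstract, scoring each function changed in a VFC and discarding those unrelated to security instead of labeling them all as vulnerable, can be sketched as follows. This is a minimal illustration under stated assumptions, not the CleanVul implementation: `ChangedFunction`, `looks_like_test`, `filter_vfc_changes`, the 0.5 threshold, and `stub_score` (a keyword stand-in for the LLM call) are hypothetical names chosen for this example, and the test-path rule is only one example of a heuristic; the paper's actual heuristics may differ.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ChangedFunction:
    """One function touched by a vulnerability-fixing commit (hypothetical record)."""
    file_path: str
    name: str
    after: str  # function body after the commit


# Hypothetical heuristic: path fragments that usually indicate test code.
TEST_MARKERS = ("test/", "tests/", "_test.", "test_", "spec/")


def looks_like_test(fn: ChangedFunction) -> bool:
    path = fn.file_path.lower()
    return any(marker in path for marker in TEST_MARKERS)


def filter_vfc_changes(
    changes: List[ChangedFunction],
    score_fn: Callable[[ChangedFunction], float],
    threshold: float = 0.5,
) -> List[ChangedFunction]:
    """Keep functions that survive the heuristic and whose score reaches `threshold`."""
    kept = []
    for fn in changes:
        if looks_like_test(fn):
            continue  # heuristic filter: skip test changes without calling the scorer
        if score_fn(fn) >= threshold:
            kept.append(fn)
    return kept


# Hypothetical stand-in for an LLM judgment; a real pipeline would prompt a model here.
def stub_score(fn: ChangedFunction) -> float:
    return 0.9 if "MAX_LEN" in fn.after else 0.1


changes = [
    ChangedFunction("src/parser.c", "parse_header", "if (len > MAX_LEN) return -1;"),
    ChangedFunction("tests/test_parser.c", "test_parse_header", "assert(parse_header(buf) == -1);"),
    ChangedFunction("src/version.c", "print_version", 'printf("v%s\\n", VERSION);'),
]
kept = filter_vfc_changes(changes, stub_score)
print([fn.name for fn in kept])  # ['parse_header']
```

Running the cheap heuristic before the scorer means test-only changes never cost a model call; only the remaining candidates are judged, and only those above the threshold would be labeled vulnerability-related.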


Code & Models

Repositories

yikun-li/cleanvul · Official

Datasets

yikun-li/CleanVul · dataset · 157 downloads


Taxonomy

Topics: Software Reliability and Analysis Research · Software Testing and Debugging Techniques · Security and Verification in Computing