ActiveClean: Generating Line-Level Vulnerability Data via Active Learning
Ashwin Kallingal Joshy, Mirza Sanjida Alam, Shaila Sharmin, Qi Li, and Wei Le

TL;DR
This paper introduces ActiveClean, an active learning approach to generate accurate line-level vulnerability data from commits, enhancing the training of vulnerability detection tools and outperforming static analysis methods.
Contribution
ActiveClean is a scalable, active learning-based tool that produces precise line-level vulnerability labels from commit data, improving vulnerability detection accuracy.
Findings
Achieved F1 scores of 70-74% on Java and C datasets.
Using only 400 training samples, reached an F1 score of 70.23%.
Improved vulnerability detection by LineVul, increasing detected vulnerable lines by 70 and functions by 18.
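For readers unfamiliar with the metric, the F1 scores above can be read as the harmonic mean of precision and recall over line labels. The sketch below assumes F1 is computed over sets of line numbers within a function; this summary does not state the paper's exact evaluation protocol, and the function name and example line numbers are hypothetical.

```python
def line_level_f1(predicted, actual):
    """F1 over line numbers: harmonic mean of precision and recall.

    predicted: set of line numbers flagged as vulnerable (hypothetical labeler output)
    actual:    set of line numbers that are truly vulnerable (ground truth)
    """
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0  # also covers empty predictions or empty ground truth
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Illustrative data only: lines 3, 4, 7 are vulnerable; the labeler flags 3, 4, 9, 10.
score = line_level_f1({3, 4, 9, 10}, {3, 4, 7})
print(round(score, 4))
```

Here precision is 2/4 and recall is 2/3, so F1 is 4/7, roughly 0.57.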
Abstract
Deep learning vulnerability detection tools are increasing in popularity and have been shown to be effective. These tools rely on large volumes of high-quality training data, which are very hard to obtain. Most currently available datasets provide function-level labels, reporting whether a function is vulnerable or not. However, for vulnerability detection to be useful, we also need to know the lines that are relevant to the vulnerability. This paper makes efforts towards developing systematic tools and proposes ActiveClean to generate large volumes of line-level vulnerability data from commits. That is, in addition to function-level labels, it also reports which lines in the function are likely responsible for vulnerability detection. In the past, static analysis has been applied to clean commits to generate line-level data. Our approach based on active learning,…
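As background on the active learning idea mentioned in the abstract, here is a minimal sketch of generic pool-based active learning with uncertainty sampling. It is not ActiveClean's implementation: the abstract is truncated here and does not describe the model, features, or query strategy, so the two-number feature vectors, the tiny logistic regression, the oracle, and all function names are assumptions chosen for illustration.

```python
import math
import random


def predict(weights, features):
    """Probability that a line is vulnerable under a tiny logistic model."""
    z = weights[-1] + sum(w * x for w, x in zip(weights, features))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=200, rate=0.1):
    """Fit weights (last entry is the bias) by stochastic gradient descent."""
    weights = [0.0] * (len(samples[0][0]) + 1)
    for _ in range(epochs):
        for features, label in samples:
            error = predict(weights, features) - label
            for i, value in enumerate(features):
                weights[i] -= rate * error * value
            weights[-1] -= rate * error
    return weights


def active_learning(pool, oracle, seed_size=4, rounds=5, batch=2, rng=None):
    """Label a small seed, then repeatedly query the most uncertain items."""
    rng = rng or random.Random(0)
    unlabeled = list(pool)
    rng.shuffle(unlabeled)
    labeled = [(x, oracle(x)) for x in unlabeled[:seed_size]]
    unlabeled = unlabeled[seed_size:]
    weights = train(labeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        # Uncertainty sampling: probabilities closest to 0.5 come first.
        unlabeled.sort(key=lambda x: abs(predict(weights, x) - 0.5))
        queries, unlabeled = unlabeled[:batch], unlabeled[batch:]
        labeled += [(x, oracle(x)) for x in queries]  # ask the oracle (e.g. a human)
        weights = train(labeled)
    return weights, labeled


# Hypothetical pool: each changed line is reduced to two made-up numeric features.
data_rng = random.Random(1)
pool = [(data_rng.random(), data_rng.random()) for _ in range(30)]
# Stand-in oracle; in practice this would be a human annotator.
weights, labeled = active_learning(pool, oracle=lambda x: int(x[0] + x[1] > 1.0))
print(len(labeled), "labels requested out of", len(pool))
```

The point of the loop is that labeling effort goes to the examples the current model is least sure about, which is one way a small labeled set can be made to go further.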
Taxonomy
Topics: Advanced Malware Detection Techniques · Software Engineering Research · Web Application Security Vulnerabilities
