How to Better Distinguish Security Bug Reports (using Dual Hyperparameter Optimization)
Rui Shu, Tianpei Xia, Jianfeng Chen, Laurie Williams, Tim Menzies

TL;DR
This paper introduces Swift, a dual hyperparameter optimizer that significantly improves the recall of security bug report classification by jointly optimizing data preprocessing and learning algorithms, outperforming previous methods.
Contribution
The paper presents Swift, a novel dual optimizer that effectively enhances security bug report detection by optimizing both pre-processing and learning parameters simultaneously.
Findings
Swift achieves median recall of 77.4% on Chromium data, compared to 15.7% by FARSEC.
On Ambari data, median recall improves from 21.5% (FARSEC) to 85.7% (Swift).
Recall improvements come with moderate false positive rate increases from 8% to 24%.
Abstract
Background: In order that the general public is not vulnerable to hackers, security bug reports need to be handled by small groups of engineers before being widely discussed. But learning how to distinguish the security bug reports from other bug reports is challenging since they may occur rarely. Data mining methods that can find such scarce targets require extensive optimization effort. Goal: The goal of this research is to aid practitioners as they struggle to optimize methods that try to distinguish between rare security bug reports and other bug reports. Method: Our proposed method, called Swift, is a dual optimizer that optimizes both learner and pre-processor options. Since this is a large space of options, Swift uses a technique called epsilon-dominance that learns how to avoid operations that do not significantly improve performance. Result: When compared to recent…
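The abstract describes two ideas: searching pre-processor and learner options jointly, and using epsilon-dominance to skip candidates that do not improve performance by a meaningful margin. The sketch below is a generic illustration of that combination, not Swift's actual implementation. The option names (`oversample_ratio`, `k_neighbors`), the `toy_recall` scoring function, the `eps=0.2` cell size, and the `patience` stopping rule are all assumptions made for this example; in practice the evaluation step would train a model and measure recall on held-out bug reports.

```python
import math
import random


def eps_cell(score, eps):
    """Index of the epsilon-sized cell that a score falls into."""
    return math.floor(score / eps)


def eps_improves(candidate, incumbent, eps):
    """True only if the candidate lands in a strictly higher epsilon cell.

    Scores in the same cell are treated as indistinguishable.
    """
    return eps_cell(candidate, eps) > eps_cell(incumbent, eps)


def sample_config(rng):
    """Draw one joint configuration (hypothetical option names).

    oversample_ratio stands in for a pre-processor option,
    k_neighbors for a learner option.
    """
    return {
        "oversample_ratio": rng.uniform(0.1, 1.0),
        "k_neighbors": rng.randint(1, 10),
    }


def toy_recall(config):
    """Hypothetical stand-in for 'train a model and measure recall'."""
    ratio_term = 1.0 - abs(config["oversample_ratio"] - 0.7)
    k_term = 1.0 - abs(config["k_neighbors"] - 5) / 10.0
    return max(0.0, min(1.0, 0.5 * ratio_term + 0.5 * k_term))


def dual_search(evaluate, budget=50, eps=0.2, patience=15, seed=0):
    """Random joint search that stops once epsilon-level gains dry up."""
    rng = random.Random(seed)
    best_cfg = sample_config(rng)
    best_score = evaluate(best_cfg)
    evaluations, stale = 1, 0
    while evaluations < budget and stale < patience:
        cfg = sample_config(rng)
        score = evaluate(cfg)
        evaluations += 1
        if eps_improves(score, best_score, eps):
            best_cfg, best_score, stale = cfg, score, 0
        else:
            stale += 1  # gain smaller than epsilon: not worth keeping
    return best_cfg, best_score, evaluations


if __name__ == "__main__":
    cfg, score, used = dual_search(toy_recall)
    print(cfg, round(score, 3), used)
```

A larger `eps` makes the search give up sooner, since more candidates count as "no real improvement"; a smaller `eps` spends more of the budget chasing marginal gains.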
Taxonomy
Topics: Machine Learning and Data Classification · Software Engineering Research · Software Testing and Debugging Techniques
