Better Security Bug Report Classification via Hyperparameter Optimization
Rui Shu, Tianpei Xia, Laurie Williams, Tim Menzies

TL;DR
This paper enhances security bug report classification by applying hyperparameter optimization to data pre-processing and model parameters, significantly improving recall rates in identifying security vulnerabilities.
Contribution
It demonstrates that optimizing data pre-processing methods yields greater improvements than tuning learners alone in classifying security bug reports.
Findings
Data pre-processing optimization improves recall by up to 65%.
Hyperparameter tuning of data pre-processing is more effective than tuning learners.
Tuning yields significant improvements in security bug report classification.
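Recall, the metric reported in the findings above, is the fraction of actual security bug reports that a classifier flags as such. The following is a minimal sketch of that calculation, not the authors' evaluation code: the function name `recall`, the label strings "SBR" and "NSBR", and the sample data are hypothetical and chosen only for illustration.

```python
def recall(actual, predicted, positive="SBR"):
    """Return the fraction of actual positives that were predicted positive."""
    pairs = list(zip(actual, predicted))
    # True positives: real SBRs that the classifier labelled as SBR.
    true_pos = sum(1 for a, p in pairs if a == positive and p == positive)
    # False negatives: real SBRs that the classifier missed.
    false_neg = sum(1 for a, p in pairs if a == positive and p != positive)
    total = true_pos + false_neg
    # With no actual positives, recall is undefined; this sketch returns 0.0.
    return true_pos / total if total else 0.0


# Hypothetical labels for five bug reports (illustrative data only).
actual = ["SBR", "SBR", "NSBR", "SBR", "NSBR"]
predicted = ["SBR", "NSBR", "NSBR", "SBR", "SBR"]
score = recall(actual, predicted)  # 2 of the 3 real SBRs are found
```

A missed SBR (a false negative) lowers recall, while a false alarm on an NSBR does not, which is why recall suits a setting where mislabelled security reports risk public disclosure.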
Abstract
When security bugs are detected, they should be (a) discussed privately by security software engineers; and (b) not mentioned to the general public until security patches are available. Software engineers usually report bugs to a bug tracking system and label them as security bug reports (SBRs) or not-security bug reports (NSBRs); SBRs have a higher priority than NSBRs because they must be fixed before attackers can exploit them. Yet suspected security bug reports are often publicly disclosed because of mislabelling (i.e., security bug reports mislabelled as not-security bug reports). The goal of this paper is to help software developers better classify bug reports that identify security vulnerabilities as security bug reports, through parameter tuning of learners and data pre-processors. Previous work has applied text analytics and machine learning learners to classify which reported bugs are…
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Machine Learning and Data Classification
