Dazzle: Using Optimized Generative Adversarial Networks to Address Security Data Class Imbalance Issue
Rui Shu, Tianpei Xia, Laurie Williams, Tim Menzies

TL;DR
This paper introduces Dazzle, an optimized GAN-based approach using Bayesian optimization to generate synthetic minority class samples, significantly improving class imbalance handling in software security datasets for vulnerability prediction.
Contribution
Dazzle is a novel, optimized cWGAN-GP model that effectively addresses class imbalance in security datasets through hyperparameter tuning with Bayesian optimization.
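For readers unfamiliar with the acronym, the block below writes out the standard conditional WGAN-GP objective as it is usually given in the literature. It is a sketch for orientation only: this summary does not show the exact formulation or notation used in the paper, so the symbols here (critic D, generator G, class label y, noise z, penalty weight \lambda) are assumptions.

```latex
% Critic loss: Wasserstein estimate plus gradient penalty, conditioned on y
L_D = \mathbb{E}_{z \sim p_z}\big[D(G(z \mid y) \mid y)\big]
    - \mathbb{E}_{x \sim p_r}\big[D(x \mid y)\big]
    + \lambda \, \mathbb{E}_{\hat{x}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x} \mid y) \rVert_2 - 1\big)^2\Big]

% Interpolated points at which the penalty is evaluated
\hat{x} = \epsilon x + (1 - \epsilon)\, G(z \mid y), \qquad \epsilon \sim U[0, 1]

% Generator loss
L_G = - \mathbb{E}_{z \sim p_z}\big[D(G(z \mid y) \mid y)\big]
```

The architecture hyperparameters that Dazzle tunes would sit inside D and G; the loss itself has the same form for every candidate configuration.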
Findings
Dazzle outperforms SMOTE with about 60% higher recall.
Dazzle is practical and effective across multiple security datasets.
Optimized GANs are a promising alternative for imbalance issues in security data.
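To make the SMOTE baseline concrete, here is a minimal NumPy sketch of its core idea: each synthetic point is placed on the line segment between a minority sample and one of its k nearest minority neighbours. The function name `smote_sketch` and its parameters are hypothetical, and this is not the implementation evaluated in the paper.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise Euclidean distances; a sample is never its own neighbour.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base sample and one of its k neighbours for each new row.
    base = rng.integers(0, n, size=n_new)
    pick = rng.integers(0, k, size=n_new)
    partner = neighbours[base, pick]

    # Interpolate a random fraction of the way from base to partner.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[partner] - X_min[base])
```

Because every synthetic row is a convex combination of two existing rows, SMOTE cannot leave the span of the observed minority samples, which is one reason a learned generator is considered as an alternative.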
Abstract
Background: Machine learning techniques have been widely used and demonstrate promising performance in many software security tasks such as software vulnerability prediction. However, the class ratio within software vulnerability datasets is often highly imbalanced (since the percentage of observed vulnerabilities is usually very low). Goal: To help security practitioners address class imbalance in software security data and build better prediction models with resampled datasets. Method: We introduce an approach called Dazzle, which is an optimized version of conditional Wasserstein Generative Adversarial Networks with gradient penalty (cWGAN-GP). Dazzle explores the architecture hyperparameters of cWGAN-GP with a novel optimizer called Bayesian Optimization. We use Dazzle to generate minority class samples to resample the original imbalanced training dataset. Results: We…
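The resampling step described in the Method can be sketched as follows. This is a minimal illustration that assumes a binary label and a generator exposed as a callable `generate(n, label)`. Both function names are hypothetical, and `make_noise_generator` is only a Gaussian stand-in for a trained cWGAN-GP generator, not the model from the paper.

```python
import numpy as np

def resample_with_generator(X, y, generate, minority_label=1):
    """Append synthetic minority rows until both classes are the same size.

    Assumes a binary problem: every row not labelled minority_label
    counts as the majority class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_minority = int(np.sum(y == minority_label))
    n_majority = len(y) - n_minority
    n_needed = max(n_majority - n_minority, 0)
    if n_needed == 0:
        return X, y
    X_syn = generate(n_needed, minority_label)
    y_syn = np.full(n_needed, minority_label, dtype=y.dtype)
    return np.vstack([X, X_syn]), np.concatenate([y, y_syn])

def make_noise_generator(X, y, minority_label=1, seed=0):
    """Hypothetical placeholder for a trained conditional generator.

    Samples from a per-feature Gaussian fitted to the minority rows.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X, dtype=float)[np.asarray(y) == minority_label]
    mean, std = X_min.mean(axis=0), X_min.std(axis=0)

    def generate(n, label):
        # The label is ignored here; a real conditional generator would use it.
        return rng.normal(mean, std, size=(n, X_min.shape[1]))

    return generate
```

Only the training split should be resampled this way; the test split keeps its original class ratio so that recall is measured on realistic data.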
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software Reliability and Analysis Research
Methods: Synthetic Minority Over-sampling Technique
