TL;DR
This paper shows that a novel oversampling technique called fuzzy sampling, used within a hyper-parameter optimization pipeline, significantly improves deep learning performance in software defect prediction. It outperforms the prior deep learning state of the art on 14 of 20 datasets.
Contribution
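The paper introduces fuzzy sampling, a new oversampling method, and uses it as a preprocessing step within GHOST (Goal-oriented Hyper-parameter Optimization for Scalable Training), a hyper-parameter optimization pipeline, to improve the effectiveness of deep learning for defect prediction.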
Findings
Outperforms prior deep learning methods on 14 out of 20 datasets
Achieves state-of-the-art results with significantly faster deep learners
Supports the use of oversampling before deep learning in defect prediction
Abstract
One truism of deep learning is that the automatic feature engineering (seen in the first layers of those networks) excuses data scientists from performing tedious manual feature engineering prior to running deep learning (DL). For the specific case of deep learning for defect prediction, we show that this truism is false. Specifically, when we preprocess data with a novel oversampling technique called fuzzy sampling, as part of a larger pipeline called GHOST (Goal-oriented Hyper-parameter Optimization for Scalable Training), we can do significantly better than the prior DL state of the art in 14/20 defect data sets. Our approach yields state-of-the-art results with significantly faster deep learners. These results present a cogent case for the use of oversampling prior to applying deep learning on software defect prediction datasets.
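The abstract's core idea is to rebalance the training data by oversampling before any deep learner sees it. The sketch below illustrates only that preprocessing step under stated assumptions. It uses plain random oversampling (duplicating rows of the smaller classes) as a stand-in. It is NOT the paper's fuzzy sampling, whose details are not given here, and the function name `random_oversample` is hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def random_oversample(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    seed: int = 0,
) -> Tuple[List[List[float]], List[int]]:
    """Top up every class to the size of the largest class by duplicating rows.

    Hypothetical illustration of plain random oversampling; this is a
    stand-in and NOT the fuzzy sampling method described in the paper.
    """
    rng = random.Random(seed)  # seeded for reproducible resampling
    counts = Counter(y)
    target = max(counts.values())  # size of the majority class

    # Keep all original rows (copied) in their original order.
    X_out = [list(row) for row in X]
    y_out = list(y)

    for label, n in counts.items():
        rows = [list(x) for x, lbl in zip(X, y) if lbl == label]
        # Append randomly chosen duplicates until this class reaches the target.
        for _ in range(target - n):
            X_out.append(list(rng.choice(rows)))
            y_out.append(label)
    return X_out, y_out


if __name__ == "__main__":
    # Toy defect data: three clean modules (label 0), one defective module (label 1).
    X = [[0.1, 1.0], [0.2, 0.9], [0.3, 0.8], [5.0, 5.0]]
    y = [0, 0, 0, 1]
    X_bal, y_bal = random_oversample(X, y)
    print(Counter(y_bal))  # both classes now have 3 rows
```

In a pipeline of this kind, only the training split would be resampled and the balanced result would then be passed to the deep learner. The test split is left untouched so that evaluation reflects the real class distribution.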
