On the Value of Oversampling for Deep Learning in Software Defect Prediction

Rahul Yedida; Tim Menzies

arXiv:2008.03835·cs.SE·April 22, 2021

TL;DR

This paper shows that preprocessing data with a novel oversampling technique called fuzzy sampling, inside a hyper-parameter optimization pipeline, significantly improves deep learning for software defect prediction, beating the prior deep learning state of the art on 14 of 20 defect datasets.

Contribution

The paper introduces fuzzy sampling, a new oversampling method, combined with GHOST, a hyper-parameter optimization pipeline, to enhance deep learning effectiveness in defect prediction.

Findings

01 Outperforms prior deep learning methods on 14 out of 20 datasets

02 Enables faster training of deep learners with better accuracy

03 Supports the use of oversampling before deep learning in defect prediction

Abstract

One truism of deep learning is that the automatic feature engineering (seen in the first layers of those networks) excuses data scientists from performing tedious manual feature engineering prior to running DL. For the specific case of deep learning for defect prediction, we show that that truism is false. Specifically, when we preprocess data with a novel oversampling technique called fuzzy sampling, as part of a larger pipeline called GHOST (Goal-oriented Hyper-parameter Optimization for Scalable Training), we can do significantly better than the prior DL state of the art in 14/20 defect datasets. Our approach yields state-of-the-art results with significantly faster deep learners. These results present a cogent case for the use of oversampling prior to applying deep learning on software defect prediction datasets.
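To make the idea of oversampling before training concrete, here is a minimal sketch of generic random oversampling with optional jitter. It is not the paper's fuzzy sampling, whose details are not given on this page, and it is not taken from the GHOST code. The function name `oversample_minority`, the `jitter` parameter, and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, jitter=0.0, seed=0):
    """Balance a binary dataset by resampling minority rows with replacement.

    NOTE: generic random oversampling, shown only for illustration.
    It is NOT the paper's fuzzy sampling.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    (majority, n_major), (minority, n_minor) = counts.most_common(2)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]

    # Keep every original row, then append synthetic minority rows
    # until both classes have the same count.
    new_rows, new_labels = list(rows), list(labels)
    for _ in range(n_major - n_minor):
        base = rng.choice(minority_rows)
        # Optional uniform noise so copies are not exact duplicates.
        new_rows.append([v + rng.uniform(-jitter, jitter) for v in base])
        new_labels.append(minority)
    return new_rows, new_labels


# Hypothetical toy data: 6 clean modules (label 0), 2 defective (label 1).
X = [[1.0, 2.0], [1.5, 1.8], [0.9, 2.2], [1.1, 2.1], [1.3, 1.9], [1.2, 2.0],
     [5.0, 8.0], [5.5, 7.5]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = oversample_minority(X, y, jitter=0.1)
print(Counter(y_bal))
```

After the call, both classes have six rows. In practice this kind of resampling is usually applied to the training split only, so that the test set keeps its original class distribution.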

Code & Models

Repositories

yrahul3910/ghost-dl
Official
