Impact of Discretization Noise of the Dependent Variable on Machine Learning Classifiers in Software Engineering
Gopi Krishnan Rajbahadur, Shaowei Wang, Yasutaka Kamei, Ahmed E. Hassan

TL;DR
This paper introduces a framework for evaluating how discretization noise in the dependent variable affects classifier performance and interpretation on software engineering datasets, and finds that the most important features remain robust to that noise.
Contribution
The paper presents a systematic framework for estimating the impact of discretization noise on classifiers and provides insights into its effects on performance and feature importance.
Findings
Discretization noise impacts performance measures differently across datasets.
Classifier interpretation is generally affected by discretization noise.
The most important features remain robust to discretization noise.
Abstract
Researchers usually discretize a continuous dependent variable into two target classes by introducing an artificial discretization threshold (e.g., median). However, such discretization may introduce noise (i.e., discretization noise) due to ambiguous class loyalty of data points that are close to the artificial threshold. Previous studies do not provide a clear directive on the impact of discretization noise on the classifiers and how to handle such noise. In this paper, we propose a framework to help researchers and practitioners systematically estimate the impact of discretization noise on classifiers in terms of its impact on various performance measures and the interpretation of classifiers. Through a case study of 7 software engineering datasets, we find that: 1) discretization noise affects the different performance measures of a classifier differently for different datasets; 2)…
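To make the idea of discretization noise concrete, the sketch below splits a continuous dependent variable at its median and flags the data points that sit close to that threshold. It is only an illustration under stated assumptions, not the framework proposed in the paper: the function name `discretize_with_band`, the `band_fraction` parameter, and the toy values are all hypothetical, and the "ambiguous band" is simply one plausible way to mark points whose class assignment is sensitive to where the threshold falls.

```python
from statistics import median


def discretize_with_band(values, band_fraction=0.1):
    """Label values 0/1 around the median and flag those near the threshold.

    band_fraction is a hypothetical knob (not from the paper): the half-width
    of the ambiguous band, expressed as a fraction of the value range.
    """
    threshold = median(values)
    half_width = band_fraction * (max(values) - min(values))
    # Class 1 if strictly above the artificial threshold, else class 0.
    labels = [1 if v > threshold else 0 for v in values]
    # Points inside the band have ambiguous class membership.
    ambiguous = [abs(v - threshold) <= half_width for v in values]
    return threshold, labels, ambiguous


# Toy continuous dependent variable (hypothetical values).
values = [1, 2, 3, 10, 11, 12, 20, 30]
threshold, labels, ambiguous = discretize_with_band(values, band_fraction=0.05)

for v, label, near in zip(values, labels, ambiguous):
    note = " (near threshold)" if near else ""
    print(f"value={v:>2} class={label}{note}")
```

In this toy run the values 10 and 11 land in different classes even though they differ by one unit, which is the kind of ambiguity the abstract describes. One way to probe its effect, again as an assumption rather than the paper's procedure, would be to train a classifier with and without the flagged points and compare the resulting performance measures and feature importances.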
