Experience: Quality Benchmarking of Datasets Used in Software Effort Estimation

Michael F. Bosu; Stephen G. MacDonell

arXiv:2012.10836 · cs.SE · December 22, 2020

TL;DR

This paper evaluates the quality of 13 datasets used in software effort estimation research, highlighting data quality issues and proposing a benchmarking template to improve dataset utility and data collection practices.

Contribution

The paper provides a systematic assessment of the quality of datasets used in empirical software engineering (ESE) and introduces a benchmarking template to improve data collection and dataset evaluation.

Findings

1. Identified prevalent data quality issues in commonly used datasets.
2. Assessed the fitness for purpose of these datasets.
3. Proposed a benchmarking template for dataset evaluation.

Abstract

Data is a cornerstone of empirical software engineering (ESE) research and practice. Data underpin numerous process and project management activities, including the estimation of development effort and the prediction of the likely location and severity of defects in code. Serious questions have been raised, however, over the quality of the data used in ESE. Data quality problems caused by noise, outliers, and incompleteness have been noted as being especially prevalent. Other quality issues, although also potentially important, have received less attention. In this study, we assess the quality of 13 datasets that have been used extensively in research on software effort estimation. The quality issues considered in this article draw on a taxonomy that we published previously based on a systematic mapping of data quality issues in ESE. Our contributions are as follows: (1) an evaluation…
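The abstract names noise, outliers, and incompleteness as the most frequently reported data quality problems in ESE. As a generic illustration only, and not the taxonomy or benchmarking template from the paper (neither is detailed here), the sketch below computes two such indicators on hypothetical project records. It reports the share of missing values in a field and flags outliers using Tukey's IQR fences. The record layout, the field names (`effort`, `size`), and the threshold `k = 1.5` are all assumptions made for illustration.

```python
"""Hypothetical sketch of two generic data quality indicators
(incompleteness and outliers) for an effort-estimation-style table.
This is NOT the authors' taxonomy or benchmarking template."""
from statistics import quantiles
from typing import Optional

# Hypothetical record layout: field name -> value; None marks a missing value.
Record = dict[str, Optional[float]]


def missing_ratio(records: list[Record], field: str) -> float:
    """Fraction of records whose `field` is missing (None or absent)."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences).

    Requires at least two values; quartiles use the statistics module's
    default 'exclusive' method.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    # Hypothetical projects: the second lacks effort, the third lacks size.
    records: list[Record] = [
        {"effort": 120.0, "size": 30.0},
        {"effort": None, "size": 12.0},
        {"effort": 95.0},
    ]
    print(missing_ratio(records, "effort"))  # one of three records is missing effort
    print(iqr_outliers([10, 12, 11, 13, 12, 14, 11, 500]))
```

Fence-based outlier flags only mark values for inspection. Whether a flagged project is an error or a legitimate extreme still requires domain judgment.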

Code & Models

Repositories

robotics-4-all/ISSEL-Announcements