Evaluating prediction systems in software project estimation
Martin Shepperd, Stephen G. MacDonell

TL;DR
This paper introduces a new evaluation framework for prediction systems in software project estimation. It aims to reduce inconsistency among validation study results and to give their interpretation a more formal footing.
Contribution
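A new evaluation framework built on three elements: an unbiased accuracy statistic (Standardised Accuracy), testing how likely a result is relative to random guessing as a baseline, and effect sizes. Together these make empirical assessments of competing prediction systems more robust.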
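As a rough illustration of Standardised Accuracy, the sketch below assumes the definition SA = (1 - MAR_Pi / MAR_P0) × 100, where MAR is the mean absolute residual of prediction system Pi and MAR_P0 is the mean MAR over many runs of random guessing. The function names, the 1,000-run default and the guessing scheme (each case is "predicted" by the actual value of another, randomly chosen case) are assumptions for illustration, not the authors' code.

```python
import random
from statistics import mean


def mar(actuals, predictions):
    """Mean absolute residual: the average of |actual - predicted|."""
    return mean(abs(a - p) for a, p in zip(actuals, predictions))


def random_guess_mar(actuals, rng):
    """MAR of one run of random guessing (assumed scheme): each case is
    'predicted' by the actual value of a different, randomly chosen case."""
    n = len(actuals)
    guesses = [actuals[rng.choice([k for k in range(n) if k != i])] for i in range(n)]
    return mar(actuals, guesses)


def standardised_accuracy(actuals, predictions, runs=1000, seed=0):
    """Hypothetical helper: SA = (1 - MAR_Pi / mean MAR_P0) * 100.

    100 means perfect predictions, values near 0 mean no better than
    random guessing, and negative values mean worse than guessing."""
    rng = random.Random(seed)
    mar_p0 = mean(random_guess_mar(actuals, rng) for _ in range(runs))
    return (1 - mar(actuals, predictions) / mar_p0) * 100


# Usage on made-up effort values (hypothetical data).
actuals = [120, 45, 300, 80, 210, 60]
predictions = [100, 50, 260, 95, 180, 70]
print(round(standardised_accuracy(actuals, predictions), 1))
```

Because SA is expressed relative to random guessing rather than to the size of each actual value, it does not systematically reward under- or over-estimation the way a ratio-based statistic can.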
Findings
Re-examining previously published empirical evaluations shows that their original conclusions are unsafe.
Even the strongest results have no more than a medium effect size relative to random guessing.
Biased accuracy statistics such as MMRE are deprecated in favor of the new framework.
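A toy example with invented numbers suggests why MMRE (mean magnitude of relative error) is regarded as biased: dividing each error by the actual value caps the penalty for underestimates, so it tends to reward predictions that are too low. Below, a trivial predictor that always guesses 0 obtains a better (lower) MMRE than one that always guesses the median, even though its absolute errors, and hence its MAR, are larger. The data and function names are hypothetical.

```python
from statistics import mean


def mmre(actuals, predictions):
    """Mean magnitude of relative error: mean of |actual - predicted| / actual."""
    return mean(abs(a - p) / a for a, p in zip(actuals, predictions))


def mar(actuals, predictions):
    """Mean absolute residual: mean of |actual - predicted|."""
    return mean(abs(a - p) for a, p in zip(actuals, predictions))


actuals = [10, 100, 1000]       # hypothetical effort values
always_zero = [0, 0, 0]         # trivial predictor that always underestimates
always_100 = [100, 100, 100]    # constant predictor at the median

# MMRE ranks the zero-predictor as better; MAR ranks the median-predictor as better.
print("MMRE:", mmre(actuals, always_zero), mmre(actuals, always_100))
print("MAR: ", mar(actuals, always_zero), mar(actuals, always_100))
```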
Abstract
Context: Software engineering has a problem in that when we empirically evaluate competing prediction systems we obtain conflicting results.
Objective: To reduce the inconsistency amongst validation study results and provide a more formal foundation to interpret results with a particular focus on continuous prediction systems.
Method: A new framework is proposed for evaluating competing prediction systems based upon (1) an unbiased statistic, Standardised Accuracy, (2) testing the result likelihood relative to the baseline technique of random 'predictions', that is guessing, and (3) calculation of effect sizes.
Results: Previously published empirical evaluations of prediction systems are re-examined and the original conclusions shown to be unsafe. Additionally, even the strongest results are shown to have no more than a medium effect size relative to random guessing.
Conclusions: Biased…
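Steps (2) and (3) of the method can be sketched in a similar way. This sketch assumes that the MAR of each random-guessing run has been collected, that the likelihood check asks whether a prediction system's MAR falls below the 5% quantile of that random-guessing distribution, and that the effect size is Glass's Δ with random guessing as the control group, read against Cohen's conventional 0.2/0.5/0.8 thresholds for small/medium/large. All names and sample values are hypothetical.

```python
from statistics import mean, quantiles, stdev


def glass_delta(mar_pi, mar_p0_runs):
    """Glass's delta (assumed form): improvement over the mean random-guessing
    MAR, scaled by the standard deviation of the random-guessing MARs."""
    return (mean(mar_p0_runs) - mar_pi) / stdev(mar_p0_runs)


def beats_random_guessing(mar_pi, mar_p0_runs):
    """True if MAR_Pi lies below the 5% quantile of the random-guessing MARs,
    i.e. a result this good would rarely arise by guessing alone."""
    q5 = quantiles(mar_p0_runs, n=20)[0]  # first of 19 cut points = 5th percentile
    return mar_pi < q5


def effect_label(delta):
    """Cohen's conventional thresholds (assumed here): 0.2, 0.5, 0.8."""
    size = abs(delta)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# Usage with invented numbers: MARs from random-guessing runs and one system's MAR.
mar_p0_runs = [118, 125, 109, 131, 114, 122, 127, 111, 119, 124]
mar_pi = 105
delta = glass_delta(mar_pi, mar_p0_runs)
print(beats_random_guessing(mar_pi, mar_p0_runs), round(delta, 2), effect_label(delta))
```

In practice the random-guessing distribution would come from many runs (for example, the `random_guess_mar` style of procedure repeated on the same data set), not from ten hand-picked values.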
