Learning best K analogies from data distribution for case-based software effort estimation
Mohammad Azzeh, Yousef Elsheikh

TL;DR
This paper introduces a data-driven approach using bisecting k-medoids clustering to optimize case selection in case-based software effort estimation, improving accuracy over traditional methods.
Contribution
It proposes a novel clustering-based technique to automatically determine the best number of cases for each project, enhancing CBR performance.
Findings
Improved estimation accuracy compared to regular K-based CBR methods.
Effective dataset understanding aids in selecting relevant cases.
Automatic case selection reduces reliance on manual configuration.
Abstract
Case-Based Reasoning (CBR) has been widely used to generate good software effort estimates. The predictive performance of CBR is dataset dependent and subject to an extremely large space of configuration possibilities. Regardless of the type of adaptation technique, deciding on the optimal number of similar cases to use before applying CBR is a key challenge. In this paper we propose a new technique based on the Bisecting k-medoids clustering algorithm to better understand the structure of a dataset and discover the optimal cases for each individual project by excluding irrelevant cases. The results show that understanding the data characteristics prior to the prediction stage can help in automatically finding the best number of cases for each test project. Performance figures of the proposed estimation method are better than those of other regular K-based CBR methods.
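The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the stopping rule, the medoid initialisation, or the adaptation step, so everything named here is an assumption. The sketch recursively splits the historical projects with a simple 2-medoids step (initialised from the farthest pair of points, stopped by a hypothetical `min_size` threshold), then routes a new project down the tree to a leaf and uses the mean effort of that leaf's projects as the estimate. The number of analogies is the size of the leaf rather than a fixed K.

```python
import numpy as np

def two_medoids(X, n_iter=20):
    """Split the rows of X into two groups with alternating k-medoids (k=2)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    # Assumption: start from the two points that are farthest apart.
    medoids = np.array(np.unravel_index(np.argmax(D), D.shape))
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new = medoids.copy()
        for c in range(2):
            members = np.flatnonzero(labels == c)
            if len(members):
                # New medoid: the member with the smallest total distance to its cluster.
                totals = D[np.ix_(members, members)].sum(axis=1)
                new[c] = members[np.argmin(totals)]
        if np.array_equal(new, medoids):
            break
        medoids = new
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels

def build_tree(X, idx, min_size=3):
    """Recursively bisect the projects in idx; min_size is a hypothetical stopping rule."""
    if len(idx) < 2 * min_size:
        return {"idx": idx}
    medoids, labels = two_medoids(X[idx])
    left, right = idx[labels == 0], idx[labels == 1]
    if len(left) < min_size or len(right) < min_size:
        return {"idx": idx}  # a split would leave too few cases, so keep this as a leaf
    return {
        "idx": idx,
        "medoids": X[idx[medoids]],
        "children": [build_tree(X, left, min_size), build_tree(X, right, min_size)],
    }

def estimate(tree, effort, query):
    """Follow the nearest medoid down to a leaf; return (mean effort, number of analogies)."""
    node = tree
    while "children" in node:
        d = np.linalg.norm(node["medoids"] - query, axis=1)
        node = node["children"][int(np.argmin(d))]
    analogies = node["idx"]
    return float(effort[analogies].mean()), len(analogies)
```

To use it, build the tree once with `build_tree(X, np.arange(len(X)))`, where `X` holds normalised project features, and call `estimate(tree, effort, query)` for each new project. Mean effort is only the simplest adaptation choice; a median or a distance-weighted mean could be substituted without changing the structure.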
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Software Reliability and Analysis Research
