Agile Effort Estimation: Have We Solved the Problem Yet? Insights From A Second Replication Study (GPT2SP Replication Report)
Vali Tawosi, Rebecca Moussa, Federica Sarro

TL;DR
This paper replicates and evaluates GPT2SP, a Transformer-based model for story point estimation, correcting a bug in the original implementation and analyzing its performance across multiple metrics and scenarios.
Contribution
It provides a corrected replication of GPT2SP's performance, offering insights into its accuracy and the impact of a previously unreported bug in the original study.
Findings
GPT2SP outperforms the baseline estimators in terms of MAE.
A bug in the original MAE computation inflated reported accuracy.
The fixed version of GPT2SP yields more accurate and reliable estimates.
Abstract
Fu and Tantithamthavorn have recently proposed GPT2SP, a Transformer-based deep learning model for story point (SP) estimation of user stories. They empirically evaluated the performance of GPT2SP on a dataset shared by Choetkiertikul et al., which includes 16 projects with a total of 23,313 issues. They benchmarked GPT2SP against two baselines (namely the naive Mean and Median estimators) and the method previously proposed by Choetkiertikul et al. (which we will refer to as DL2SP from now on) for both within- and cross-project estimation scenarios, and evaluated the extent to which each component of GPT2SP contributes towards the accuracy of the SP estimates. Their results show that GPT2SP outperforms DL2SP with a 6%-47% improvement in MAE for the within-project scenario and a 3%-46% improvement for the cross-project scenarios. However, when we attempted to use the GPT2SP source code made available by Fu…
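For readers unfamiliar with the metric, the following is a minimal sketch of how MAE and the naive Mean and Median baselines named in the abstract can be computed. The story point values and function names are hypothetical and purely illustrative; this is not the GPT2SP evaluation code, and it does not reproduce the bug discussed in the paper.

```python
from statistics import mean, median


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted story points."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def constant_baseline(train_sp, n_test, statistic):
    """Predict one training-set statistic (mean or median) for every test issue."""
    return [statistic(train_sp)] * n_test


# Hypothetical story points, not taken from the paper's dataset.
train_sp = [1, 2, 3, 5, 8]
test_sp = [1, 2, 3, 8]

mae_mean = mean_absolute_error(
    test_sp, constant_baseline(train_sp, len(test_sp), mean)
)
mae_median = mean_absolute_error(
    test_sp, constant_baseline(train_sp, len(test_sp), median)
)

print(f"Mean baseline MAE:   {mae_mean:.2f}")
print(f"Median baseline MAE: {mae_median:.2f}")
```

A learned estimator is only useful if its MAE on the same test issues is lower than that of these constant predictors, which is why they serve as a sanity check in the comparison.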
Taxonomy
Topics: Software Engineering Research · Machine Learning and Data Classification · Imbalanced Data Classification Techniques
