Story Point Estimation Using Large Language Models
Pranam Prakash Shetty, Adarsh Balakrishnan, Mengqiao Xu, Xiaoyin Xi, Zhe Yu

TL;DR
This paper explores whether large language models can predict story points in software projects with little or no project-specific training data. It reports that they outperform traditional supervised models in zero-shot and few-shot settings, and examines the role of comparative judgments as few-shot examples.
Contribution
The paper demonstrates that LLMs can predict story points effectively with minimal or no training data, and explores comparative judgments as a way to improve prediction accuracy.
Findings
LLMs outperform supervised models in zero-shot story point prediction.
Few-shot prompting further improves LLM prediction performance.
Comparative judgments can serve as effective few-shot examples for LLMs.
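The three prompting setups named in the findings can be illustrated with a minimal sketch. It only builds prompt strings and does not call any model. The prompt wording, the `Item` class, and the function names are hypothetical choices made for illustration; the paper's actual prompts are not shown in this summary.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical instruction text; not taken from the paper.
INSTRUCTION = (
    "Estimate the story points for the final backlog item. "
    "Answer with a single number."
)


@dataclass
class Item:
    """A backlog item described by its title and description."""
    title: str
    description: str


def format_item(item: Item) -> str:
    return f"Title: {item.title}\nDescription: {item.description}"


def zero_shot_prompt(target: Item) -> str:
    """Zero-shot: the instruction plus the target item, no examples."""
    return f"{INSTRUCTION}\n\n{format_item(target)}\nStory points:"


def few_shot_prompt(examples: List[Tuple[Item, int]], target: Item) -> str:
    """Few-shot: a handful of items labeled with story points, then the target."""
    shots = "\n\n".join(
        f"{format_item(item)}\nStory points: {points}" for item, points in examples
    )
    return f"{INSTRUCTION}\n\n{shots}\n\n{format_item(target)}\nStory points:"


def comparative_prompt(pairs: List[Tuple[Item, Item]], target: Item) -> str:
    """Few-shot with comparative judgments instead of numeric labels.

    Each pair (a, b) is assumed to mean "a needs more effort than b".
    """
    shots = "\n\n".join(
        f"Item A:\n{format_item(a)}\nItem B:\n{format_item(b)}\n"
        "Judgment: Item A needs more effort than Item B."
        for a, b in pairs
    )
    return f"{INSTRUCTION}\n\n{shots}\n\n{format_item(target)}\nStory points:"
```

In this sketch the comparative variant needs no numeric annotations at all, only a statement of which of two items takes more effort.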
Abstract
This study investigates the use of large language models (LLMs) for story point estimation. Story points are unitless, project-specific effort estimates that help developers on the scrum team forecast which product backlog items they plan to complete in a sprint. To facilitate this process, machine learning models, especially deep neural networks, have been applied to predict the story points based on the title and description of each item. However, such machine learning models require sufficient amounts of training data (with ground truth story points annotated by human developers) from the same software project to achieve decent prediction performance. This motivated us to explore whether LLMs are capable of predicting story points (RQ1) without training data or (RQ2) with only a few training data points. Our empirical results with four LLMs on 16 software projects show that, without…
