Can LLMs Generate User Stories and Assess Their Quality?
Giovanni Quattrocchi, Liliana Pasquale, Paola Spoletini, Luciano Baresi

TL;DR
This paper investigates the capability of large language models to generate and evaluate user stories in requirements engineering, comparing their output to human-generated stories and assessing their potential to automate quality evaluation.
Contribution
It demonstrates that LLMs can generate user stories comparable to those written by humans and, when given clear evaluation criteria, can reliably assess their semantic quality, offering a promising approach to automating requirements elicitation.
Findings
LLMs produce user stories similar to human-written ones in coverage and style.
LLM-generated stories show lower diversity and creativity than human-written ones.
LLMs can effectively evaluate semantic quality when given clear evaluation criteria.
Abstract
Requirements elicitation is still one of the most challenging activities of the requirements engineering process due to the difficulty requirements analysts face in understanding and translating complex needs into concrete requirements. In addition, specifying high-quality requirements is crucial, as it can directly impact the quality of the software to be developed. Although automated tools allow for assessing the syntactic quality of requirements, evaluating semantic metrics (e.g., language clarity, internal consistency) remains a manual and time-consuming activity. This paper explores how LLMs can help automate requirements elicitation within agile frameworks, where requirements are defined as user stories (US). We used 10 state-of-the-art LLMs to investigate their ability to generate US automatically by emulating customer interviews. We evaluated the quality of US generated by LLMs,…
