Behaviour Driven Development Scenario Generation with Large Language Models
Amila Rathnayake, Mojtaba Shahin, Golnoush Abaei

TL;DR
This study evaluates three large language models for automated generation of BDD scenarios, revealing model-specific strengths, the importance of input quality, and effective prompting techniques, supported by a new dataset of 500 examples.
Contribution
It introduces a comprehensive evaluation framework for LLM-based BDD scenario generation and compares three models using a novel dataset and diverse assessment methods.
Findings
GPT-4 scores highest in text similarity metrics.
Claude 3's scenarios are rated best by humans.
Input quality significantly impacts scenario quality.
Abstract
This paper presents an evaluation of three LLMs (GPT-4, Claude 3, and Gemini) for automated Behaviour-Driven Development (BDD) scenario generation. To support this evaluation, we constructed a dataset of 500 user stories, requirement descriptions, and their corresponding BDD scenarios, drawn from four proprietary software products. We assessed the quality of BDD scenarios generated by LLMs using a multidimensional evaluation framework encompassing text and semantic similarity metrics, LLM-based evaluation, and human expert assessment. Our findings reveal that although GPT-4 achieves higher scores in text and semantic similarity metrics, Claude 3 produces scenarios rated highest by both human experts and LLM-based evaluators. LLM-based evaluators, particularly DeepSeek, show a stronger correlation with human judgment than with text similarity and semantic similarity metrics. The…
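To make the evaluation idea concrete, the following is a minimal sketch of two ingredients the abstract mentions: a text similarity score between a reference and a generated scenario, and a rank correlation between an automatic score and human ratings. The excerpt does not say which metrics or correlation statistic the authors used, so the token-overlap F1 and the Spearman correlation below are stand-in assumptions, and all function names (`tokenize`, `token_f1`, `ranks`, `spearman`) are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(reference: str, candidate: str) -> float:
    """Token-overlap F1 between two texts.

    Assumption: a simple stand-in for a text similarity metric,
    not the metric used in the paper.
    """
    ref = Counter(tokenize(reference))
    cand = Counter(tokenize(candidate))
    overlap = sum((ref & cand).values())  # shared tokens, counted with multiplicity
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks.

    Returns 0.0 when either input is constant (correlation undefined).
    """
    rx, ry = ranks(x), ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5
```

In use, one would score each generated scenario against its reference with `token_f1`, collect the human ratings for the same scenarios, and pass both lists to `spearman`. A low correlation would be consistent with the abstract's observation that surface similarity and human judgment can disagree.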
Taxonomy
Topics: Software Engineering Techniques and Practices · Software Engineering Research · Advanced Software Engineering Methodologies
