Private GPTs for LLM-driven testing in software development and machine learning
Jakub Jagielski, Consuelo Rojas, Markus Abel

TL;DR
This paper investigates how private GPTs can automatically generate executable test code from acceptance criteria, improving test quality and readability in software development and machine learning contexts.
Contribution
It introduces a two-step approach using Gherkin syntax that enhances the quality and readability of automatically generated test code from LLMs.
Findings
Two-step procedure yields better test code quality
Structured prompts improve output readability and adherence to best practices
Effective prompt design enhances test generation in different scenarios
Abstract
In this contribution, we examine the capability of private GPTs to automatically generate executable test code from requirements. More specifically, we use acceptance criteria as input, formulated as part of epics or stories, which are typically used in modern development processes. This gives product owners or business intelligence a way to directly produce testable criteria through the use of LLMs. We explore the quality of the resulting tests in two ways: (i) directly, by letting the LLM generate code from requirements, and (ii) through an intermediate step using Gherkin syntax. The two-step procedure yields better results, where we define "better" in terms of human readability and best coding practices, i.e. lines of code and use of additional libraries typically used in testing. Concretely, we evaluate prompt effectiveness across two…
Taxonomy
Topics: Software Testing and Debugging Techniques · Open Source Software Innovations · Software Engineering Research
