Technique to Baseline QE Artefact Generation Aligned to Quality Metrics
Eitan Farchi, Kiran Nayak, Papia Ghosh Majumdar, Saritha Route

TL;DR
This paper introduces a systematic technique for generating and evaluating quality engineering artefacts using large language models, combining reverse generation and iterative refinement guided by metrics to ensure high standards.
Contribution
It presents a novel framework that leverages LLMs, reverse generation, and rubrics for scalable, reliable QE artefact validation and quality assessment.
Findings
Reverse-generated artefacts outperform low-quality inputs.
High-quality inputs maintain standards through the process.
Framework enables scalable, reliable validation of QE artefacts.
Abstract
Large Language Models (LLMs) are transforming Quality Engineering (QE) by automating the generation of artefacts such as requirements, test cases, and Behavior Driven Development (BDD) scenarios. However, ensuring the quality of these outputs remains a challenge. This paper presents a systematic technique to baseline and evaluate QE artefacts using quantifiable metrics. The approach combines LLM-driven generation, reverse generation, and iterative refinement guided by rubrics for clarity, completeness, consistency, and testability. Experimental results across 12 projects show that reverse-generated artefacts can outperform low-quality inputs and maintain high standards when inputs are strong. The framework enables scalable, reliable QE artefact validation, bridging automation with accountability.
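The rubric-guided refinement step described in the abstract can be pictured as a loop that scores an artefact on each rubric dimension and refines it until every dimension meets a baseline. The following is a minimal sketch of that idea, not the authors' implementation: the names `refine_until_baseline`, `toy_score`, and `toy_refine`, the 0 to 1 score scale, the 0.8 threshold, and the round limit are all assumptions made for illustration. In practice the scorer and refiner would presumably be LLM calls; here they are toy stand-ins so the example runs on its own. Reverse generation is not modelled, since the abstract does not spell out its mechanics.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Rubric dimensions named in the abstract. The 0-1 scale is an assumption.
DIMENSIONS = ("clarity", "completeness", "consistency", "testability")

Scorer = Callable[[str], Dict[str, float]]
Refiner = Callable[[str, Dict[str, float]], str]


@dataclass
class RefinementResult:
    artefact: str
    scores: Dict[str, float]
    rounds: int


def refine_until_baseline(
    artefact: str,
    score: Scorer,
    refine: Refiner,
    threshold: float = 0.8,  # hypothetical baseline, not from the paper
    max_rounds: int = 3,     # hypothetical cap on refinement rounds
) -> RefinementResult:
    """Refine the artefact until every rubric dimension meets the threshold."""
    scores = score(artefact)
    rounds = 0
    while rounds < max_rounds and min(scores[d] for d in DIMENSIONS) < threshold:
        artefact = refine(artefact, scores)
        scores = score(artefact)
        rounds += 1
    return RefinementResult(artefact, scores, rounds)


def toy_score(text: str) -> Dict[str, float]:
    """Toy scorer: a scenario without a 'Then' step is treated as hard to test."""
    scores = {d: 0.9 for d in DIMENSIONS}
    if "Then" not in text:
        scores["testability"] = 0.4
    return scores


def toy_refine(text: str, scores: Dict[str, float]) -> str:
    """Toy refiner: add an observable outcome when testability is low."""
    if scores["testability"] < 0.8:
        return text + " Then the dashboard is shown."
    return text


if __name__ == "__main__":
    result = refine_until_baseline("Given a user logs in", toy_score, toy_refine)
    print(result.rounds, result.scores)
    print(result.artefact)
```

Passing the scorer and refiner as callables keeps the loop independent of any particular model or prompt, so the same baseline check could be applied to requirements, test cases, or BDD scenarios.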
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Safety Systems Engineering in Autonomy
