Identifying Similar Test Cases That Are Specified in Natural Language
Markos Viggiato, Dale Paas, Chris Buzon, Cor-Paul Bezemer

TL;DR
This paper presents an unsupervised method that combines text embedding, text similarity, and clustering to identify similar test cases specified in natural language, reducing the manual effort of finding redundant test cases in a test suite.
Contribution
It introduces an unsupervised approach that first clusters similar test steps using text embedding, text similarity, and clustering techniques, and then identifies similar test cases from the resulting test step clusters.
Findings
The approach achieved an F-score of 87.39% for clustering similar test steps.
The approach achieved an F-score of 83.47% for identifying similar test cases.
The approach was evaluated in an industrial setting.
Abstract
Software testing is still a manual process in many industries, despite the recent improvements in automated testing techniques. As a result, test cases are often specified in natural language by different employees and many redundant test cases might exist in the test suite. This increases the (already high) cost of test execution. Manually identifying similar test cases is a time-consuming and error-prone task. Therefore, in this paper, we propose an unsupervised approach to identify similar test cases. Our approach uses a combination of text embedding, text similarity and clustering techniques to identify similar test cases. We evaluate five different text embedding techniques, two text similarity metrics, and two clustering techniques to cluster similar test steps and four techniques to identify similar test cases from the test step clusters. Through an evaluation in an industrial…
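The pipeline described in the abstract (embed test steps, compare them with a similarity metric, cluster them, then compare test cases through their step clusters) can be illustrated with a minimal sketch. This is not the authors' implementation: the excerpt does not name the five embedding techniques, the two similarity metrics, the two clustering techniques, or the four test-case identification techniques, so the sketch assumes a bag-of-words vector as a stand-in embedding, cosine similarity, threshold-based single-linkage clustering, and Jaccard overlap of step clusters. The function names and the `threshold` and `min_overlap` values are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Bag-of-words term counts: an assumed stand-in for a real text embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_steps(steps, threshold=0.6):
    """Single-linkage clustering: steps at or above the similarity threshold
    end up in the same cluster. Returns one cluster label per step."""
    vectors = [embed(s) for s in steps]
    parent = list(range(len(steps)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(steps)):
        for j in range(i + 1, len(steps)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)
    return [find(i) for i in range(len(steps))]


def similar_test_cases(test_cases, threshold=0.6, min_overlap=0.5):
    """Return pairs of test case names whose sets of step clusters have a
    Jaccard overlap of at least min_overlap (an assumed criterion)."""
    names = list(test_cases)
    all_steps = [step for name in names for step in test_cases[name]]
    labels = cluster_steps(all_steps, threshold)

    clusters, pos = {}, 0
    for name in names:
        count = len(test_cases[name])
        clusters[name] = set(labels[pos:pos + count])
        pos += count

    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            union = clusters[a] | clusters[b]
            if union and len(clusters[a] & clusters[b]) / len(union) >= min_overlap:
                pairs.append((a, b))
    return pairs


# Hypothetical test cases written by different people.
example_cases = {
    "TC1": ["Open the login page", "Enter a valid username and password", "Click the login button"],
    "TC2": ["Open the login page", "Enter valid username and password", "Click on the login button"],
    "TC3": ["Open the settings menu", "Change the display language"],
}
print(similar_test_cases(example_cases))
```

In this toy example, TC1 and TC2 differ only in wording, so their steps fall into the same clusters and the pair is flagged, while TC3 shares no step cluster with either. Word-count vectors miss paraphrases that use different vocabulary, which is one reason a study like this one compares several embedding techniques.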
Taxonomy
Topics: Software Testing and Debugging Techniques · Software System Performance and Reliability · Software Engineering Research
