The Vocabulary of Flaky Tests in the Context of SAP HANA
Alexander Berndt, Zoltán Nochta, Thomas Bach

TL;DR
This study evaluates methods to identify flaky tests in SAP HANA using source code identifiers, comparing different feature extraction and classification techniques, and finds high accuracy but limited practical usefulness.
Contribution
It replicates previous flaky test identification approaches in an industrial setting and assesses new feature extraction and classification methods for improved accuracy.
Findings
High F1-Scores achieved with new methods.
Vocabulary categories are similar to previous findings.
Limited practical usefulness due to non-actionable results.
Abstract
Background. Automated test execution is an important activity to gather information about the quality of a software project. So-called flaky tests, however, negatively affect this process. Such tests fail seemingly at random without changes to the code and thus do not provide a clear signal. Previous work proposed to identify flaky tests based on the source code identifiers in the test code. So far, these approaches have not been evaluated in a large-scale industrial setting. Aims. We evaluate approaches to identify flaky tests and their root causes based on source code identifiers in the test code in a large-scale industrial project. Method. First, we replicate previous work by Pinto et al. in the context of SAP HANA. Second, we assess different feature extraction techniques, namely TF-IDF and TF-IDFC-RF. Third, we evaluate CodeBERT and XGBoost as classification models. For a sound…
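To make the feature extraction step concrete, here is a minimal sketch of how source code identifiers in test code might be turned into TF-IDF vectors. It uses the textbook weighting (term frequency times log of inverse document frequency), which is an assumption and may differ from the exact variant used in the study. The helper names `split_identifiers` and `tf_idf` and the sample test snippets are hypothetical and not taken from the paper.

```python
import math
import re
from collections import Counter


def split_identifiers(source: str) -> list[str]:
    """Split test code into lowercase tokens, breaking snake_case and camelCase."""
    tokens = []
    for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        for part in ident.split("_"):
            # Break camelCase / PascalCase parts (and acronyms) into words.
            words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", part)
            tokens.extend(w.lower() for w in words)
    return tokens


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Textbook TF-IDF: (count / doc length) * log(N / document frequency)."""
    n = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append(
            {t: (c / total) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return vectors


# Hypothetical test snippets, for illustration only.
tests = [
    "def test_thread_sleep(): waitForTimeout(5)",
    "def test_add(): assertEqual(add(1, 2), 3)",
]
vectors = tf_idf([split_identifiers(t) for t in tests])
```

Tokens that occur in every test (here `def` and `test`) receive a weight of zero, while tokens specific to one test, such as `sleep` or `timeout`, receive positive weight. Vectors of this kind could then be passed to a classifier such as XGBoost.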
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software Engineering Techniques and Practices
