Flaky Tests in a Large Industrial Database Management System: An Empirical Study of Fixed Issue Reports for SAP HANA
Alexander Berndt, Thomas Bach, Sebastian Baltes

TL;DR
This study investigates flaky tests in SAP HANA, using a novel LLM-based approach to categorize root causes from issue reports, revealing concurrency issues as the most common cause in this large industrial database system.
Contribution
Introduces an LLM-based method for automatically labeling flaky test reports by root cause, enabling large-scale analysis of flakiness in industrial software.
Findings
Concurrency issues are the most prevalent cause of flaky tests in SAP HANA.
Different test types face distinct flakiness challenges.
The LLM-based labeling approach effectively categorizes root causes from issue reports.
Abstract
Flaky tests yield different results when executed multiple times for the same version of the source code. Thus, they provide an ambiguous signal about the quality of the code and interfere with the automated assessment of code changes. While a variety of factors can cause test flakiness, approaches to fix flaky tests are typically tailored to address specific causes. However, the prevalent root causes of flaky tests can vary depending on the programming language, application domain, or size of the software project. Since manually labeling flaky tests is time-consuming and tedious, this work proposes an LLMs-as-annotators approach that leverages intra- and inter-model consistency to label issue reports related to fixed flakiness issues with the relevant root cause category. This allows us to gain an overview of prevalent flakiness categories in the issue reports. We evaluated our…
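The abstract names intra- and inter-model consistency but does not spell out the mechanics. One plausible reading, sketched below in Python, is to query each model several times per issue report, require each model to agree with itself (intra-model), and then require all models to agree with each other (inter-model). This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the 0.8 agreement threshold, the model names, and the category strings are all hypothetical.

```python
from collections import Counter


def majority_label(labels, min_agreement):
    """Return the most frequent label if its share of the runs reaches
    min_agreement; otherwise return None (the model is inconsistent)."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= min_agreement else None


def consistent_label(runs_by_model, intra_threshold=0.8):
    """Combine repeated labeling runs from several models.

    runs_by_model maps a (hypothetical) model name to the list of root-cause
    labels it produced over repeated runs on the same issue report.
    Intra-model consistency: each model must mostly agree with itself.
    Inter-model consistency: all models must settle on the same label.
    Returns the agreed label, or None to signal that a human should review.
    """
    per_model = {
        model: majority_label(runs, intra_threshold)
        for model, runs in runs_by_model.items()
    }
    distinct = set(per_model.values())
    if None in distinct or len(distinct) != 1:
        return None
    return distinct.pop()


# Hypothetical example: two models, five runs each, on one issue report.
agreeing = {
    "model_a": ["concurrency"] * 5,
    "model_b": ["concurrency"] * 5,
}
disagreeing = {
    "model_a": ["concurrency"] * 5,
    "model_b": ["network"] * 5,
}
unstable = {
    "model_a": ["concurrency", "network", "time", "concurrency", "network"],
    "model_b": ["concurrency"] * 5,
}
```

In this sketch, `consistent_label(agreeing)` returns the shared label, while the `disagreeing` and `unstable` inputs return `None`, so only reports on which the models agree are labeled automatically and the rest are left for manual inspection.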
Taxonomy
Topics: Software Testing and Debugging Techniques · Software Engineering Research · Software System Performance and Reliability
