On Enhancing Root Cause Analysis with SQL Summaries for Failures in Database Workload Replays at SAP HANA
Neetha Jambigi, Joshua Hammesfahr, Moritz Mueller, Thomas Bach, Michael Felderer

TL;DR
This paper presents a machine learning framework, enhanced with large language models, that improves root cause analysis of failures found during database workload replays. It produces more accurate classifications and insightful failure summaries, supporting more effective regression testing in SAP HANA.
Contribution
It introduces the use of LLMs to generate SQL failure summaries, improving the robustness and accuracy of root cause classification in database workload replays.
Findings
F1-Macro score increased by 4.77% with LLM integration
Enhanced failure summaries aid in better understanding root causes
Framework addresses generalizability challenges in ML-based failure analysis
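The headline result is reported as an F1-Macro score. Macro F1 computes the F1 score for each class (here, each root cause) separately and then takes their unweighted mean, so a rarely seen root cause counts as much as a frequent one. The plain-Python sketch below illustrates the metric only. The root cause labels are hypothetical and are not taken from the paper. It is intended to match scikit-learn's `f1_score(..., average="macro")` when labels are taken as the union of true and predicted classes.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro F1)."""
    # Evaluate every label that appears in either the ground truth or the predictions.
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    # Each class contributes equally, regardless of how many failures it has.
    return sum(f1_scores) / len(f1_scores)


# Hypothetical root cause labels for four replay failures.
y_true = ["timing", "timing", "privacy", "product_bug"]
y_pred = ["timing", "privacy", "privacy", "product_bug"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7778
```

In this toy example, accuracy is 0.75 (3 of 4 failures correct), while macro F1 is 7/9 ≈ 0.7778. The two "timing" and "privacy" classes each have F1 = 2/3, and "product_bug" has F1 = 1.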
Abstract
Capturing the workload of a database and replaying this workload for a new version of the database can be an effective approach for regression testing. However, false positive errors caused by many factors, such as data privacy limitations, time dependency, or non-determinism in multi-threaded environments, can negatively impact its effectiveness. Therefore, we employ a machine learning based framework to automate the root cause analysis of failures found during replays. However, handling novel issues not found in the training data is a general challenge for machine learning approaches with respect to the generalizability of the learned model. We describe how we continue to address this challenge for more robust long-term solutions. From our experience, retraining with new failures is inadequate due to features overlapping across distinct root causes. Hence, we leverage a large language…
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Service-Oriented Architecture and Web Services
