Automatic Error Classification and Root Cause Determination while Replaying Recorded Workload Data at SAP HANA
Neetha Jambigi, Thomas Bach, Felix Schabernack, Michael Felderer

TL;DR
This paper presents a machine learning approach to classify and determine root causes of alerts during workload replays in SAP HANA, reducing manual effort and improving quality assurance accuracy.
Contribution
It introduces a novel ML-based method for root cause attribution and false positive classification in workload replay analysis for SAP HANA.
Findings
Significantly reduces manual analysis effort.
Improves accuracy of alert classification.
Identifies practical limitations for future research.
Abstract
Capturing customer workloads of database systems and replaying them during internal testing can be beneficial for software quality assurance. However, in our experience such replays can produce a large number of false-positive alerts that make the results unreliable or time-consuming to analyze. Therefore, we design a machine-learning-based approach that attributes root causes to the alerts. This provides several benefits for quality assurance and allows us, for example, to classify whether an alert is a true positive or a false positive. Our approach considerably reduces manual effort and improves the overall quality assurance for the database system SAP HANA. We discuss the problem, the design and results of our approach, and we present practical limitations that may require further research.
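The idea in the abstract, attributing a root cause to each alert and then deriving a true-positive or false-positive verdict from that cause, can be illustrated with a small sketch. This is not the authors' method: the abstract does not name a model or feature set, so the sketch assumes a tiny bag-of-words Naive Bayes classifier over alert messages. The label names (`replay_environment`, `nondeterministic_result`, `product_regression`), the training messages, and the helper `is_false_positive` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical root-cause labels treated as false positives.
# These names are illustrative assumptions, not taken from the paper.
FALSE_POSITIVE_CAUSES = {"replay_environment", "nondeterministic_result"}


def tokenize(message):
    """Lowercase the alert message and split it on whitespace."""
    return message.lower().split()


class RootCauseClassifier:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, causes):
        self.word_counts = defaultdict(Counter)  # cause -> token counts
        self.class_counts = Counter(causes)      # cause -> number of alerts
        self.vocab = set()
        for message, cause in zip(messages, causes):
            tokens = tokenize(message)
            self.word_counts[cause].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, message):
        total = sum(self.class_counts.values())
        best_cause, best_score = None, -math.inf
        for cause, count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(count / total)
            denom = sum(self.word_counts[cause].values()) + len(self.vocab)
            for token in tokenize(message):
                score += math.log((self.word_counts[cause][token] + 1) / denom)
            if score > best_score:
                best_cause, best_score = cause, score
        return best_cause


def is_false_positive(cause):
    """Derive the true/false positive verdict from the attributed cause."""
    return cause in FALSE_POSITIVE_CAUSES


# Hypothetical labeled alerts, invented purely for illustration.
TRAIN = [
    ("connection refused during replay setup", "replay_environment"),
    ("replay host out of disk space", "replay_environment"),
    ("row order differs between capture and replay", "nondeterministic_result"),
    ("result set order differs for unordered query", "nondeterministic_result"),
    ("query returned wrong aggregate value", "product_regression"),
    ("server crashed with assertion failure", "product_regression"),
]

clf = RootCauseClassifier().fit([m for m, _ in TRAIN], [c for _, c in TRAIN])
```

Calling `clf.predict(...)` on a new alert message returns one of the hypothetical causes, and `is_false_positive` maps that cause to a verdict. Predicting the cause first, rather than the verdict directly, keeps the explanation attached to every classification, which is the benefit the abstract attributes to root cause determination.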
Taxonomy
Topics: Software System Performance and Reliability · Data Quality and Management · Software Engineering Research
