Automatic Error Classification and Root Cause Determination while Replaying Recorded Workload Data at SAP HANA

Neetha Jambigi; Thomas Bach; Felix Schabernack; Michael Felderer

arXiv:2205.08029 · cs.SE · May 18, 2022 · 1 cites

Open Access

TL;DR

This paper presents a machine learning approach to classify and determine root causes of alerts during workload replays in SAP HANA, reducing manual effort and improving quality assurance accuracy.

Contribution

It introduces a novel ML-based method for root cause attribution and false positive classification in workload replay analysis for SAP HANA.

Findings

1. Significantly reduces manual analysis effort.

2. Improves accuracy of alert classification.

3. Identifies practical limitations for future research.

Abstract

Capturing customer workloads of database systems to replay these workloads during internal testing can be beneficial for software quality assurance. However, in our experience such replays can produce a large number of false positive alerts that make the results unreliable or time-consuming to analyze. Therefore, we design a machine-learning-based approach that attributes root causes to the alerts. This provides several benefits for quality assurance and makes it possible, for example, to classify whether an alert is a true positive or a false positive. Our approach considerably reduces manual effort and improves the overall quality assurance for the database system SAP HANA. We discuss the problem, the design and results of our approach, and we present practical limitations that may require further research.
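The idea of attributing a root cause to each alert, and then deriving a true-positive or false-positive verdict from that root cause, can be illustrated with a small sketch. This is not the authors' method: the abstract does not name the model, features, or labels used. The sketch swaps in a tiny multinomial naive Bayes classifier over alert message tokens, and every alert message, root-cause label, and function name below is a hypothetical placeholder invented for illustration.

```python
from collections import Counter, defaultdict
import math

# Hypothetical labeled alerts (message, root cause); invented for illustration,
# not taken from the paper or from SAP HANA.
TRAINING = [
    ("statement timeout exceeded during replay", "timing_nondeterminism"),
    ("lock wait timeout on table", "timing_nondeterminism"),
    ("result row count mismatch", "product_regression"),
    ("result checksum mismatch on select", "product_regression"),
    ("table not found in capture", "missing_capture_data"),
    ("user not found in capture", "missing_capture_data"),
]

# Assumption: only some root causes indicate a real defect (true positive).
TRUE_POSITIVE_CAUSES = {"product_regression"}


def train(examples):
    """Count tokens per root cause for a multinomial naive Bayes model."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        class_counts[label] += 1
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return the root cause with the highest log-posterior score."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total)
        # Laplace smoothing so unseen tokens do not zero out a class.
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in text.lower().split():
            score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def is_true_positive(root_cause):
    """Map a predicted root cause to a true/false positive verdict."""
    return root_cause in TRUE_POSITIVE_CAUSES


model = train(TRAINING)
cause = predict(model, "result checksum mismatch")
print(cause, is_true_positive(cause))
```

Separating the root-cause prediction from the true/false positive mapping keeps the verdict explainable: an engineer can see which cause was assigned and adjust the mapping without retraining.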

Taxonomy

Topics: Software System Performance and Reliability · Data Quality and Management · Software Engineering Research