"What makes my queries slow?": Subgroup Discovery for SQL Workload Analysis
Youcef Remil, Anes Bendimerad, Romain Mathonat, Philippe Chaleat, Mehdi Kaytoue

TL;DR
This paper introduces a subgroup discovery approach to analyze SQL workloads, helping database administrators identify patterns and root causes of performance issues through an explainable AI framework and visualization tools.
Contribution
It presents a novel application of subgroup discovery for SQL workload analysis, including a visualization tool and empirical validation on real-world data.
Findings
Insightful hypotheses about workload issues can be discovered.
The approach effectively identifies patterns related to query performance.
The dataset and source code are publicly available for further research.
Abstract
Among the daily tasks of database administrators (DBAs), analyzing query workloads to identify schema issues and improve performance is crucial. Although DBAs can easily pinpoint queries that repeatedly cause performance issues, it remains challenging to automatically identify subsets of queries that share some properties (a pattern) and simultaneously foster some target measures, such as execution time. Patterns are defined on combinations of query clauses, environment variables, database alerts and metrics, and help answer questions such as: What makes SQL queries slow? What makes I/O communications high? Automatically discovering these patterns in a huge search space and providing them as hypotheses to help localize issues and root causes is important in the context of explainable AI. To tackle this problem, we introduce an original approach rooted in Subgroup Discovery. We show how…
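To make the notion of a pattern concrete, here is a minimal sketch of subgroup discovery on a toy query log. It is not the authors' algorithm: the attribute names (`has_join`, `has_like`, `table`), the data values, the exhaustive enumeration, and the mean-shift quality measure (coverage times the gap between subgroup mean and overall mean) are all assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical toy workload: each query is described by a few attributes
# plus its execution time in milliseconds (all values are made up).
QUERIES = [
    {"has_join": True,  "has_like": True,  "table": "orders", "time_ms": 900},
    {"has_join": True,  "has_like": False, "table": "orders", "time_ms": 850},
    {"has_join": False, "has_like": True,  "table": "users",  "time_ms": 120},
    {"has_join": False, "has_like": False, "table": "users",  "time_ms": 100},
    {"has_join": True,  "has_like": False, "table": "users",  "time_ms": 200},
    {"has_join": False, "has_like": False, "table": "orders", "time_ms": 150},
]
TARGET = "time_ms"


def covers(pattern, query):
    """A pattern is a conjunction of (attribute, value) conditions."""
    return all(query[attr] == value for attr, value in pattern)


def quality(pattern, queries, target=TARGET):
    """Assumed quality measure: coverage * (subgroup mean - overall mean)."""
    covered = [q[target] for q in queries if covers(pattern, q)]
    if not covered:
        return 0.0
    overall_mean = sum(q[target] for q in queries) / len(queries)
    subgroup_mean = sum(covered) / len(covered)
    return (len(covered) / len(queries)) * (subgroup_mean - overall_mean)


def discover(queries, max_len=2, top_k=3, target=TARGET):
    """Exhaustively score all conjunctions of up to max_len conditions."""
    selectors = sorted(
        {(attr, q[attr]) for q in queries for attr in q if attr != target},
        key=repr,
    )
    candidates = []
    for size in range(1, max_len + 1):
        for pattern in combinations(selectors, size):
            attrs = [attr for attr, _ in pattern]
            if len(set(attrs)) < len(attrs):
                continue  # skip contradictory patterns on the same attribute
            candidates.append((quality(pattern, queries, target), pattern))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:top_k]


if __name__ == "__main__":
    for score, pattern in discover(QUERIES):
        description = " AND ".join(f"{a}={v}" for a, v in pattern)
        print(f"{description}: quality={score:.2f}")
```

On this toy data the top-ranked pattern is `has_join=True AND table=orders`, which reads as a hypothesis of the form "joins on the orders table are slow". Exhaustive enumeration only works for tiny attribute sets; the huge search space mentioned in the abstract is precisely why a real workload needs a smarter search strategy.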
Taxonomy
Topics: Advanced Database Systems and Queries · Scientific Computing and Data Management · Software System Performance and Reliability
