Why Did My Query Slow Down?
Nedyalko Borisov, Shivnath Babu, Sandeep Uttamchandani, Ramani Routray, Aameek Singh

TL;DR
This paper introduces DIADS, a diagnosis framework that combines domain knowledge and machine learning to identify causes of database query slowdowns involving complex interactions with SAN infrastructure.
Contribution
The paper presents DIADS, an integrated diagnosis tool that uses Annotated Plan Graphs and machine learning to diagnose query slowdowns in complex database and SAN environments.
Findings
Successfully diagnosed query slowdowns in PostgreSQL and SAN environments.
Effectively handled noisy monitoring data and complex cause interactions.
Demonstrated practical utility in enterprise settings.
Abstract
Many enterprise environments have databases running on network-attached server-storage infrastructure (referred to as Storage Area Networks or SANs). Both the database and the SAN are complex systems that need their own separate administrative teams. This paper puts forth the vision of an innovative management framework to simplify administrative tasks that require an in-depth understanding of both the database and the SAN. As a concrete instance, we consider the task of diagnosing the slowdown in performance of a database query that is executed multiple times (e.g., in a periodic report-generation setting). This task is very challenging because the space of possible causes includes problems specific to the database, problems specific to the SAN, and problems that arise due to interactions between the two systems. In addition, the monitoring data available from these systems can be…
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Distributed systems and fault tolerance
