Using Fuzzy Matching of Queries to optimize Database workloads
Sweta Singh, Vaibhav Kulkarni, Mario Briggs, Deepak Mahajan, Eitan Farchi

TL;DR
This paper introduces a hybrid fuzzy matching method that uses similarity hashing to fingerprint query DAGs (QDAGs), so that the runtime behaviour of a new query can be predicted from past queries with similar QDAGs with over 80% accuracy.
Contribution
It presents a hybrid approach that combines aspects of two QDAG fingerprinting methods to predict the runtime behaviour of queries with over 80% accuracy.
Findings
Hybrid fingerprinting improves prediction accuracy.
Over 80% accuracy in runtime behavior prediction.
Effective use of similarity hashing for QDAGs.
Abstract
Directed Acyclic Graphs (DAGs) are commonly used in databases and Big Data computational engines like Apache Spark to represent the execution plan of queries. We refer to such graphs as Query Directed Acyclic Graphs (QDAGs). This paper uses similarity hashing to derive, for each QDAG, a fingerprint that embodies the compute requirements of the query. The fingerprint can then be used to predict the runtime behaviour of a query based on previously executed queries with similar QDAGs. We discuss two approaches to deriving a fingerprint, their pros and cons, and how aspects of both approaches can be combined to improve the predictions. Using a hybrid approach, we demonstrate that we are able to predict the runtime behaviour of a QDAG with more than 80% accuracy.
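The abstract does not say which similarity hash or which QDAG features the authors use, so the following is only a minimal sketch of the general idea, under stated assumptions: a QDAG is represented as a dict mapping a node id to (operator name, list of child ids), the fingerprint is a MinHash signature over operator and operator-edge tokens, and the prediction is the median runtime of past queries whose fingerprints are similar enough. The names edge_features, fingerprint, similarity and predict_runtime, the 0.5 threshold, and the choice of MinHash are all hypothetical and are not taken from the paper.

```python
import hashlib
from statistics import median


def edge_features(qdag):
    """Collect operator tokens and parent->child operator pairs from a QDAG.

    qdag: dict mapping node id -> (operator name, list of child node ids).
    Node ids are ignored, so structurally identical plans give identical features.
    """
    feats = set()
    for op, children in qdag.values():
        feats.add(op)
        for child in children:
            feats.add(f"{op}->{qdag[child][0]}")
    return feats


def _hash(seed, token):
    # Deterministic 64-bit hash of a token, salted by the seed (illustrative choice).
    data = f"{seed}:{token}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def fingerprint(qdag, num_hashes=64):
    """MinHash signature: for each seed, the smallest hash over all features."""
    feats = edge_features(qdag)
    if not feats:
        raise ValueError("cannot fingerprint an empty QDAG")
    return tuple(min(_hash(seed, f) for f in feats) for seed in range(num_hashes))


def similarity(fp_a, fp_b):
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(fp_a, fp_b)) / len(fp_a)


def predict_runtime(qdag, history, threshold=0.5):
    """Median runtime of past queries whose fingerprint is at least `threshold` similar.

    history: list of (fingerprint, runtime) pairs. Returns None if nothing is close enough.
    """
    fp = fingerprint(qdag)
    near = [rt for past_fp, rt in history if similarity(fp, past_fp) >= threshold]
    return median(near) if near else None
```

In this sketch, two plans with the same operators and the same operator-to-operator edges produce the same fingerprint even if their node ids differ, while plans that share few operators match in only a small fraction of signature slots. A real system would likely also fold in cardinalities or data sizes, which this illustration leaves out.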
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Graph Theory and Algorithms
