Summarizing Large Query Logs in Ettu
Gokhan Kul, Duc Luong, Ting Xie, Patrick Coonan, Varun Chandola, Oliver Kennedy, Shambhu Upadhyaya

TL;DR
This paper introduces a novel method for summarizing large SQL query logs by using a graph-based similarity metric and clustering, enabling scalable visualization for security and performance analysis.
Contribution
It presents a new approach that generalizes the Weisfeiler-Lehman approximate graph isomorphism algorithm to SQL abstract syntax trees, yielding a query distance metric used for log clustering, summarization, and visualization.
Findings
The distance metric captures meaningful query similarity.
The summarization process is scalable and performs well.
Visualizations can be generated interactively.
Abstract
Database access logs are large, unwieldy, and hard for humans to inspect and summarize. In spite of this, they remain the canonical go-to resource for tasks ranging from performance tuning to security auditing. In this paper, we address the challenge of compactly encoding large sequences of SQL queries for presentation to a human user. Our approach is based on the Weisfeiler-Lehman (WL) approximate graph isomorphism algorithm, which identifies salient features of a graph or, in our case, of an abstract syntax tree. Our generalization of WL allows us to define a distance metric for SQL queries, which in turn permits automated clustering of queries. We also present two techniques for visualizing query clusters, and an algorithm that allows these visualizations to be constructed at interactive speeds. Finally, we evaluate our algorithms in the context of a motivating example: insider threat…
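Example
The abstract does not spell out the authors' generalization of WL, so the following is only a minimal sketch of the general idea under stated assumptions. It treats a query's abstract syntax tree as an undirected graph, runs a few rounds of standard WL label refinement, and compares two queries by a weighted Jaccard distance over the resulting label counts. The toy tree encoding, the `node` helper, the `CONST` placeholder for literals, and the choice of Jaccard distance are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy AST encoding: (label, [children]).
# A real system would obtain the tree from a SQL parser.
def node(label, *children):
    return (label, list(children))

def flatten(tree):
    """Return (labels, adjacency lists) for an undirected view of the tree."""
    labels, adj = [], []

    def visit(t, parent):
        idx = len(labels)
        labels.append(t[0])
        adj.append([])
        if parent is not None:
            adj[parent].append(idx)
            adj[idx].append(parent)
        for child in t[1]:
            visit(child, idx)

    visit(tree, None)
    return labels, adj

def wl_features(tree, iterations=2):
    """Count the labels seen over `iterations` rounds of WL refinement."""
    labels, adj = flatten(tree)
    features = Counter(labels)  # round 0: the raw node labels
    for _ in range(iterations):
        # New label = own label plus the sorted multiset of neighbor labels.
        labels = [
            labels[i] + "(" + ",".join(sorted(labels[j] for j in adj[i])) + ")"
            for i in range(len(labels))
        ]
        features.update(labels)
    return features

def wl_distance(a, b, iterations=2):
    """Weighted Jaccard distance between WL feature counts (an assumed choice)."""
    fa, fb = wl_features(a, iterations), wl_features(b, iterations)
    keys = set(fa) | set(fb)
    shared = sum(min(fa[k], fb[k]) for k in keys)
    total = sum(max(fa[k], fb[k]) for k in keys)
    return 1.0 - shared / total if total else 0.0

# SELECT name FROM users WHERE id = ?
q1 = node("SELECT", node("COLUMN:name"), node("FROM", node("TABLE:users")),
          node("WHERE", node("=", node("COLUMN:id"), node("CONST"))))
# SELECT name FROM users WHERE age = ?
q2 = node("SELECT", node("COLUMN:name"), node("FROM", node("TABLE:users")),
          node("WHERE", node("=", node("COLUMN:age"), node("CONST"))))
# DELETE FROM logs
q3 = node("DELETE", node("FROM", node("TABLE:logs")))

print(wl_distance(q1, q2), wl_distance(q1, q3))
```

In this sketch, two queries that differ only in one filtered column end up closer to each other than to a structurally unrelated statement. A pairwise distance of this kind could then be passed to any off-the-shelf clustering routine that accepts a distance matrix.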
Taxonomy
Topics: Advanced Database Systems and Queries · Data Quality and Management · Complex Network Analysis Techniques
