# 80 New Packages to Mine Database Query Logs

**Authors:** Thibault Sellam, Martin Kersten

arXiv: 1703.08732 · 2017-03-28

## TL;DR

This paper introduces three families of methods (feature maps, kernel methods, and Bayesian models) for manipulating SQL queries as if they were vectors, so that standard statistical tools such as means, regression, and decision trees can be applied to database query logs.

## Contribution

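The paper proposes ways to treat SQL queries, which are symbolic objects, as if they were vectors of numbers. This moves query log mining beyond the ad hoc algorithms and neighborhood-based approaches (such as k Nearest Neighbors) that most prior work relies on, and opens it up to the wider statistical toolbox.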

## Key findings

- Feature maps directly encode queries into vectors
- Kernel methods transform the queries implicitly, without building the vectors explicitly
- Bayesian models exploit probabilistic graphical models as an alternative to vector spaces
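
To make the first item concrete, here is a minimal sketch of a feature map, assuming a simple bag-of-tokens encoding. The tokenizer, the helper names (`tokenize`, `build_vocabulary`, `feature_map`, `mean_vector`), and the sample queries are illustrative assumptions, not the encoding used in the paper. Once queries are vectors, a mean over a log becomes well defined.

```python
import re
from collections import Counter


def tokenize(sql):
    """Lowercase a SQL string and split it into word, number, and symbol tokens."""
    return re.findall(r"[a-z_][a-z0-9_]*|\d+|[^\s\w]", sql.lower())


def build_vocabulary(queries):
    """Collect every distinct token in the log, in a fixed (sorted) order."""
    return sorted({tok for q in queries for tok in tokenize(q)})


def feature_map(sql, vocabulary):
    """Encode one query as a vector of token counts over the vocabulary."""
    counts = Counter(tokenize(sql))
    return [counts[tok] for tok in vocabulary]


def mean_vector(vectors):
    """Component-wise mean, an operation undefined on raw query strings."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Hypothetical query log, invented for illustration.
log = [
    "SELECT name FROM users WHERE age > 30",
    "SELECT name FROM users WHERE age > 40",
    "SELECT id FROM orders",
]

vocab = build_vocabulary(log)
vectors = [feature_map(q, vocab) for q in log]
centroid = mean_vector(vectors)
```

A bag of tokens ignores query structure (nesting, clause order), so it should be read as the simplest possible instance of the idea rather than a recommended encoding.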

## Abstract

The query log of a DBMS is a powerful resource. It enables many practical applications, including query optimization and user experience enhancement. And yet, mining SQL queries is a difficult task. The fundamental problem is that queries are symbolic objects, not vectors of numbers. Therefore, many popular statistical concepts, such as means, regression, or decision trees do not apply. Most authors limit themselves to ad hoc algorithms or approaches based on neighborhoods, such as k Nearest Neighbors. Our project is to challenge this limitation. We introduce methods to manipulate SQL queries as if they were vectors, thereby unlocking the whole statistical toolbox. We present three families of methods: feature maps, kernel methods, and Bayesian models. The first technique directly encodes queries into vectors. The second one transforms the queries implicitly. The last one exploits probabilistic graphical models as an alternative to vector spaces. We present the benefits and drawbacks of each solution, highlight how they relate to each other, and make the case for future investigation.
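
As an illustration of the second family, the sketch below computes a similarity between two queries directly from their token bigrams, without building a shared vector space first. It assumes a spectrum-style n-gram kernel; this choice, the function names (`token_ngrams`, `ngram_kernel`, `normalized_kernel`, `gram_matrix`), and the sample queries are assumptions made for illustration and are not taken from the paper.

```python
import math
import re
from collections import Counter


def token_ngrams(sql, n=2):
    """Count the contiguous token n-grams of a SQL string."""
    tokens = re.findall(r"[a-z_][a-z0-9_]*|\d+|[^\s\w]", sql.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_kernel(q1, q2, n=2):
    """Inner product of the two n-gram count vectors, computed implicitly."""
    a, b = token_ngrams(q1, n), token_ngrams(q2, n)
    return sum(count * b[gram] for gram, count in a.items())


def normalized_kernel(q1, q2, n=2):
    """Cosine-style normalization so that a query scores 1.0 against itself."""
    denom = math.sqrt(ngram_kernel(q1, q1, n) * ngram_kernel(q2, q2, n))
    return ngram_kernel(q1, q2, n) / denom if denom else 0.0


def gram_matrix(queries, n=2):
    """Pairwise kernel values, the input many kernel-based learners expect."""
    return [[normalized_kernel(a, b, n) for b in queries] for a in queries]


# Hypothetical query log, invented for illustration.
log = [
    "SELECT name FROM users WHERE age > 30",
    "SELECT name FROM users WHERE age > 40",
    "SELECT id FROM orders",
]

K = gram_matrix(log)
```

The matrix `K` could then be passed to any learner that accepts a precomputed kernel; the individual query vectors are never materialized.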

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.08732/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/1703.08732/full.md

## References

26 references — full list in the complete paper: https://tomesphere.com/paper/1703.08732/full.md

---
Source: https://tomesphere.com/paper/1703.08732