Selectivity Estimation of Inequality Joins In Databases

Diogo Repas; Zhicheng Luo; Maxime Schoemans; Mahmoud Sakr

arXiv:2206.07396 · cs.DB · June 16, 2022

Open Access

TL;DR

This paper addresses the lack of selectivity estimation methods for inequality joins in databases, proposing a new algorithm implemented in PostgreSQL to improve query optimization accuracy.

Contribution

It introduces a novel selectivity estimation algorithm for inequality joins, filling a gap in existing database systems and providing an implementation in PostgreSQL.

Findings

01 PostgreSQL and MySQL lack inequality join selectivity estimation.

02 Oracle and SQL Server provide fairly accurate estimations but keep their algorithms secret.

03 The proposed algorithm improves selectivity estimation accuracy for inequality joins.

Abstract

Selectivity estimation refers to the ability of the SQL query optimizer to estimate the size of the result of a predicate in a query. It is the main calculation on which the optimizer bases its choice of the cheapest execution plan. Although the problem has been known since the mid-1970s, we were surprised to find no solutions in the literature for the selectivity estimation of inequality joins. By testing four common database systems (Oracle, SQL Server, PostgreSQL, and MySQL), we found that the open-source systems PostgreSQL and MySQL lack this estimation. Oracle and SQL Server make fairly accurate estimations, yet their algorithms are undisclosed. This paper thus proposes an algorithm for inequality join selectivity estimation. The proposed algorithm has been implemented in PostgreSQL and submitted as a patch for inclusion in upcoming releases.
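
To make the quantity being estimated concrete, the sketch below computes the selectivity of a join predicate `a < b`, that is, the fraction of row pairs that satisfy it, in two ways: exactly from the raw values, and approximately from equi-depth histogram bounds. The abstract does not describe the paper's algorithm, so this is not the authors' method or the PostgreSQL patch. It is a generic histogram-based illustration, and every function name in it is hypothetical.

```python
from bisect import bisect_left, bisect_right


def equi_depth_bounds(values, buckets):
    """Return buckets + 1 boundaries so each bucket holds roughly the same number of rows."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return [ordered[(i * last) // buckets] for i in range(buckets + 1)]


def cdf(bounds, x):
    """Estimate P(value < x) from equi-depth bounds.

    Assumption: values are spread uniformly inside each bucket,
    so the position within a bucket is interpolated linearly.
    """
    if x <= bounds[0]:
        return 0.0
    if x >= bounds[-1]:
        return 1.0
    i = bisect_right(bounds, x) - 1  # bucket with bounds[i] <= x < bounds[i + 1]
    lo, hi = bounds[i], bounds[i + 1]
    return (i + (x - lo) / (hi - lo)) / (len(bounds) - 1)


def estimate_lt_join_selectivity(bounds_a, bounds_b):
    """Estimate the fraction of (a, b) pairs with a < b from two histograms.

    Each bucket of b carries equal weight and is represented by its midpoint;
    the estimate is the average of cdf_a over those midpoints.
    """
    buckets_b = len(bounds_b) - 1
    total = 0.0
    for j in range(buckets_b):
        midpoint = (bounds_b[j] + bounds_b[j + 1]) / 2
        total += cdf(bounds_a, midpoint)
    return total / buckets_b


def exact_lt_join_selectivity(a_values, b_values):
    """Exact fraction of (a, b) pairs with a < b, for comparison."""
    ordered_a = sorted(a_values)
    # bisect_left counts the a values strictly smaller than b.
    matches = sum(bisect_left(ordered_a, b) for b in b_values)
    return matches / (len(a_values) * len(b_values))


if __name__ == "__main__":
    a_values = list(range(1000))        # hypothetical column R.a
    b_values = list(range(500, 1500))   # hypothetical column S.b
    estimate = estimate_lt_join_selectivity(
        equi_depth_bounds(a_values, 10), equi_depth_bounds(b_values, 10)
    )
    exact = exact_lt_join_selectivity(a_values, b_values)
    print(f"estimated: {estimate:.4f}, exact: {exact:.4f}")
```

Multiplying the estimated selectivity by the product of the two table sizes gives the expected join result size, which is the figure an optimizer would feed into plan costing. The midpoint approximation is a deliberate simplification; it degrades when a bucket of `b` spans many buckets of `a`.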

Taxonomy

Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Cloud Computing and Resource Management