Selectivity Estimation of Inequality Joins In Databases
Diogo Repas, Zhicheng Luo, Maxime Schoemans, Mahmoud Sakr

TL;DR
This paper addresses the lack of selectivity estimation methods for inequality joins in databases and proposes a new algorithm, implemented in PostgreSQL, to make query optimization more accurate.
Contribution
It introduces a novel selectivity estimation algorithm for inequality joins, filling a gap in the literature and in open-source database systems, and provides an implementation in PostgreSQL.
Findings
PostgreSQL and MySQL lack inequality join selectivity estimation.
Oracle and SQL Server provide fairly accurate estimations but do not disclose their algorithms.
The proposed algorithm improves selectivity estimation accuracy for inequality joins.
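To make concrete what these estimates approximate: the selectivity of an inequality join predicate such as R.a < S.b is the fraction of pairs in R × S that satisfy it. The Python sketch below computes that fraction exactly by brute force and compares the resulting row count with a fixed fallback fraction. The 1/3 constant is an assumption, standing in for the kind of default a system without inequality join estimation falls back on (it mirrors PostgreSQL's DEFAULT_INEQ_SEL); the column data and function name are hypothetical.

```python
from itertools import product

# Assumed fixed fallback fraction (illustrative stand-in, not measured).
DEFAULT_INEQ_SEL = 1 / 3


def exact_lt_join_selectivity(r_values, s_values):
    """Exact selectivity of R.a < S.b: matching pairs / |R x S|."""
    if not r_values or not s_values:
        return 0.0
    # Brute force over the full cross product; fine for a small illustration.
    matches = sum(1 for a, b in product(r_values, s_values) if a < b)
    return matches / (len(r_values) * len(s_values))


if __name__ == "__main__":
    r_a = list(range(0, 100))   # hypothetical R.a values: 0..99
    s_b = list(range(90, 110))  # hypothetical S.b values: 90..109
    cross = len(r_a) * len(s_b)

    sel = exact_lt_join_selectivity(r_a, s_b)
    print(f"exact selectivity: {sel:.4f}, rows: {round(sel * cross)}")
    print(f"fixed default:     {DEFAULT_INEQ_SEL:.4f}, rows: {round(DEFAULT_INEQ_SEL * cross)}")
```

For these inputs, 1,945 of the 2,000 pairs match (selectivity 0.9725), while the constant predicts about 667 rows. An underestimate of nearly 3x like this is the kind of error that can lead an optimizer to a poor plan.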
Abstract
Selectivity estimation refers to the ability of the SQL query optimizer to estimate the size of the result of a predicate in the query. It is the main calculation on which the optimizer bases its choice of the cheapest plan to execute. Although the problem has been known since the mid-1970s, we were surprised to find no solutions in the literature for the selectivity estimation of inequality joins. By testing four common database systems (Oracle, SQL Server, PostgreSQL, and MySQL), we found that the open-source systems PostgreSQL and MySQL lack this estimation. Oracle and SQL Server make fairly accurate estimations, yet their algorithms are secret. This paper therefore proposes an algorithm for inequality join selectivity estimation. The proposed algorithm has been implemented in PostgreSQL and submitted as a patch for inclusion in upcoming releases.
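The abstract does not spell out the proposed algorithm. Purely as a point of reference, the Python sketch below shows one textbook way to estimate the selectivity of R.a < S.b from two equi-depth histograms of the kind PostgreSQL exposes in pg_stats.histogram_bounds. It assumes strictly increasing bucket bounds, values spread uniformly within each bucket, and independence between the two columns. It is not the authors' algorithm, and the function names and histogram values are hypothetical.

```python
from bisect import bisect_right


def hist_cdf(bounds, x):
    """Estimated fraction of column values below x, from an equi-depth histogram.

    bounds: strictly increasing bucket boundaries (k buckets -> k + 1 values),
    each bucket holding 1/k of the rows; values assumed uniform within a bucket.
    """
    k = len(bounds) - 1
    if x <= bounds[0]:
        return 0.0
    if x >= bounds[-1]:
        return 1.0
    i = bisect_right(bounds, x) - 1  # bucket with bounds[i] <= x < bounds[i + 1]
    lo, hi = bounds[i], bounds[i + 1]
    return (i + (x - lo) / (hi - lo)) / k


def avg_cdf(a_bounds, lo, hi):
    """Average of F_A over [lo, hi], i.e. P(A < B) for B uniform on [lo, hi].

    F_A is piecewise linear between the histogram bounds, so the trapezoid
    rule over those breakpoints is exact.
    """
    if hi <= lo:
        return hist_cdf(a_bounds, lo)
    pts = [lo] + [x for x in a_bounds if lo < x < hi] + [hi]
    area = sum(
        (hist_cdf(a_bounds, x0) + hist_cdf(a_bounds, x1)) / 2 * (x1 - x0)
        for x0, x1 in zip(pts, pts[1:])
    )
    return area / (hi - lo)


def estimate_lt_join_selectivity(a_bounds, b_bounds):
    """Estimate P(R.a < S.b), assuming R.a and S.b are independent.

    Each S.b bucket carries an equal share of the rows, so the estimate is
    the mean over S.b buckets of P(A < B | B in that bucket).
    """
    buckets = list(zip(b_bounds, b_bounds[1:]))
    return sum(avg_cdf(a_bounds, lo, hi) for lo, hi in buckets) / len(buckets)


if __name__ == "__main__":
    # Hypothetical 4-bucket histograms roughly describing R.a = 0..99, S.b = 90..109.
    r_a_bounds = [0, 25, 50, 75, 99]
    s_b_bounds = [90, 95, 100, 105, 109]
    sel = estimate_lt_join_selectivity(r_a_bounds, s_b_bounds)
    print(f"estimated selectivity: {sel:.4f}")
```

With these histograms the sketch estimates about 0.979, close to the 0.9725 obtained by counting every pair of the underlying columns, while reading only a handful of bucket bounds. The uniform-within-bucket and independence assumptions are the usual simplifications; skewed data inside a bucket or correlated columns would degrade the estimate.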
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Cloud Computing and Resource Management
