On Order-independent Semantics of the Similarity Group-By Relational Database Operator
Mingjie Tang, Ruby Y. Tahboub, Walid G. Aref, Qutaibah M. Malluhi, Mourad Ouzzani

TL;DR
This paper studies the semantics of similarity group-by operators in relational databases, defining order-independent variants, proving their properties, and introducing a new operator, SGB-All, for multi-dimensional data grouping.
Contribution
It introduces the concept of order-independent similarity group-by operators, proves their existence for certain cases, and presents the SGB-All operator for multi-dimensional data clustering.
Findings
Order-independent SGB operators exist for certain semantics.
SGB-All groups data into cliques based on similarity thresholds.
Some SGB operators are ill-defined and should not be used in SQL extensions.
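To make the order-dependence problem behind these findings concrete, here is a minimal sketch. It is not the paper's SGB-All algorithm: `greedy_clique_groups`, `canonical`, the L-infinity distance, and the threshold `eps` are all hypothetical choices made for illustration. The function places each point into the first existing group in which it is within `eps` of every member (a clique condition) and opens a new group otherwise. Running it over every permutation of the same three points shows that such a naive, first-fit implementation can return different groupings depending on input order.

```python
from itertools import permutations


def greedy_clique_groups(points, eps):
    """Naive first-fit grouping (illustrative assumption, not the paper's method).

    A point joins the first group whose members are ALL within eps of it
    under the L-infinity distance; otherwise it starts a new group.
    """
    groups = []
    for p in points:
        for g in groups:
            # Clique condition: p must be similar to every member of g.
            if all(max(abs(a - b) for a, b in zip(p, q)) <= eps for q in g):
                g.append(p)
                break
        else:
            # No existing group accepts p, so open a new one.
            groups.append([p])
    return groups


def canonical(groups):
    """Order-free representation of a grouping, so results can be compared."""
    return frozenset(frozenset(g) for g in groups)


# Three one-dimensional points; the middle one is similar to both neighbours,
# but the two outer points are not similar to each other.
points = [(0.0,), (1.0,), (2.0,)]
eps = 1.0

# Collect the distinct groupings produced over all input orders.
results = {
    canonical(greedy_clique_groups(order, eps))
    for order in permutations(points)
}

for grouping in results:
    print(sorted(sorted(g) for g in grouping))
print("distinct groupings:", len(results))
```

Under these assumptions the middle point ends up with whichever neighbour was seen first, so the same data yields more than one grouping. An order-independent operator, in the sense the paper defines, would need a rule that resolves such overlapping points the same way for every input order.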
Abstract
Similarity group-by (SGB, for short) has been proposed as a relational database operator to match the needs of emerging database applications. Many SGB operators that extend SQL have been proposed in the literature, e.g., similarity operators in the one-dimensional space. These operators have various semantics. Depending on how these operators are implemented, some of the implementations may lead to different groupings of the data. Hence, if SQL code is ported from one database system to another, it is not guaranteed that the code will produce the same results. In this paper, we investigate the various semantics for the relational similarity group-by operators in the multi-dimensional space. We define the class of order-independent SGB operators that produce the same results regardless of the order in which the input data is presented to them. Using the notion of interval graphs…
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Semantic Web and Ontologies
