GXJoin: Generalized Cell Transformations for Explainable Joinability

Soroush Omidvartehrani, Arash Dargahi Nobari, and Davood Rafiei

arXiv:2505.21860·cs.DB·May 29, 2025

TL;DR

This paper introduces GXJoin, a method for discovering generalized, explainable syntactic transformations that make columns from different sources equi-joinable. The transformations it finds are fewer and simpler, and each covers more rows.

Contribution

Proposes a novel approach for identifying generalized transformations that enhance joinability, focusing on coverage and explainability across various datasets and domains.

Findings

01 Outperforms state-of-the-art methods in coverage and simplicity.

02 Generates fewer, more explainable transformations.

03 Improves join performance significantly.

Abstract

Describing real-world entities can vary across different sources, posing a challenge when integrating or exchanging data. We study the problem of joinability under syntactic transformations, where two columns are not equi-joinable but can become equi-joinable after some transformations. Discovering those transformations is a challenge because of the large space of possible candidates, which grows with the input length and the number of rows. Our focus is on the generality of transformations, aiming to make the relevant models applicable across various instances and domains. We explore a few generalization techniques, emphasizing those that yield transformations covering a larger number of rows and are often easier to explain. Through extensive evaluation on two real-world datasets and employing diverse metrics for measuring the coverage and simplicity of the transformations, our…
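To make the problem statement concrete, here is a minimal Python sketch of two columns that are not equi-joinable as-is but become partly joinable after a syntactic transformation. It is not GXJoin's algorithm: the names `coverage` and `last_comma_initial`, the sample rows, and the definition of coverage as the fraction of source rows whose transformed value appears in the target column are all assumptions made for illustration.

```python
from typing import Callable, List


def coverage(source: List[str], target: List[str],
             transform: Callable[[str], str]) -> float:
    """Fraction of source rows whose transformed value appears in target.

    Hypothetical metric for illustration; the paper's own coverage
    metrics may be defined differently.
    """
    if not source:
        return 0.0
    target_set = set(target)
    hits = sum(1 for value in source if transform(value) in target_set)
    return hits / len(source)


def last_comma_initial(name: str) -> str:
    """Hypothetical transformation: 'First Last' -> 'Last, F.'."""
    parts = name.split()
    if not parts:
        return name  # leave empty or whitespace-only cells unchanged
    return f"{parts[-1]}, {parts[0][0]}."


# Made-up sample columns describing the same entities in different formats.
source = ["Ada Lovelace", "Alan Turing", "Grace Hopper"]
target = ["Lovelace, A.", "Turing, A.", "G. Hopper"]

identity_cov = coverage(source, target, lambda s: s)
transformed_cov = coverage(source, target, last_comma_initial)

print(f"equi-join coverage:        {identity_cov:.2f}")
print(f"coverage after transform:  {transformed_cov:.2f}")
```

In this toy data a single transformation covers two of the three rows, and the third row follows a different format. A more general transformation would cover more rows with one rule, which is the property the abstract emphasizes.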

Code & Models

Repositories

soroushomidvar/GXJoin (Official)