SANTOS: Relationship-based Semantic Table Union Search
Aamod Khatiwada, Grace Fan, Roee Shraga, Zixuan Chen, Wolfgang Gatterbauer, Renée J. Miller, Mirek Riedewald

TL;DR
This paper introduces SANTOS, a new semantic relationship-based approach for unionable table search that leverages knowledge bases and data-driven synthesis to improve accuracy over traditional schema or column-based methods.
Contribution
The work presents a novel unionability notion considering semantic relationships between columns, along with two methods to discover these relationships using knowledge bases and data synthesis.
Findings
SANTOS outperforms existing column-based union search methods.
Synthesized knowledge bases improve union search accuracy.
Data-driven relationship discovery enhances unionability detection.
Abstract
Existing techniques for unionable table search define unionability using metadata (tables must have the same or similar schemas) or column-based metrics (for example, the values in a table should be drawn from the same domain). In this work, we introduce the use of semantic relationships between pairs of columns in a table to improve the accuracy of union search. Consequently, we introduce a new notion of unionability that considers relationships between columns, together with the semantics of columns, in a principled way. To do so, we present two new methods to discover semantic relationships between pairs of columns. The first uses an existing knowledge base (KB); the second (which we call a "synthesized KB") uses knowledge from the data lake itself. We adopt an existing Table Union Search benchmark and present new (open) benchmarks that represent small and large real data lakes. We…
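To make the idea of relationship-based unionability concrete, here is a minimal Python sketch. It is not the SANTOS algorithm or its scoring function: the toy knowledge base `TOY_KB`, the helper names `column_pair_relationships` and `relationship_overlap`, and the simple overlap ratio are all assumptions made for illustration. It only shows why relationships between column pairs can separate tables that column domains alone would treat as equally unionable.

```python
from itertools import permutations

# Hypothetical toy KB (an assumption for illustration):
# maps a (subject value, object value) pair to a relationship label.
TOY_KB = {
    ("Ottawa", "Canada"): "capitalOf",
    ("Paris", "France"): "capitalOf",
    ("Tokyo", "Japan"): "capitalOf",
    ("Toronto", "Canada"): "locatedIn",
    ("Lyon", "France"): "locatedIn",
}


def column_pair_relationships(table):
    """Collect KB relationship labels seen between any ordered pair of columns.

    `table` maps a column name to a list of cell values (all lists equal length).
    """
    labels = set()
    for col_a, col_b in permutations(table, 2):
        for value_pair in zip(table[col_a], table[col_b]):
            label = TOY_KB.get(value_pair)
            if label is not None:
                labels.add(label)
    return labels


def relationship_overlap(query, candidate):
    """Fraction of the query table's relationships also found in the candidate.

    This ratio is a stand-in score chosen for the sketch, not the paper's measure.
    """
    query_labels = column_pair_relationships(query)
    if not query_labels:
        return 0.0
    candidate_labels = column_pair_relationships(candidate)
    return len(query_labels & candidate_labels) / len(query_labels)


# Query table: cities paired with the countries they are capitals of.
query = {"city": ["Ottawa", "Paris"], "country": ["Canada", "France"]}
# Candidate 1: same relationship (capitalOf), different column names.
capitals = {"capital": ["Tokyo"], "nation": ["Japan"]}
# Candidate 2: same column domains (cities, countries), different relationship.
cities = {"town": ["Toronto", "Lyon"], "state": ["Canada", "France"]}

print(relationship_overlap(query, capitals))
print(relationship_overlap(query, cities))
```

Both candidates contain a city column and a country column, so a purely column-based metric would score them alike. In this sketch only the `capitals` table shares the query's `capitalOf` relationship, which is the kind of distinction the abstract attributes to relationship semantics.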
Taxonomy
Topics: Data Quality and Management · Semantic Web and Ontologies · Biomedical Text Mining and Ontologies
