Novel Table Search [Technical Report]
Besat Kassaie, Renée J. Miller

TL;DR
This paper introduces Novel Table Search (NTS), a formal framework for discovering unionable tables in data lakes that add new information to a given query table. It also proposes ANTs, an efficient approximation method that captures syntactic novelty better and runs faster than the methods it is compared against.
Contribution
The paper formalizes NTS, develops a scoring mechanism for syntactic novelty, proves that the associated optimization problem is NP-hard, and proposes ANTs, an efficient penalization-based approximation technique.
Findings
ANTs outperforms other methods in capturing syntactic novelty
ANTs achieves the lowest execution time among compared methods
The proposed scoring mechanism satisfies key properties for NTS
Abstract
Avoiding redundancy in query results has been extensively studied in relational databases and information retrieval, yet its implications for data lakes remain largely unexplored. We bridge this gap by investigating how to discover unionable tables that contribute new information for a given query table in large-scale data lakes. We formally define Novel Table Search (NTS) as the problem of finding tables that are novel with respect to a given query table and identify two desirable properties that any scoring function for NTS should satisfy. We introduce a concrete scoring mechanism designed to maximize syntactic novelty, prove that it satisfies the proposed properties, and show that the associated optimization problem is NP-hard. To address this challenge, we develop an efficient approximation technique based on penalization, i.e., Attribute-Based Novel Table Search (ANTs). We propose…
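To make the idea of penalizing redundancy concrete, here is a minimal Python sketch. It is not the paper's scoring function and not ANTs: the names `novelty` and `greedy_novel_tables` are hypothetical, tables are assumed to be lists of row tuples already aligned to the query table's columns, and novelty is assumed to be the fraction of distinct rows not yet seen. The greedy loop only illustrates how rows already covered by the query or by earlier picks can lower a candidate's score.

```python
from typing import Dict, List, Set, Tuple

Row = Tuple[str, ...]
Table = List[Row]  # assumption: rows are already aligned to the query's columns


def novelty(candidate: Table, seen: Set[Row]) -> float:
    """Fraction of distinct candidate rows that are not already in `seen`."""
    distinct = set(candidate)
    if not distinct:
        return 0.0
    return len(distinct - seen) / len(distinct)


def greedy_novel_tables(query: Table, lake: Dict[str, Table], k: int) -> List[str]:
    """Pick up to k table names, penalizing rows covered by the query or earlier picks."""
    seen: Set[Row] = set(query)
    remaining = dict(lake)
    picked: List[str] = []
    while remaining and len(picked) < k:
        # sorted() makes tie-breaking deterministic (alphabetical by table name)
        best = max(sorted(remaining), key=lambda name: novelty(remaining[name], seen))
        if novelty(remaining[best], seen) == 0.0:
            break  # nothing left in the lake adds new rows
        picked.append(best)
        seen |= set(remaining.pop(best))
    return picked
```

A table that duplicates the query scores 0 under this toy measure and is never selected, while a table sharing half of its rows with the query scores 0.5. Because the underlying optimization problem is NP-hard according to the abstract, a greedy loop like this should be read only as a heuristic illustration.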
Taxonomy
Topics: Data Quality and Management · Graph Theory and Algorithms · Advanced Database Systems and Queries
