Table Integration in Data Lakes Unleashed: Pairwise Integrability Judgment, Integrable Set Discovery, and Multi-Tuple Conflict Resolution
Daomin Ji, Hui Luo, Zhifeng Bao, Shane Culpepper

TL;DR
This paper presents a comprehensive framework for table integration in data lakes, including pairwise integrability judgment using self-supervised learning, integrable set discovery via community detection, and multi-tuple conflict resolution with large language models, all designed to work with limited labeled data.
Contribution
The paper introduces a self-supervised adversarial contrastive learning approach for pairwise integrability judgment and uses LLMs for multi-tuple conflict resolution, addressing the scarcity of labeled data in data lake integration.
Findings
Effective pairwise integrability classifier trained with self-supervised learning
Successful identification of integrable sets using community detection algorithms
LLM-based conflict resolution reduces annotation needs
Abstract
Table integration aims to create a comprehensive table by consolidating tuples containing relevant information. In this work, we investigate the challenge of integrating multiple tables from a data lake, focusing on three core tasks: 1) pairwise integrability judgment, which determines whether a tuple pair is integrable, accounting for any occurrences of semantic equivalence or typographical errors; 2) integrable set discovery, which identifies all integrable sets in a table based on pairwise integrability judgments established in the first task; 3) multi-tuple conflict resolution, which resolves conflicts between multiple tuples during integration. To this end, we train a binary classifier to address the task of pairwise integrability judgment. Given the scarcity of labeled data in data lakes, we propose a self-supervised adversarial contrastive learning algorithm to perform…
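To make the second task concrete, here is a minimal sketch of how pairwise judgments could be turned into integrable sets. It is an illustration under stated assumptions, not the paper's method: the paper uses community detection, while this sketch substitutes plain connected components (via union-find) as a simpler baseline. The function name `integrable_sets`, the integer tuple indices, and the sample pairs are all hypothetical.

```python
from collections import defaultdict


def integrable_sets(num_tuples, integrable_pairs):
    """Group tuple indices into sets using connected components.

    Assumption: `integrable_pairs` holds index pairs (i, j) that some
    pairwise classifier has already judged integrable. Connected
    components are a stand-in here for community detection.
    """
    parent = list(range(num_tuples))

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in integrable_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Attach the larger root index under the smaller one.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    groups = defaultdict(list)
    for i in range(num_tuples):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Hypothetical judgments over six tuples: 0-1-2 chain together, 3-4 pair up,
# and tuple 5 is not integrable with anything.
pairs = [(0, 1), (1, 2), (3, 4)]
result = integrable_sets(6, pairs)
print(result)
```

Connected components treat integrability as transitive, so a single false-positive judgment can merge two unrelated sets. A community detection algorithm, as named in the abstract, is less sensitive to such isolated edges, which is one plausible reason to prefer it.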
Taxonomy
Topics: Data Mining Algorithms and Applications · Advanced Database Systems and Queries · Data Quality and Management
Methods: Sparse Evolutionary Training · Contrastive Learning
