DTT: An Example-Driven Tabular Transformer for Joinability by Leveraging Large Language Models
Arash Dargahi Nobari, Davood Rafiei

TL;DR
This paper introduces DTT, a deep learning framework leveraging large language models to transform and align heterogeneous tabular data efficiently, improving joinability and data integration accuracy over existing methods.
Contribution
The paper presents a novel example-driven neural approach to table transformation, built on large language models, that outperforms current state-of-the-art methods in accuracy and scalability.
Findings
DTT achieves higher accuracy than existing methods.
Fine-tuned models perform comparably to or better than GPT-3.
Using large language models enhances transformation performance.
Abstract
Many organizations rely on data from government and third-party sources, and those sources rarely follow the same data formatting. This introduces challenges in integrating data from multiple sources or aligning external sources with internal databases. Commercial database systems do not offer adequate support for integrating data from heterogeneous sources, and manual integration is both time-consuming and inefficient. State-of-the-art data integration approaches that rely on similarity functions and textual transformations often fail to handle challenging cases where multiple mappings are required, or the mappings go beyond simple textual transformations. In this paper, we study the potential of deep neural models for transforming tables for joinability. In particular, we cast the problem as a prediction task and develop a framework that leverages large deep-learning language models…
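The idea of casting joinability as an example-driven prediction task can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt layout, the function names (`build_prompt`, `toy_model`, `join_by_prediction`), and the use of `difflib` similarity for matching are all assumptions made for illustration, and `toy_model` is a hand-written stand-in for a language model that only handles one formatting pattern.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (source-format value, target-format value)


def build_prompt(examples: List[Example], query: str) -> str:
    """Serialize example pairs plus one query value into a text prompt.

    The "src -> tgt" layout is a hypothetical format, not the paper's.
    """
    lines = [f"{src} -> {tgt}" for src, tgt in examples]
    lines.append(f"{query} -> ")
    return "\n".join(lines)


def toy_model(prompt: str) -> str:
    """Stand-in for a language model (assumption for this sketch).

    It only knows the pattern 'Last, First' -> 'F. Last'.
    """
    query = prompt.splitlines()[-1].rsplit(" -> ", 1)[0]
    last, first = [part.strip() for part in query.split(",")]
    return f"{first[0]}. {last}"


def join_by_prediction(
    source_col: List[str],
    target_col: List[str],
    examples: List[Example],
    model: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Predict a target-format value for each source value, then pair it
    with the most similar value that actually occurs in the target column."""
    pairs = []
    for value in source_col:
        predicted = model(build_prompt(examples, value))
        best = max(
            target_col,
            key=lambda t: SequenceMatcher(None, predicted, t).ratio(),
        )
        pairs.append((value, best))
    return pairs


examples = [("Hopper, Grace", "G. Hopper")]
source = ["Lovelace, Ada", "Turing, Alan"]
target = ["A. Turing", "A. Lovelace"]
print(join_by_prediction(source, target, examples, toy_model))
```

Matching the predicted string to the nearest existing target value, rather than requiring an exact hit, lets the join tolerate small prediction errors; in a real setting the stand-in would be replaced by a call to a fine-tuned or hosted language model.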
Taxonomy
Topics: Data Quality and Management · Web Data Mining and Analysis · Advanced Database Systems and Queries
