GRAM: Generative Retrieval Augmented Matching of Data Schemas in the Context of Data Security
Xuanqing Liu, Luyang Kong, Runhui Wang, Patrick Song, Austin Nevins, Henrik Johnson, Nimish Amlathe, Davor Golac

TL;DR
This paper introduces GRAM, a method leveraging large language models for schema matching in data security contexts, focusing on zero-shot and few-shot scenarios to protect customer privacy while maintaining matching accuracy.
Contribution
It presents a novel approach using large language models for schema matching under privacy-preserving zero-shot and few-shot conditions, addressing a critical security challenge.
Findings
Effective schema matching with minimal data exposure
Maintains high accuracy in zero-shot and few-shot scenarios
Addresses privacy concerns in data integration
Abstract
Schema matching constitutes a pivotal phase in the data ingestion process for contemporary database systems. Its objective is to discern pairwise similarities between two sets of attributes, each associated with a distinct data table. This challenge emerges at the initial stages of data analytics, such as when incorporating a third-party table into existing databases to inform business insights. Given its significance in the realm of database systems, schema matching has been under investigation since the 2000s. This study revisits this foundational problem within the context of large language models. Adhering to increasingly stringent data security policies, our focus lies on the zero-shot and few-shot scenarios: the model should analyze only a minimal amount of customer data to execute the matching task, contrasting with the conventional approach of scrutinizing the entire data table.…
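To make the task in the abstract concrete, the sketch below scores every pair of attributes from two tables and keeps the best match for each source attribute. It is only an illustration of the problem setup, not GRAM itself: the string-similarity scorer from Python's standard `difflib` stands in for the language model, and the column names, the `match_schemas` helper, and the 0.5 threshold are all assumptions made for this example. It uses attribute names only and reads no table rows, which loosely mirrors the zero-shot setting.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a column name and treat separators as spaces."""
    return name.lower().replace("_", " ").replace("-", " ").strip()


def similarity(a: str, b: str) -> float:
    """Stand-in scorer: character-level similarity in [0, 1].

    An LLM-based matcher would replace this function; difflib is used
    here only so the example runs without external dependencies.
    """
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_schemas(source, target, threshold=0.5):
    """Map each source attribute to its best-scoring target attribute.

    Attributes whose best score falls below `threshold` (a hypothetical
    cutoff chosen for this example) are mapped to None.
    """
    matches = {}
    for s in source:
        best = max(target, key=lambda t: similarity(s, t))
        matches[s] = best if similarity(s, best) >= threshold else None
    return matches


# Hypothetical schemas: a third-party table and an existing table.
source_columns = ["cust_name", "email_addr", "zip"]
target_columns = ["customer_name", "email_address", "postal_code", "phone"]

result = match_schemas(source_columns, target_columns)
for src, tgt in result.items():
    print(f"{src} -> {tgt}")
```

In this example `zip` is left unmatched because it shares almost no characters with `postal_code`. A purely lexical scorer cannot bridge that kind of synonym gap, which is one reason to bring in a model with semantic knowledge of attribute names.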
Taxonomy
Topics: Data Quality and Management · Data Mining Algorithms and Applications · Privacy-Preserving Technologies in Data
