ReMatch: Retrieval Enhanced Schema Matching with LLMs

Eitam Sheetrit; Menachem Brief; Moshik Mishaeli; Oren Elisha

arXiv:2403.01567 · cs.DB · May 31, 2024 · 3 cites

TL;DR

ReMatch uses retrieval-enhanced LLMs to perform schema matching without predefined mappings, model training, or access to the data in the source database, which addresses the privacy and data-availability obstacles common in data integration.

Contribution

The paper introduces ReMatch, a schema matching method built on retrieval-enhanced LLMs. It requires no predefined mapping, no model training, and no access to data in the source database, which makes it easier to apply where such resources are unavailable.

Findings

01 ReMatch is an effective matcher on large real-world schemas.

02 It outperforms traditional machine learning approaches.

03 The method works without model training or access to source data.

Abstract

Schema matching is a crucial task in data integration, involving the alignment of a source schema with a target schema to establish correspondence between their elements. This task is challenging due to textual and semantic heterogeneity, as well as differences in schema sizes. Although machine-learning-based solutions have been explored in numerous studies, they often suffer from low accuracy, require manual mapping of the schemas for model training, or need access to source schema data which might be unavailable due to privacy concerns. In this paper we present a novel method, named ReMatch, for matching schemas using retrieval-enhanced Large Language Models (LLMs). Our method avoids the need for predefined mapping, any model training, or access to data in the source database. Our experimental results on large real-world schemas demonstrate that ReMatch is an effective matcher. By…
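The abstract describes the approach only at a high level, so the following is a minimal sketch of the general retrieve-then-prompt idea, not the authors' implementation. It assumes each table is described by a short text document, ranks target tables against a source table, and places only the top-ranked candidates in an LLM prompt. The bag-of-words cosine similarity is a stand-in for whatever retrieval model the paper uses, and the function names, table names, and descriptions are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_candidates(source_doc, target_docs, k=2):
    """Return the names of the k target tables most similar to the source.

    Assumption: bag-of-words similarity stands in for an embedding model.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    src = Counter(tokenize(source_doc))
    scored = [
        (cosine(src, Counter(tokenize(doc))), name)
        for name, doc in target_docs.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


def build_prompt(source_doc, candidates, target_docs):
    """Assemble a matching prompt that contains only the retrieved candidates."""
    lines = [
        "Match each source column to a target column, or answer 'no match'.",
        "",
        "Source table:",
        source_doc,
        "",
        "Candidate target tables:",
    ]
    lines += [f"- {name}: {target_docs[name]}" for name in candidates]
    return "\n".join(lines)


# Hypothetical schema descriptions, invented for illustration only.
target_docs = {
    "visit": "patient visit start time, visit end time, visit type",
    "drug": "drug name, dose, route",
    "location": "address, city, zip code",
}
source_doc = "admissions: patient admission time, discharge time, admission type"

candidates = retrieve_candidates(source_doc, target_docs, k=1)
prompt = build_prompt(source_doc, candidates, target_docs)
print(prompt)  # this text would be sent to an LLM of your choice
```

The sketch uses only table and column descriptions, never row data, which is consistent with the abstract's claim that no access to data in the source database is needed. Narrowing the target schema before prompting also keeps the prompt small when the target schema is much larger than the source.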

Code & Models

Repositories

menidata1/mimic_2_omop (Official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques