Schema Matching with Large Language Models: an Experimental Study

Marcel Parciak; Brecht Vandevoort; Frank Neven; Liesbet M. Peeters; Stijn Vansummeren

arXiv:2407.11852 · cs.DB · July 17, 2024 · 2 cites

TL;DR

This study evaluates the effectiveness of large language models in schema matching tasks, demonstrating their potential to assist data engineers by identifying semantic correspondences using only schema names and descriptions.

Contribution

The paper introduces a benchmark and several prompting strategies for LLM-based schema matching, and analyzes their performance and limitations on a health-domain dataset.
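This page does not reproduce the paper's prompts. As a rough illustration of how one schema-matching task can be split into prompts that carry more or less context, the Python sketch below builds prompt strings at three granularities. The scope labels (`1-to-1`, `1-to-N`, `N-to-M`), the `Attribute` record, the helper names, the example attributes, and the prompt wording are all hypothetical and are not taken from the paper or its repository. No LLM is called.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Attribute:
    """A schema element known only by its table, name and free-text description."""
    table: str
    name: str
    description: str


def describe(attrs: list[Attribute]) -> str:
    # One line per attribute: "- table.name: description"
    return "\n".join(f"- {a.table}.{a.name}: {a.description}" for a in attrs)


def build_prompt(source: list[Attribute], target: list[Attribute]) -> str:
    # Hypothetical prompt wording, for illustration only.
    return (
        "Decide which source attributes correspond semantically to which target attributes.\n"
        f"Source attributes:\n{describe(source)}\n"
        f"Target attributes:\n{describe(target)}\n"
        "Answer with pairs 'source -> target', or 'none'."
    )


def prompts_for_scope(scope: str, source: list[Attribute], target: list[Attribute]) -> list[str]:
    """Split the matching task into prompts; wider (hypothetical) scopes put more context in each prompt."""
    if scope == "1-to-1":
        # One prompt per candidate pair: minimal context.
        return [build_prompt([s], [t]) for s, t in product(source, target)]
    if scope == "1-to-N":
        # One prompt per source attribute, listing every target attribute.
        return [build_prompt([s], target) for s in source]
    if scope == "N-to-M":
        # A single prompt containing both full schemas: maximal context.
        return [build_prompt(source, target)]
    raise ValueError(f"unknown scope: {scope}")


# Hypothetical health-domain attributes.
src = [Attribute("patient", "dob", "Date of birth of the patient"),
       Attribute("patient", "sex", "Administrative sex")]
tgt = [Attribute("person", "birth_datetime", "Date and time of birth"),
       Attribute("person", "gender_concept_id", "Gender of the person"),
       Attribute("person", "race_concept_id", "Race of the person")]

for scope in ("1-to-1", "1-to-N", "N-to-M"):
    print(scope, len(prompts_for_scope(scope, src, tgt)))
```

With two source and three target attributes, the narrowest scope yields six small prompts, while the widest packs everything into a single prompt. The abstract reports that both too little and too much context can hurt matching quality, so neither extreme is automatically preferable.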

Findings

1. LLMs' matching quality is sensitive to context scope.
2. Newer LLM versions improve decisiveness.
3. Certain prompting strategies balance verification effort and matching success.

Abstract

Large Language Models (LLMs) have shown useful applications in a variety of tasks, including data wrangling. In this paper, we investigate the use of an off-the-shelf LLM for schema matching. Our objective is to identify semantic correspondences between elements of two relational schemas using only names and descriptions. Using a newly created benchmark from the health domain, we propose different so-called task scopes. These are methods for prompting the LLM to do schema matching, which vary in the amount of context information contained in the prompt. Using these task scopes we compare LLM-based schema matching against a string similarity baseline, investigating matching quality, verification effort, decisiveness, and complementarity of the approaches. We find that matching quality suffers from a lack of context information, but also from providing too much context information. In…
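The abstract mentions a string-similarity baseline but does not specify it here. The sketch below shows one possible baseline. It assumes that matches are proposed from attribute names alone and scored with the ratio from Python's standard `difflib.SequenceMatcher`. It also computes precision, recall, and F1, which are one common way to quantify matching quality. The similarity measure, the 0.5 threshold, and the example attribute names are illustrative assumptions, not the paper's configuration.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    # Lower-case and drop underscores so "BirthDate" and "birth_date" compare equal.
    return name.lower().replace("_", "")


def similarity(a: str, b: str) -> float:
    # difflib's matching-subsequence ratio, in [0, 1].
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def string_baseline(source: list[str], target: list[str],
                    threshold: float = 0.5) -> set[tuple[str, str]]:
    """Propose every (source, target) pair whose name similarity reaches the (assumed) threshold."""
    return {(s, t) for s in source for t in target if similarity(s, t) >= threshold}


def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    # Standard set-based matching metrics against a ground-truth mapping.
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical source/target attribute names and ground truth.
source = ["birth_date", "sex", "zip"]
target = ["date_of_birth", "gender", "zip_code"]
gold = {("birth_date", "date_of_birth"), ("sex", "gender"), ("zip", "zip_code")}

predicted = string_baseline(source, target)
print(sorted(predicted))
print(precision_recall_f1(predicted, gold))
```

In this toy example, the purely lexical baseline finds `zip`/`zip_code` and, just barely, `birth_date`/`date_of_birth`. It misses `sex`/`gender`, a semantic correspondence that only descriptions or background knowledge reveal. Gaps of this kind are what an LLM-based matcher is meant to close.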

Code & Models

Repositories

uhasselt-dsi-data-systems-lab/code-schema-matching-llms-artefacs
Official repository

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling