Prompt-Matcher: Leveraging Large Models to Reduce Uncertainty in Schema Matching Results

Longyu Feng; Huahang Li; Chen Jason Zhang

arXiv:2408.14507·cs.DB·March 7, 2025

TL;DR

This paper introduces Prompt-Matcher, a novel method using large language models to verify schema correspondences iteratively, reducing uncertainty and improving accuracy in schema matching tasks across diverse datasets.

Contribution

It proposes a new iterative approach leveraging GPT-4 with prompt templates for uncertainty reduction in schema matching, including a novel approximation algorithm for correspondence selection.

Findings

01. Outperforms brute-force algorithms in efficiency.

02. Achieves state-of-the-art verification accuracy with GPT-4.

03. Demonstrates robustness across benchmark datasets.

Abstract

Schema matching is the process of identifying correspondences between the elements of two given schemata; it is essential for database management systems, data integration, and data warehousing. The optimal schema matching algorithm differs across datasets and scenarios. Even for a single algorithm, hyperparameter tuning causes multiple results. All results are assigned equal probabilities and stored in probabilistic databases to facilitate uncertainty management. This substantial uncertainty diminishes the efficiency and reliability of data processing and prevents decision-makers from receiving more accurate information. To address this problem, we introduce a new approach based on fine-grained correspondence verification with specific prompts for a Large Language Model. Our approach is an iterative loop that consists of three main components: (1) the…
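To make the idea of equally probable matching results concrete, the following is a minimal sketch and not the paper's implementation. It assumes that uncertainty is measured as Shannon entropy over the candidate results and that a verification answer is perfectly reliable; both are assumptions of this illustration. The attribute names and the helpers `uniform`, `entropy`, and `verify` are hypothetical.

```python
import math

# Hypothetical candidate results from different matchers or hyperparameter
# settings. Each result is a set of (source_attr, target_attr) correspondences.
candidates = [
    frozenset({("cust_name", "name"), ("cust_addr", "address")}),
    frozenset({("cust_name", "name"), ("cust_addr", "city")}),
    frozenset({("cust_name", "full_name"), ("cust_addr", "address")}),
    frozenset({("cust_name", "full_name"), ("cust_addr", "city")}),
]


def uniform(results):
    """Assign every candidate result the same probability."""
    p = 1.0 / len(results)
    return {r: p for r in results}


def entropy(dist):
    """Shannon entropy in bits (assumed uncertainty measure)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def verify(dist, correspondence, is_correct):
    """Keep only results consistent with a verification answer, then renormalize.

    Assumes the answer is always right; a noisy verifier would need a
    Bayesian update instead of hard filtering.
    """
    kept = {r: p for r, p in dist.items() if (correspondence in r) == is_correct}
    total = sum(kept.values())
    if total == 0:
        return dict(dist)  # answer contradicts every result: leave unchanged
    return {r: p / total for r, p in kept.items()}


dist = uniform(candidates)
before = entropy(dist)
# Suppose a verifier (for example an LLM prompt) confirms this correspondence.
dist = verify(dist, ("cust_name", "name"), True)
after = entropy(dist)
print(f"entropy before: {before:.2f} bits, after: {after:.2f} bits")
```

In this toy setting, confirming a single correspondence halves the set of consistent results, so each verification step can be read as a way of shrinking the probabilistic result set.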

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning in Healthcare

Methods: Attention Is All You Need · Sparse Evolutionary Training · Linear Layer · Adam · Layer Normalization · Position-Wise Feed-Forward Layer · Dense Connections · Residual Connection · Multi-Head Attention · Byte Pair Encoding