Prompt-Matcher: Leveraging Large Models to Reduce Uncertainty in Schema Matching Results
Longyu Feng, Huahang Li, Chen Jason Zhang

TL;DR
This paper introduces Prompt-Matcher, a novel method using large language models to verify schema correspondences iteratively, reducing uncertainty and improving accuracy in schema matching tasks across diverse datasets.
Contribution
It proposes a new iterative approach leveraging GPT-4 with prompt templates for uncertainty reduction in schema matching, including a novel approximation algorithm for correspondence selection.
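The summary does not spell out how correspondences are selected for verification. Below is a minimal sketch of one plausible criterion. It assumes that uncertainty is measured as the Shannon entropy over the candidate matching results, and that the next correspondence is chosen greedily, one step ahead, to minimize the expected entropy after the verifier answers. This greedy rule stands in for the paper's approximation algorithm and may differ from it. All names (`entropy`, `expected_entropy_after`, `pick_correspondence`) are hypothetical.

```python
import math
from typing import FrozenSet, List, Sequence, Tuple

Correspondence = Tuple[str, str]      # (source element, target element)
Matching = FrozenSet[Correspondence]  # one candidate schema matching result


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in bits; used here as the (assumed) uncertainty measure."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_entropy_after(matchings: List[Matching], probs: List[float],
                           c: Correspondence) -> float:
    """Expected entropy once a verifier answers whether c is correct."""
    expected = 0.0
    for answer in (True, False):
        # Probabilities of the results consistent with this answer.
        branch = [p for m, p in zip(matchings, probs) if (c in m) == answer]
        p_branch = sum(branch)
        if p_branch > 0:
            expected += p_branch * entropy([p / p_branch for p in branch])
    return expected


def pick_correspondence(matchings: List[Matching],
                        probs: List[float]) -> Correspondence:
    """Greedy one-step choice: minimize the expected remaining entropy."""
    candidates = sorted(set().union(*matchings))  # sorted for deterministic ties
    return min(candidates,
               key=lambda c: expected_entropy_after(matchings, probs, c))
```

Consider four equally likely results (2 bits of entropy) in which `("phone", "tel")` appears in exactly two. That correspondence splits the candidates evenly, lowers the expected entropy to 1 bit, and is picked first. A correspondence present in three of the four results is less informative, because it leaves about 1.19 bits on average.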
Findings
Outperforms brute-force algorithms in efficiency.
Achieves state-of-the-art verification accuracy with GPT-4.
Demonstrates robustness across benchmark datasets.
Abstract
Schema matching is the process of identifying correspondences between the elements of two given schemata, and it is essential for database management systems, data integration, and data warehousing. The optimal schema matching algorithm differs across datasets and scenarios, and even for a single algorithm, hyperparameter tuning can produce multiple results. These results are assigned equal probabilities and stored in probabilistic databases to facilitate uncertainty management. This high degree of uncertainty reduces the efficiency and reliability of data processing and prevents decision-makers from receiving more accurate information. To address this problem, we introduce a new approach based on fine-grained correspondence verification using specific prompts to a Large Language Model. Our approach is an iterative loop that consists of three main components: (1) the…
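The abstract describes multiple equally probable matching results held in a probabilistic database, which an iterative loop narrows down by verifying individual correspondences. Below is a minimal sketch of such a loop under stated assumptions. `verify` is a hypothetical stand-in for prompting an LLM such as GPT-4 about one correspondence. Each answer is applied by discarding the results that contradict it and renormalizing. The next question is chosen with a simple hypothetical heuristic: ask about the correspondence whose marginal probability is closest to 0.5. None of these names come from the paper.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

Correspondence = Tuple[str, str]      # (source element, target element)
Matching = FrozenSet[Correspondence]  # one candidate schema matching result
ProbDB = Dict[Matching, float]        # possible results -> probability


def marginal(db: ProbDB, c: Correspondence) -> float:
    """Probability that c holds, summed over all possible results."""
    return sum(p for m, p in db.items() if c in m)


def condition(db: ProbDB, c: Correspondence, holds: bool) -> ProbDB:
    """Drop results that contradict the verifier's answer, then renormalize."""
    kept = {m: p for m, p in db.items() if (c in m) == holds}
    total = sum(kept.values())
    return {m: p / total for m, p in kept.items()}


def reduce_uncertainty(results: Iterable[Matching],
                       verify: Callable[[Correspondence], bool],
                       budget: int) -> ProbDB:
    unique = set(results)
    # Every result (from different matchers/hyperparameters) starts equally likely.
    db = {m: 1.0 / len(unique) for m in unique}
    for _ in range(budget):
        if len(db) <= 1:
            break  # no uncertainty left
        candidates = sorted(set().union(*db))
        # Hypothetical heuristic: ask about the correspondence whose marginal
        # probability is closest to 0.5. Because the remaining results are
        # distinct, that marginal lies strictly between 0 and 1, so both
        # answers leave at least one result.
        c = min(candidates, key=lambda x: abs(marginal(db, x) - 0.5))
        db = condition(db, c, verify(c))
    return db
```

Suppose a simulated verifier that knows the ground truth (`name -> full_name`, `phone -> tel`) is given four equally likely results. Two questions, about `("phone", "tel")` and then `("name", "first_name")`, are enough to collapse the database to a single result with probability 1.0.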
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning in Healthcare
Methods: Attention Is All You Need · Sparse Evolutionary Training · Linear Layer · Adam · Layer Normalization · Position-Wise Feed-Forward Layer · Dense Connections · Residual Connection · Multi-Head Attention · Byte Pair Encoding
