Nonparametric Bayesian Modeling for Automated Database Schema Matching
Erik M. Ferragut, Jason Laska

TL;DR
This paper presents a nonparametric Bayesian framework for automated database schema matching. It models each field probabilistically and, in the authors' experiments, is more accurate and faster than existing instance-based matching algorithms.
Contribution
It introduces a schema matching approach that builds a nonparametric Bayesian model for each field and compares fields by computing the probability that a single model could have generated both.
Findings
More accurate than existing instance-based matching algorithms in the authors' experiments
Faster than existing instance-based matching algorithms
Applicable to merging databases, where schema matching is a common first step
Abstract
The problem of merging databases arises in many government and commercial applications. Schema matching, a common first step, identifies equivalent fields between databases. We introduce a schema matching framework that builds nonparametric Bayesian models for each field and compares them by computing the probability that a single model could have generated both fields. Our experiments show that our method is more accurate and faster than existing instance-based matching algorithms, in part because of the use of nonparametric Bayesian models.
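The comparison described in the abstract (the probability that a single model generated both fields) can be illustrated with a small sketch. This is not the authors' model: it swaps in a simple symmetric Dirichlet-multinomial over the observed values as a stand-in for the paper's nonparametric Bayesian models, and the names `log_marginal`, `match_score`, and the prior strength `alpha` are hypothetical. The score is a log Bayes factor: the log marginal likelihood of the pooled values under one shared model, minus the sum of the log marginal likelihoods under two independent models.

```python
from collections import Counter
from math import lgamma


def log_marginal(counts, vocab_size, alpha=1.0):
    """Log marginal likelihood of a value sequence under a symmetric
    Dirichlet-multinomial (illustrative stand-in, not the paper's model)."""
    n = sum(counts.values())
    total = lgamma(vocab_size * alpha) - lgamma(vocab_size * alpha + n)
    # Values with zero count contribute nothing, so only observed ones are summed.
    for c in counts.values():
        total += lgamma(alpha + c) - lgamma(alpha)
    return total


def match_score(field_a, field_b, alpha=1.0):
    """Log Bayes factor: one shared model vs. two independent models.
    Positive values favor the hypothesis that the fields are equivalent."""
    counts_a, counts_b = Counter(field_a), Counter(field_b)
    # Assumption: the vocabulary is the union of values seen in either field.
    vocab_size = len(set(counts_a) | set(counts_b))
    shared = log_marginal(counts_a + counts_b, vocab_size, alpha)
    separate = (log_marginal(counts_a, vocab_size, alpha)
                + log_marginal(counts_b, vocab_size, alpha))
    return shared - separate


# Made-up example columns: two state-code fields and one ZIP-code field.
states_a = ["CA", "NY", "CA", "TX"] * 5
states_b = ["NY", "CA", "TX", "CA"] * 5
zip_codes = ["10001", "94105", "73301", "60601"] * 5

print(match_score(states_a, states_b))   # similar distributions
print(match_score(states_a, zip_codes))  # disjoint value sets
```

In a matcher built this way, every pair of fields across the two databases would be scored and the highest-scoring pairs proposed as matches. A fixed-vocabulary model like this one is only a rough proxy; the abstract attributes part of the method's accuracy to its nonparametric models, which this sketch does not reproduce.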
