MultiEM: Efficient and Effective Unsupervised Multi-Table Entity Matching
Xiaocan Zeng, Pengfei Wang, Yuren Mao, Lu Chen, Xiaoze Liu, Yunjun Gao

TL;DR
MultiEM introduces a scalable unsupervised approach for multi-table entity matching, significantly improving accuracy and efficiency over existing methods in real-world data management scenarios.
Contribution
This paper presents an effective and efficient unsupervised multi-table entity matching method, addressing a problem the authors describe as under-explored.
Findings
MultiEM outperforms existing methods on six benchmark datasets.
MultiEM is scalable and parallelizable for large datasets.
It achieves higher accuracy and efficiency in multi-table EM tasks.
Abstract
Entity Matching (EM), which aims to identify all entity pairs referring to the same real-world entity from relational tables, is one of the most important tasks in real-world data management systems. Because labeling data for EM is extremely labor-intensive, unsupervised EM is more applicable than supervised EM in practical scenarios. Traditional unsupervised EM assumes that all entities come from two tables; in practical applications, however, it is more common to match entities across multiple tables, a setting known as multi-table entity matching (multi-table EM). Unfortunately, effective and efficient unsupervised multi-table EM remains under-explored. To fill this gap, this paper formally studies the problem of unsupervised multi-table entity matching and proposes an effective and efficient solution, termed MultiEM. MultiEM is a parallelizable pipeline of enhanced entity…
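To make the problem setting concrete, here is a minimal sketch of what multi-table EM asks for: records from several tables are grouped so that each group refers to one real-world entity. This is not MultiEM's pipeline. It is a naive quadratic baseline that assumes token-level Jaccard similarity and a union-find merge, and the names `match_tables`, `jaccard`, and the sample `tables` are hypothetical.

```python
from itertools import combinations


def tokens(text):
    # Lowercased whitespace tokens; a stand-in for a learned representation.
    return set(text.lower().split())


def jaccard(a, b):
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match_tables(tables, threshold=0.5):
    """Group records across tables (illustrative baseline, not MultiEM).

    tables: dict mapping table name -> list of record strings.
    Returns a sorted list of groups; each group is a sorted list of
    (table_name, row_index) pairs with at least two members.
    """
    records = [
        (name, i, text)
        for name, rows in tables.items()
        for i, text in enumerate(rows)
    ]
    parent = list(range(len(records)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Compare every cross-table pair; this all-pairs step is what makes
    # the naive approach quadratic in the total number of records.
    for a, b in combinations(range(len(records)), 2):
        if records[a][0] == records[b][0]:
            continue  # skip pairs from the same table
        if jaccard(records[a][2], records[b][2]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for idx, (name, i, _) in enumerate(records):
        groups.setdefault(find(idx), []).append((name, i))
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical sample data: three product tables with overlapping entities.
tables = {
    "A": ["apple iphone 12 black", "sony wh-1000xm4 headphones"],
    "B": ["iphone 12 apple black 64gb", "dell xps 13 laptop"],
    "C": ["sony headphones wh-1000xm4", "apple iphone 12 black"],
}

if __name__ == "__main__":
    for group in match_tables(tables):
        print(group)
```

Unlike two-table EM, the output is a set of groups that may span any number of tables, and a merge through one table can link records in two others. Comparing all cross-table pairs scales poorly as the number of tables and records grows, which is the efficiency concern the abstract raises.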
Taxonomy
Topics: Data Quality and Management · Topic Modeling · Artificial Intelligence in Healthcare
