Heterogeneous Entity Matching with Complex Attribute Associations using BERT and Neural Networks
Shitao Wang, Jiamin Lu

TL;DR
This paper presents EMM-CCAR, a novel entity matching model leveraging BERT and attention mechanisms to handle heterogeneous data and complex attribute relationships, outperforming existing methods in F1 score.
Contribution
The paper introduces EMM-CCAR, a new model that transforms entity matching into sequence matching and captures complex attribute relationships using attention mechanisms.
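To make the "entity matching as sequence matching" idea concrete, here is a minimal sketch of how two records with different schemas might be flattened into a pair of token sequences for a BERT-style encoder. The `[COL]`/`[VAL]` markers follow the serialization convention popularized by Ditto; whether EMM-CCAR uses the same scheme is an assumption, and the function names and example records are hypothetical.

```python
from typing import Dict, Tuple


def serialize(record: Dict[str, str]) -> str:
    """Flatten an attribute/value record into one token sequence.

    Assumption: Ditto-style [COL]/[VAL] markers; the paper's exact
    serialization is not given in this summary.
    """
    parts = [f"[COL] {attr} [VAL] {value}" for attr, value in record.items()]
    return " ".join(parts)


def to_sequence_pair(left: Dict[str, str], right: Dict[str, str]) -> Tuple[str, str]:
    """Turn a candidate entity pair into a sequence pair for a matcher."""
    return serialize(left), serialize(right)


# Hypothetical records: the two sources use different attribute names
# and a different number of attributes (heterogeneous schemas).
left = {"name": "Alan Turing", "born": "1912"}
right = {
    "title": "Alan Mathison Turing",
    "birth_date": "23 June 1912",
    "field": "computer science",
}

seq_a, seq_b = to_sequence_pair(left, right)
print(seq_a)
print(seq_b)
```

Because both records end up as plain sequences, the matcher no longer needs the two schemas to line up attribute by attribute; an attention layer over the tokens can then relate any attribute on one side to any attribute on the other.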
Findings
Achieves approximately 4% higher F1 score than DER-SSM.
Outperforms Ditto by about 1% in F1 score.
Effectively handles data heterogeneity and attribute interdependencies.
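For readers unfamiliar with the metric behind these comparisons, F1 is the harmonic mean of precision and recall over predicted match pairs. The sketch below computes it from raw counts; the counts are illustrative only and are not results from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts (hypothetical, not taken from the paper):
# 80 correctly predicted matches, 20 spurious matches, 20 missed matches.
score = f1_score(tp=80, fp=20, fn=20)
print(round(score, 3))
```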
Abstract
Across various domains, data from different sources such as Baidu Baike and Wikipedia often manifest in distinct forms. Current entity matching methodologies predominantly focus on homogeneous data, characterized by attributes that share the same structure and concise attribute values. However, this orientation poses challenges in handling data with diverse formats. Moreover, prevailing approaches aggregate the similarity of attribute values between corresponding attributes to ascertain entity similarity. Yet, they often overlook the intricate interrelationships between attributes, where one attribute may have multiple associations. The simplistic approach of pairwise attribute comparison fails to harness the wealth of information encapsulated within entities. To address these challenges, we introduce a novel entity matching model, dubbed Entity Matching Model for Capturing Complex…
Taxonomy
Topics: Data Quality and Management · Topic Modeling · Artificial Intelligence in Healthcare
