Heterogeneity in Entity Matching: A Survey and Experimental Analysis
Mohammad Hossein Moslemi, Amir Mousavi, Behshid Behkamal, and Mostafa Milani

TL;DR
This paper surveys the challenges of heterogeneous entity matching, introduces a taxonomy of the ways data can vary across sources, and evaluates how robust recent methods are to those variations, highlighting their limitations and directions for future research.
Contribution
It provides a unified taxonomy of heterogeneity in entity matching (EM) and critically analyzes how effectively current methods handle diverse data variations.
Findings
Current EM methods have limited robustness to semantic heterogeneity.
Persistent challenges remain in handling diverse data formats and semantics.
Future directions include multimodal matching and integration with large language models.
Abstract
Entity matching (EM) is a fundamental task in data integration and analytics, essential for identifying records that refer to the same real-world entity across diverse sources. In practice, datasets often differ widely in structure, format, schema, and semantics, creating substantial challenges for EM. We refer to this setting as Heterogeneous EM (HEM). This survey offers a unified perspective on HEM by introducing a taxonomy, grounded in prior work, that distinguishes two primary categories, representation and semantic heterogeneity, along with their subtypes. The taxonomy provides a systematic lens for understanding how variations in data form and meaning shape the complexity of matching tasks. We then connect this framework to the FAIR principles (Findability, Accessibility, Interoperability, and Reusability), demonstrating how they both reveal the challenges of HEM and suggest…
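To make the distinction between variations in form and in meaning concrete, here is a minimal, hypothetical Python sketch of representation heterogeneity. Two invented records describe the same person using different schemas (`full_name` vs. `first`/`last`) and different formats (ISO vs. day/month/year dates, differently punctuated phone numbers). The records, field names, and the functions `normalize_a`, `normalize_b`, `naive_match`, and `normalized_match` are illustrative assumptions, not the survey's method. They show only why exact comparison fails across sources and how simple normalization can recover the match.

```python
from datetime import datetime

# Hypothetical records for the same person from two sources whose schemas
# and value formats differ (representation heterogeneity).
record_a = {"full_name": "Smith, John", "dob": "1985-03-07", "phone": "(555) 123-4567"}
record_b = {"first": "John", "last": "Smith", "birth_date": "07/03/1985", "tel": "555.123.4567"}


def normalize_a(rec):
    """Map source A's "Last, First" name, ISO date, and phone onto a canonical form."""
    last, first = [part.strip() for part in rec["full_name"].split(",")]
    return {
        "name": f"{first} {last}".lower(),
        "dob": datetime.strptime(rec["dob"], "%Y-%m-%d").date(),
        "phone": "".join(ch for ch in rec["phone"] if ch.isdigit()),
    }


def normalize_b(rec):
    """Map source B onto the same form; day/month/year dates are an assumption here."""
    return {
        "name": f"{rec['first']} {rec['last']}".lower(),
        "dob": datetime.strptime(rec["birth_date"], "%d/%m/%Y").date(),
        "phone": "".join(ch for ch in rec["tel"] if ch.isdigit()),
    }


def naive_match(a, b):
    """Compare raw values of identically named attributes.

    With disjoint schemas nothing is shared, so this returns False.
    """
    shared = set(a) & set(b)
    return bool(shared) and all(a[k] == b[k] for k in shared)


def normalized_match(a, b):
    """Compare the records after mapping both onto the canonical form."""
    return normalize_a(a) == normalize_b(b)


print(naive_match(record_a, record_b))       # False: no attribute names in common
print(normalized_match(record_a, record_b))  # True: same name, date, and digits
```

Hand-written normalization of this kind addresses differences in form only. It does nothing for semantic heterogeneity, where attributes or values differ in meaning (for example, a hypothetical `price` field that includes tax in one source and excludes it in another). That is the setting in which the survey reports limited robustness for current EM methods.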
Taxonomy
Topics: Data Quality and Management · Semantic Web and Ontologies · Topic Modeling
