End-to-End Entity Resolution for Big Data: A Survey
Vassilis Christophides, Vasilis Efthymiou, Themis Palpanas, George Papadakis, Kostas Stefanidis

TL;DR
This survey provides a comprehensive overview of modern end-to-end entity resolution techniques tailored for Big Data, highlighting workflows, indexing, matching methods, and open research challenges across multiple disciplines.
Contribution
It offers the first integrated view of ER workflows and novel methods addressing multiple Big Data characteristics simultaneously, unifying insights from database, semantic Web, and machine learning communities.
Findings
Highlights the importance of indexing and matching in ER for Big Data.
Identifies key challenges and open research directions in ER workflows.
Synthesizes approaches across different disciplines for handling large-scale, diverse data.
Abstract
One of the most important tasks for improving data quality and the reliability of data analytics results is Entity Resolution (ER). ER aims to identify different descriptions that refer to the same real-world entity, and remains a challenging problem. While previous works have studied specific aspects of ER (and mostly in traditional settings), in this survey, we provide for the first time an end-to-end view of modern ER workflows, and of the novel aspects of entity indexing and matching methods in order to cope with more than one of the Big Data characteristics simultaneously. We present the basic concepts, processing steps and execution strategies that have been proposed by different communities, i.e., database, semantic Web and machine learning, in order to cope with the loose structuredness, extreme diversity, high speed and large scale of entity descriptions used by real-world…
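The two core steps named above, entity indexing and entity matching, can be illustrated with a minimal sketch. It makes two assumptions. Indexing uses token blocking, which places each description in one block per token and compares only descriptions that share a block. Matching uses Jaccard similarity over tokens with a fixed threshold. Both are simple, schema-agnostic stand-ins chosen for illustration. They are not presented as the survey's recommended methods. All function names, the 0.5 threshold, and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def tokens(description):
    """Lower-cased token set drawn from all attribute values.

    Attribute names are ignored, so loosely structured descriptions that use
    different schemas (e.g. "name" vs. "fullName") remain comparable.
    """
    return {tok.lower() for value in description.values() for tok in str(value).split()}


def token_blocking(descriptions):
    """Indexing step (assumed: token blocking): one block per distinct token."""
    blocks = defaultdict(set)
    for eid, desc in descriptions.items():
        for tok in tokens(desc):
            blocks[tok].add(eid)
    return blocks


def candidate_pairs(blocks):
    """Only descriptions that co-occur in at least one block are compared."""
    pairs = set()
    for members in blocks.values():
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def resolve(descriptions, threshold=0.5):
    """Matching step (assumed: Jaccard with a hypothetical fixed threshold)."""
    toks = {eid: tokens(d) for eid, d in descriptions.items()}
    return {
        (a, b)
        for a, b in candidate_pairs(token_blocking(descriptions))
        if jaccard(toks[a], toks[b]) >= threshold
    }


# Hypothetical sample: e1 and e2 describe the same person under different schemas.
descriptions = {
    "e1": {"name": "Jane Smith", "city": "Paris"},
    "e2": {"fullName": "Jane Smith", "location": "Paris France"},
    "e3": {"name": "John Doe", "city": "Berlin"},
}

print(resolve(descriptions))  # {('e1', 'e2')}
```

Blocking is what keeps this approach tractable at scale. In the sample, e3 shares no token with e1 or e2, so no comparison involving e3 is ever made. Without blocking, every pair would be compared, and the number of comparisons grows quadratically with the number of descriptions.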
Taxonomy
Topics: Data Quality and Management · Data Mining Algorithms and Applications · Semantic Web and Ontologies
