TL;DR
This paper experimentally evaluates several blocking algorithms for Entity Resolution (ER) in the Web of data, measuring on real datasets how effective they are at keeping matching descriptions together and how efficiently they reduce the number of pairwise comparisons.
Contribution
It introduces an experimental evaluation framework for benchmarking blocking methods proposed for Entity Resolution in the Web of data.
Findings
The effectiveness and efficiency of blocking methods depend significantly on the characteristics of the dataset
The framework helps identify types of missed matches and their causes
Results guide the selection of suitable blocking algorithms for web data ER
Abstract
An increasing number of entities are described by interlinked data rather than documents on the Web. Entity Resolution (ER) aims to identify descriptions of the same real-world entity within one or across knowledge bases in the Web of data. To reduce the required number of pairwise comparisons among descriptions, ER methods typically perform a pre-processing step, called blocking, which places similar entity descriptions into blocks so that only descriptions within the same block are compared. We experimentally evaluate several blocking methods proposed for the Web of data using real datasets, whose characteristics significantly impact their effectiveness and efficiency. The proposed experimental evaluation framework allows us to better understand the characteristics of the missed matching entity descriptions and contrast them with ground truth obtained from different kinds of…
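To make the blocking step concrete, here is a minimal sketch of one simple scheme, token blocking, in which every token that appears in an attribute value defines a block. This is an illustration only: the abstract does not say which methods are evaluated, and the function names (`token_blocking`, `candidate_pairs`), the entity identifiers, and the sample descriptions are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import re


def token_blocking(descriptions):
    """Group entity ids by the tokens found in their attribute values.

    `descriptions` maps an entity id to a dict of attribute -> value.
    Illustrative sketch only, not the method of any particular paper.
    """
    blocks = defaultdict(set)
    for entity_id, attributes in descriptions.items():
        for value in attributes.values():
            # Lowercase and split on non-word characters.
            for token in re.findall(r"\w+", str(value).lower()):
                blocks[token].add(entity_id)
    # A block holding a single description produces no comparisons.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Collect the distinct pairs that share at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical descriptions drawn from two made-up knowledge bases.
descriptions = {
    "kb1:e1": {"name": "Stanley Kubrick", "born": "New York"},
    "kb2:e7": {"label": "Kubrick, Stanley", "occupation": "director"},
    "kb2:e9": {"label": "Eiffel Tower", "city": "Paris"},
}

blocks = token_blocking(descriptions)
pairs = candidate_pairs(blocks)
```

With these three sample descriptions, an exhaustive approach would compare all three possible pairs, whereas the sketch keeps only the pair that shares the tokens "stanley" and "kubrick". A matching pair that shared no token would end up in no common block, which is the kind of missed match the evaluation framework is meant to expose.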
