Benchmarking Blocking Algorithms for Web Entities

Vasilis Efthymiou; Kostas Stefanidis; Vassilis Christophides

arXiv:2005.09399·cs.DB·May 20, 2020

TL;DR

This paper experimentally evaluates several blocking algorithms for Entity Resolution on Web data, measuring on real datasets how effectively each one retains matching descriptions and how much it reduces the number of pairwise comparisons.

Contribution

It introduces an experimental framework for benchmarking blocking methods designed for entity resolution in the Web of data.

Findings

01. Different blocking methods vary in effectiveness depending on dataset characteristics.

02. The framework helps identify types of missed matches and their causes.

03. Results guide the selection of suitable blocking algorithms for Web data ER.

Abstract

An increasing number of entities are described by interlinked data rather than documents on the Web. Entity Resolution (ER) aims to identify descriptions of the same real-world entity within one or across knowledge bases in the Web of data. To reduce the required number of pairwise comparisons among descriptions, ER methods typically perform a pre-processing step, called blocking, which places similar entity descriptions into blocks so that only descriptions within the same block are compared. We experimentally evaluate several blocking methods proposed for the Web of data using real datasets, whose characteristics significantly impact the methods' effectiveness and efficiency. The proposed experimental evaluation framework allows us to better understand the characteristics of the missed matching entity descriptions and contrast them with ground truth obtained from different kinds of…
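To make the blocking step concrete, here is a minimal Python sketch. It assumes token blocking, a simple schema-agnostic scheme in which two descriptions share a block whenever they share a token in any attribute value. The text above does not say which methods are benchmarked, so treat this as an illustration rather than the authors' implementation. The toy descriptions, the ground truth, and the function names `token_blocking` and `candidate_pairs` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(descriptions):
    """Build one block per token: token -> ids of descriptions containing it."""
    blocks = defaultdict(set)
    for entity_id, attributes in descriptions.items():
        for value in attributes.values():
            for token in value.lower().split():
                blocks[token].add(entity_id)
    # A block holding a single description suggests no comparison, so drop it.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Distinct unordered pairs that co-occur in at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Made-up entity descriptions; attribute names differ on purpose, as they
# often do across knowledge bases.
descriptions = {
    "a1": {"name": "Stanley Kubrick", "born": "New York"},
    "b1": {"label": "Kubrick Stanley", "place": "Manhattan"},
    "a2": {"name": "Ridley Scott", "born": "South Shields"},
    "b2": {"label": "R. Scott", "place": "England"},
    "a3": {"name": "Akira Kurosawa", "born": "Tokyo"},
}
# Hypothetical ground truth: pairs known to describe the same entity.
ground_truth = {("a1", "b1"), ("a2", "b2")}

blocks = token_blocking(descriptions)
pairs = candidate_pairs(blocks)

n = len(descriptions)
total_pairs = n * (n - 1) // 2  # comparisons needed without blocking
recall = len(pairs & ground_truth) / len(ground_truth)
reduction = 1 - len(pairs) / total_pairs

print(f"blocks: {sorted(blocks)}")
print(f"suggested comparisons: {len(pairs)} of {total_pairs}")
print(f"recall: {recall:.2f}, reduction: {reduction:.2f}")
```

On this toy input the blocks keep both known matches while suggesting 2 of the 10 possible comparisons. Real Web data is noisier, so a match whose descriptions share no token would be missed by this scheme.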

Code & Models

Repositories

vefthym/ParallelBlocking (official)
