Evaluating Blocking Biases in Entity Matching

Mohammad Hossein Moslemi; Harini Balamurugan; Mostafa Milani

arXiv:2409.16410 · cs.LG · September 26, 2024

TL;DR

This paper examines the fairness of blocking techniques in Entity Matching, extending metrics to assess bias, and evaluates various methods to promote equitable data integration outcomes.

Contribution

It introduces a fairness-aware framework for evaluating blocking methods in Entity Matching, addressing an overlooked bias issue in data integration.

Findings

01 Certain blocking methods exhibit demographic biases

02 Fairness-aware blocking improves equitable data matching

03 Traditional metrics overlook bias in blocking techniques

Abstract

Entity Matching (EM) is crucial for identifying equivalent data entities across different sources, a task that becomes increasingly challenging with the growth and heterogeneity of data. Blocking techniques, which reduce the computational complexity of EM, play a vital role in making this process scalable. Despite advancements in blocking methods, the issue of fairness, where blocking may inadvertently favor certain demographic groups, has been largely overlooked. This study extends traditional blocking metrics to incorporate fairness, providing a framework for assessing bias in blocking techniques. Through experimental analysis, we evaluate the effectiveness and fairness of various blocking methods, offering insights into their potential biases. Our findings highlight the importance of considering fairness in EM, particularly in the blocking phase, to ensure equitable outcomes in data…
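
To make the idea of extending blocking metrics to fairness concrete, here is a minimal sketch. It is not the paper's implementation. It computes two standard blocking metrics, pair completeness (the share of true matches kept by blocking) and reduction ratio (the share of all possible pairs pruned), and then reports pair completeness per demographic group. Three things are assumptions made for illustration and are not taken from the abstract: the toy records, the rule that a pair counts as "minority" when either record belongs to the minority set, and the use of a simple difference as the disparity measure.

```python
def pair_completeness(candidates, matches):
    """Share of true matching pairs that survive blocking."""
    if not matches:
        return float("nan")
    return len(candidates & matches) / len(matches)


def reduction_ratio(candidates, n_left, n_right):
    """Share of the full cross product that blocking pruned away."""
    return 1.0 - len(candidates) / (n_left * n_right)


def pair_group(pair, minority_ids):
    """Hypothetical grouping rule: a pair is 'minority' if either record is."""
    left, right = pair
    if left in minority_ids or right in minority_ids:
        return "minority"
    return "majority"


# Toy data (hypothetical): four records per source, one true match each.
matches = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
minority_ids = {"a3", "a4", "b3", "b4"}

# Candidate pairs produced by some blocking method (hypothetical output).
candidates = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a1", "b2")}

# Pair completeness restricted to each group's true matches.
pc_by_group = {}
for group in ("majority", "minority"):
    group_matches = {p for p in matches if pair_group(p, minority_ids) == group}
    pc_by_group[group] = pair_completeness(candidates, group_matches)

# Assumed disparity measure: plain difference between the two groups.
disparity = pc_by_group["majority"] - pc_by_group["minority"]

print("overall PC:", pair_completeness(candidates, matches))
print("overall RR:", reduction_ratio(candidates, 4, 4))
print("PC by group:", pc_by_group)
print("PC disparity:", disparity)
```

In this toy case the overall pair completeness looks acceptable while the minority group loses half of its true matches. That gap is the kind of bias an aggregate metric alone would not show.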

Code & Models

Repositories

mhmoslemi2338/pre-EM-bias
PyTorch · Official

Taxonomy

Topics: Data Quality and Management · Topic Modeling · Web Data Mining and Analysis