Evaluating Blocking Biases in Entity Matching
Mohammad Hossein Moslemi, Harini Balamurugan, Mostafa Milani

TL;DR
This paper examines the fairness of blocking techniques in Entity Matching. It extends standard blocking metrics so that they can measure bias, and it evaluates several blocking methods with those metrics to support equitable data integration outcomes.
Contribution
The paper introduces a fairness-aware framework for evaluating blocking methods in Entity Matching, addressing a largely overlooked source of bias in data integration.
Findings
Certain blocking methods exhibit demographic biases
Fairness-aware blocking improves equitable data matching
Traditional metrics overlook bias in blocking techniques
Abstract
Entity Matching (EM) is crucial for identifying equivalent data entities across different sources, a task that becomes increasingly challenging with the growth and heterogeneity of data. Blocking techniques, which reduce the computational complexity of EM, play a vital role in making this process scalable. Despite advancements in blocking methods, the issue of fairness, where blocking may inadvertently favor certain demographic groups, has been largely overlooked. This study extends traditional blocking metrics to incorporate fairness, providing a framework for assessing bias in blocking techniques. Through experimental analysis, we evaluate the effectiveness and fairness of various blocking methods, offering insights into their potential biases. Our findings highlight the importance of considering fairness in EM, particularly in the blocking phase, to ensure equitable outcomes in data…
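To make the idea of extending blocking metrics with fairness concrete, here is a minimal sketch and not the authors' implementation. It computes two standard blocking metrics, pair completeness (the share of true matches that survive blocking) and reduction ratio (the share of the full comparison space that blocking prunes), and then recomputes pair completeness per demographic group and reports the gap. The record identifiers, the group labels, the rule that a pair belongs to the group of its left-hand record, and the use of a max-minus-min gap as the disparity measure are all assumptions made for illustration.

```python
def pair_completeness(candidates, true_matches):
    """Share of true matching pairs that the blocker kept as candidates."""
    if not true_matches:
        return float("nan")
    return len(candidates & true_matches) / len(true_matches)


def reduction_ratio(candidates, n_left, n_right):
    """Share of the n_left * n_right comparison space removed by blocking."""
    return 1.0 - len(candidates) / (n_left * n_right)


def pair_completeness_by_group(candidates, true_matches, group_of_left):
    """Pair completeness restricted to each group's true matches.

    Assumption: a pair is assigned to the group of its left-hand record.
    """
    scores = {}
    for group in sorted(set(group_of_left.values())):
        group_matches = {p for p in true_matches if group_of_left[p[0]] == group}
        scores[group] = pair_completeness(candidates, group_matches)
    return scores


# Toy data (hypothetical): four records per table, one true match each.
true_matches = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
group_of_left = {"a1": "minority", "a2": "minority",
                 "a3": "majority", "a4": "majority"}

# Candidate pairs produced by some blocker; it misses the (a2, b2) match.
candidates = {("a1", "b1"), ("a3", "b3"), ("a4", "b4"), ("a3", "b4")}

overall_pc = pair_completeness(candidates, true_matches)
rr = reduction_ratio(candidates, n_left=4, n_right=4)
pc_by_group = pair_completeness_by_group(candidates, true_matches, group_of_left)

# Assumed disparity measure: gap between best- and worst-served group.
pc_disparity = max(pc_by_group.values()) - min(pc_by_group.values())

print(f"overall PC = {overall_pc:.2f}, RR = {rr:.2f}")
print(f"PC by group = {pc_by_group}, disparity = {pc_disparity:.2f}")
```

In this toy case the overall pair completeness looks acceptable, yet every missed match falls in one group. That is the kind of bias an aggregate metric hides and a per-group breakdown exposes.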
Taxonomy
Topics: Data Quality and Management · Topic Modeling · Web Data Mining and Analysis
