Threshold-Independent Fair Matching through Score Calibration

Mohammad Hossein Moslemi; Mostafa Milani

arXiv:2405.20051·cs.LG·May 31, 2024

TL;DR

This paper proposes a threshold-independent score calibration method for entity matching that reduces bias and maintains accuracy, addressing fairness issues overlooked by traditional threshold-based approaches.

Contribution

It introduces a novel calibration technique using Wasserstein barycenters to mitigate bias in entity matching scores without relying on thresholds.
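As a rough illustration of the idea, the sketch below calibrates one-dimensional matching scores toward a Wasserstein barycenter of the per-group score distributions. It relies on the standard fact that, in one dimension, the barycenter's quantile function is the weighted average of the group quantile functions. The function name `barycenter_calibrate`, the `n_grid` parameter, the group-size weights, and the synthetic Beta-distributed scores are all assumptions made for this example; the paper's actual procedure may differ.

```python
import numpy as np

def barycenter_calibrate(scores, groups, n_grid=101):
    """Map each group's scores onto a shared 1-D Wasserstein barycenter.

    Hypothetical helper for illustration only, not the authors' code.
    """
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    qs = np.linspace(0.0, 1.0, n_grid)  # quantile levels

    labels, counts = np.unique(groups, return_counts=True)
    weights = counts / counts.sum()  # assumption: weight groups by size

    # Empirical quantile function of each group's scores.
    group_q = {g: np.quantile(scores[groups == g], qs) for g in labels}

    # In 1-D, the barycenter's quantile function is the weighted
    # average of the group quantile functions.
    bary_q = sum(w * group_q[g] for g, w in zip(labels, weights))

    calibrated = np.empty_like(scores)
    for g in labels:
        mask = groups == g
        # Approximate CDF level of each score within its own group ...
        levels = np.interp(scores[mask], group_q[g], qs)
        # ... then read off the barycenter quantile at that level.
        calibrated[mask] = np.interp(levels, qs, bary_q)
    return calibrated

# Synthetic example: two groups with very different score distributions.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.beta(2, 5, 500), rng.beta(5, 2, 500)])
groups = np.array(["a"] * 500 + ["b"] * 500)
calibrated = barycenter_calibrate(scores, groups)

gap_before = abs(scores[groups == "a"].mean() - scores[groups == "b"].mean())
gap_after = abs(calibrated[groups == "a"].mean() - calibrated[groups == "b"].mean())
print(f"mean score gap before: {gap_before:.3f}, after: {gap_after:.3f}")
```

Because the mapping is monotone within each group, it preserves the ranking of candidate pairs inside a group while aligning the score distributions across groups, which is why no particular threshold needs to be fixed in advance.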

Findings

01. Biases in matching scores can be effectively reduced.

02. Calibration preserves accuracy across datasets.

03. Threshold-independent fairness improves data cleaning processes.

Abstract

Entity Matching (EM) is a critical task in numerous fields, such as healthcare, finance, and public administration, as it identifies records that refer to the same entity within or across different databases. EM faces considerable challenges, particularly with false positives and negatives. These are typically addressed by generating matching scores and applying thresholds to balance false positives and negatives in various contexts. However, adjusting these thresholds can affect the fairness of the outcomes, a critical factor that remains largely overlooked in current fair EM research. The existing body of research on fair EM tends to concentrate on static thresholds, neglecting their critical impact on fairness. To address this, we introduce a new approach in EM using recent metrics for evaluating biases in score-based binary classification, particularly through the lens of…
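To make the threshold dependence concrete, here is a minimal sketch that sweeps the decision threshold and reports the gap in predicted-match rates between groups at each setting. The demographic parity gap is used only as one familiar example of a bias measure and is an assumption of this sketch, not necessarily the metric the paper adopts; the helper name `parity_gap_by_threshold` and the toy scores are likewise hypothetical.

```python
import numpy as np

def parity_gap_by_threshold(scores, groups, thresholds):
    """Gap in predicted-match rate between groups at each threshold.

    Hypothetical helper for illustration only.
    """
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    gaps = []
    for t in thresholds:
        # Fraction of pairs in each group declared a match at threshold t.
        rates = [np.mean(scores[groups == g] >= t) for g in labels]
        gaps.append(max(rates) - min(rates))
    return np.array(gaps)

# Toy matching scores for two groups of candidate pairs.
scores = [0.1, 0.3, 0.5, 0.7, 0.6, 0.7, 0.8, 0.9]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
thresholds = [0.2, 0.55, 0.95]

gaps = parity_gap_by_threshold(scores, groups, thresholds)
for t, gap in zip(thresholds, gaps):
    print(f"threshold={t:.2f}  parity gap={gap:.2f}")
```

On this toy data the same scores look nearly fair at one threshold and strongly biased at another, which is the kind of behavior a fairness evaluation tied to a single static threshold would miss.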

Code & Models

Repositories

mhmoslemi2338/CaliFair-EM
PyTorch · Official