Reducing Biases in Record Matching Through Scores Calibration
Mohammad Hossein Moslemi, Mostafa Milani

TL;DR
This paper introduces a threshold-independent measure of score bias in record matching, revealing biases in state-of-the-art models and proposing post-processing calibration methods to reduce disparities without retraining.
Contribution
The paper extends group-fairness criteria from binary decisions to score functions, proposes model-agnostic calibration methods based on optimal transport, and demonstrates that they reduce bias in record matching.
Findings
State-of-the-art matchers exhibit significant score bias.
Calibration methods effectively reduce bias with minimal accuracy loss.
The proposed methods come with theoretical guarantees and are evaluated on standard benchmarks.
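To make the idea of post-processing calibration with optimal transport concrete, here is a minimal sketch. It is not the authors' algorithm, which this summary does not spell out; it swaps in a generic one-dimensional Wasserstein barycenter quantile mapping, a common way to align group-wise score distributions without retraining. The function name `barycenter_calibrate` and the parameter `n_quantiles` are hypothetical.

```python
import numpy as np

def barycenter_calibrate(scores, groups, n_quantiles=101):
    """Map each group's scores onto a shared distribution (illustrative only).

    Assumption: a generic 1D Wasserstein barycenter mapping, not necessarily
    the calibration method proposed in the paper.
    """
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    labels, counts = np.unique(groups, return_counts=True)
    weights = counts / counts.sum()
    levels = np.linspace(0.0, 1.0, n_quantiles)

    # Empirical quantile function of each group's scores.
    group_q = {g: np.quantile(scores[groups == g], levels) for g in labels}

    # In one dimension, the barycenter's quantile function is the weighted
    # average of the group quantile functions.
    bary_q = sum(w * group_q[g] for g, w in zip(labels, weights))

    calibrated = np.empty_like(scores)
    for g in labels:
        mask = groups == g
        # Rank of each score within its own group (approximate CDF value).
        ranks = np.interp(scores[mask], group_q[g], levels)
        # Score at the same rank in the barycenter distribution.
        calibrated[mask] = np.interp(ranks, levels, bary_q)
    return calibrated
```

Because every group is pushed to the same target distribution, the gap in positive-prediction rates shrinks at every threshold at once, which is the property a threshold-independent bias measure rewards. The mapping preserves the ordering of scores within each group, so within-group ranking quality is unchanged.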
Abstract
Record matching models typically output a real-valued matching score that is later consumed through thresholding, ranking, or human review. While fairness in record matching has mostly been assessed using binary decisions at a fixed threshold, such evaluations can miss systematic disparities in the entire score distribution and can yield conclusions that change with the chosen threshold. We introduce a threshold-independent notion of score bias that extends standard group-fairness criteria, namely demographic parity (DP), equal opportunity (EO), and equalized odds (EOD), from binary outputs to score functions by integrating group-wise metric gaps over all thresholds. Using this metric, we empirically show that several state-of-the-art deep matchers can exhibit substantial score bias even when appearing fair at commonly used thresholds. To mitigate these disparities without retraining the…
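The phrase "integrating group-wise metric gaps over all thresholds" can be illustrated with a short sketch for the demographic-parity case. This is an assumption-laden illustration, not the paper's exact definition: it assumes scores in [0, 1], exactly two groups, an absolute-difference gap, and a uniform grid of thresholds. The function name `dp_score_bias` and the parameter `n_thresholds` are hypothetical.

```python
import numpy as np

def dp_score_bias(scores, groups, n_thresholds=1001):
    """Approximate the integral over t in [0, 1] of
    |P(score >= t | group A) - P(score >= t | group B)|.

    Assumption: an illustrative reading of threshold-independent score bias
    for demographic parity, not the paper's exact formula.
    """
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    if labels.size != 2:
        raise ValueError("this sketch assumes exactly two groups")

    a = scores[groups == labels[0]]
    b = scores[groups == labels[1]]
    thresholds = np.linspace(0.0, 1.0, n_thresholds)

    # Positive-prediction rate of each group at every threshold.
    rate_a = (a[:, None] >= thresholds[None, :]).mean(axis=0)
    rate_b = (b[:, None] >= thresholds[None, :]).mean(axis=0)
    gap = np.abs(rate_a - rate_b)

    # Trapezoidal rule over the threshold grid.
    return float(np.sum((gap[1:] + gap[:-1]) / 2.0 * np.diff(thresholds)))
```

Under the same assumptions, an equal-opportunity variant would apply the same computation to the true matching pairs only, so that the rates become true-positive rates. A matcher can have a near-zero gap at one threshold, such as 0.5, and still receive a large value here if the group-wise rates diverge elsewhere.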
Taxonomy
Topics: Forecasting Techniques and Applications · Sports Analytics and Performance · Advanced Statistical Methods and Models
