Entity Resolution and Federated Learning get a Federated Resolution
Richard Nock, Stephen Hardy, Wilko Henecka, Hamish Ivey-Law, Giorgio Patrini, Guillaume Smith, and Brian Thorne

TL;DR
This paper investigates how errors in entity resolution affect federated learning over vertically partitioned data, providing theoretical insights and practical methods to improve learning outcomes by controlling matching errors.
Contribution
It offers a formal analysis of the impact of entity resolution errors on learning performance and proposes practical strategies to enhance resolution by minimizing cross-class mismatches.
Findings
Entity resolution errors significantly affect classifier performance.
Using class information during resolution improves learning outcomes.
Controlling cross-class matching errors enhances federated learning accuracy.
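To illustrate why cross-class mismatches are the errors worth controlling, here is a minimal sketch on synthetic data. It is not the paper's code. The setup is assumed for illustration: one provider holds the labels, the other holds a single feature, and matching errors are modelled as a permutation of the second provider's rows. All function names are hypothetical, and the label-weighted sum is simply one convenient statistic to compare. It is unchanged when rows are mismatched only within a class, and it degrades when rows are mismatched across classes.

```python
import random

def make_data(n, seed=0):
    """Toy data: labels in {-1, +1} and one feature held by a second provider."""
    rng = random.Random(seed)
    labels = [rng.choice([-1, 1]) for _ in range(n)]
    # Assumption of this toy setup: the feature is correlated with the label.
    feats_b = [y * 1.0 + rng.gauss(0.0, 0.5) for y in labels]
    return labels, feats_b

def within_class_shuffle(labels, values, rng):
    """Simulate matching errors that only confuse rows sharing the same label."""
    out = list(values)
    for cls in (-1, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        shuffled = idx[:]
        rng.shuffle(shuffled)
        for src, dst in zip(idx, shuffled):
            out[dst] = values[src]
    return out

def cross_class_shuffle(values, rng):
    """Simulate matching errors that ignore the label entirely."""
    out = list(values)
    rng.shuffle(out)
    return out

def label_weighted_sum(labels, values):
    """Sum of y_i * x_i over rows, computed after (possibly wrong) matching."""
    return sum(y * x for y, x in zip(labels, values))

labels, feats_b = make_data(200, seed=0)
rng = random.Random(1)

exact = label_weighted_sum(labels, feats_b)
within = label_weighted_sum(labels, within_class_shuffle(labels, feats_b, rng))
cross = label_weighted_sum(labels, cross_class_shuffle(feats_b, rng))

print("exact matching:        ", exact)
print("within-class mismatches:", within)
print("cross-class mismatches: ", cross)
```

In this sketch the within-class value equals the exact one up to floating-point rounding, because each label multiplies a feature value drawn from the same class. The cross-class value shrinks toward zero as the label-feature association is destroyed.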
Abstract
Consider two data providers, each maintaining records of different feature sets about common entities. They aim to learn a linear model over the whole set of features. This problem of federated learning over vertically partitioned data includes a crucial upstream issue: entity resolution, i.e. finding the correspondence between the rows of the datasets. It is well known that entity resolution, just like learning, is mistake-prone in the real world. Despite the importance of the problem, there has been no formal assessment of how errors in entity resolution impact learning. In this paper, we provide a thorough answer to this question, answering how optimal classifiers, empirical losses, margins and generalisation abilities are affected. While our answer spans a wide set of losses (going beyond proper, convex, or classification-calibrated), it brings simple practical arguments to…
Taxonomy
Topics: Data Quality and Management · Privacy-Preserving Technologies in Data · Cryptography and Data Security
