Incremental Clustering Techniques for Multi-Party Privacy-Preserving Record Linkage
Dinusha Vatsalan, Peter Christen, and Erhard Rahm

TL;DR
This paper introduces incremental clustering techniques for multi-party privacy-preserving record linkage, enabling efficient and scalable matching across multiple datasets with improved accuracy over existing methods.
Contribution
It proposes novel MP-PPRL approaches that identify matches in any subset of parties and scale to many sources, addressing limitations of previous methods.
Findings
Outperforms existing MP-PPRL approaches in linkage quality
Scales to datasets with up to 26 parties and 5 million records
Efficiently maintains and refines clusters incrementally
Abstract
Privacy-Preserving Record Linkage (PPRL) supports the integration of sensitive information from multiple datasets, in particular the privacy-preserving matching of records referring to the same entity. PPRL has gained much attention in many application areas, most prominently in the healthcare domain. PPRL techniques tackle this problem by conducting linkage on masked (encoded) values. Employing PPRL on records from multiple (more than two) parties/sources (multi-party PPRL, MP-PPRL) is an increasingly important but challenging problem that so far has not been sufficiently solved. Existing MP-PPRL approaches are limited to finding only those entities that are present in all parties, thereby missing entities that match only in a subset of parties. Furthermore, previous MP-PPRL approaches face substantial scalability limitations due to the need for a large number of comparisons…
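The idea of linking masked values incrementally, so that matches can surface in any subset of parties, can be illustrated with a minimal sketch. It is not the authors' algorithm: the Bloom-filter bigram encoding, the Dice similarity, the greedy best-cluster assignment, the assumption that each party's dataset is already deduplicated, and all names and parameters (`bloom_encode`, `incremental_cluster`, `num_bits`, `threshold`) are illustrative assumptions, not details taken from the abstract.

```python
import hashlib


def bloom_encode(value, num_bits=1000, num_hashes=2):
    """Mask a string as the set of Bloom-filter bit positions of its bigrams.

    Assumption: a simple unkeyed SHA-256 scheme, chosen only for illustration.
    """
    padded = f"_{value.lower()}_"
    bigrams = {padded[i:i + 2] for i in range(len(padded) - 1)}
    bits = set()
    for gram in bigrams:
        for seed in range(num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode()).hexdigest()
            bits.add(int(digest, 16) % num_bits)
    return frozenset(bits)


def dice(a, b):
    """Dice coefficient between two sets of bit positions."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def incremental_cluster(parties, threshold=0.8):
    """Add parties one at a time, growing clusters of matching encodings.

    parties: list of (party_id, [encoding, ...]).
    Returns a list of clusters, each a list of (party_id, record_index, encoding).
    Assumption: each party is deduplicated, so a cluster receives at most
    one record per party (tracked with `taken`).
    """
    clusters = []
    for party_id, encodings in parties:
        existing = len(clusters)  # only compare against earlier parties' clusters
        taken = set()
        for idx, enc in enumerate(encodings):
            best, best_sim = None, threshold
            for c in range(existing):
                if c in taken:
                    continue
                # Similarity to a cluster = best similarity to any member.
                sim = max(dice(enc, member[2]) for member in clusters[c])
                if sim >= best_sim:
                    best, best_sim = c, sim
            if best is None:
                clusters.append([(party_id, idx, enc)])  # start a new cluster
            else:
                clusters[best].append((party_id, idx, enc))
                taken.add(best)
    return clusters


# Hypothetical example data: three parties holding already-masked names.
example_parties = [
    ("A", [bloom_encode("peter christen"), bloom_encode("erhard rahm")]),
    ("B", [bloom_encode("peter christen"), bloom_encode("dinusha vatsalan")]),
    ("C", [bloom_encode("dinusha vatsalan")]),
]
example_clusters = incremental_cluster(example_parties)
```

In this sketch a cluster may end up containing records from only some of the parties, which is the subset-matching behaviour the abstract describes as missing from earlier approaches. A practical system would additionally need blocking to limit comparisons and a keyed encoding, neither of which is shown here.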
