Privacy-Preserving Record Linkage for Cardinality Counting
Nan Wu, Dinusha Vatsalan, Mohamed Ali Kaafar, Sanath Kumar Ramesh

TL;DR
This paper introduces a novel privacy-preserving record linkage algorithm using unsupervised clustering for accurate cardinality counting across multiple datasets, addressing privacy, data quality, and cluster optimization challenges.
Contribution
It presents a new privacy-preserving record linkage method with a novel approach to determine the optimal number of clusters, improving accuracy over existing fuzzy matching techniques.
Findings
Error rate less than 0.1 with privacy budget ε=1.0
Significantly better accuracy than state-of-the-art methods
Effective on both real and synthetic datasets
Abstract
Several applications require counting the number of distinct items in the data, which is known as the cardinality counting problem. Example applications include health applications, such as counting rare disease patients for adequate awareness and funding and counting the number of cases of a new disease for outbreak detection; marketing applications, such as counting the visibility reached for a new product; and cybersecurity applications, such as tracking the number of unique views of social media posts. The data needed for the counting is, however, often personal and sensitive, and needs to be processed using privacy-preserving techniques. The quality of data in different databases, for example typos, errors and variations, poses additional challenges for accurate cardinality estimation. While privacy-preserving cardinality counting has gained much attention in recent times and a few…
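To make the data-quality challenge concrete, the following minimal sketch contrasts an exact distinct count with a count obtained by grouping similar records. It is an illustration only and not the paper's algorithm: it works on plaintext strings with no privacy protection, and the function names, the sample records, and the 0.85 similarity threshold are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def exact_count(records):
    """Count distinct records by exact string equality."""
    return len(set(records))


def fuzzy_count(records, threshold=0.85):
    """Count clusters of records whose pairwise similarity meets the threshold.

    Illustrative only: a simple union-find over all record pairs, using
    difflib's similarity ratio. The threshold is an assumed value.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            ratio = SequenceMatcher(None, records[i], records[j]).ratio()
            if ratio >= threshold:
                # Merge the two clusters.
                parent[find(j)] = find(i)

    return len({find(i) for i in range(len(records))})


# Hypothetical records: "jon smith" is a typo variant of "john smith".
records = ["john smith", "jon smith", "mary jones", "mary jones", "peter brown"]

print(exact_count(records))  # typo variants are counted separately
print(fuzzy_count(records))  # typo variants are merged into one cluster
```

On these sample records the exact count treats the typo variant as a separate individual, while the similarity-based grouping merges it, which is the kind of overcounting that motivates linkage-based cardinality counting.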
Taxonomy
Topics: Data Quality and Management · Privacy-Preserving Technologies in Data
