Privacy-Preserving Record Linkage for Cardinality Counting

Nan Wu; Dinusha Vatsalan; Mohamed Ali Kaafar; Sanath Kumar Ramesh

arXiv:2301.04000·cs.CR·January 11, 2023

TL;DR

This paper introduces a novel privacy-preserving record linkage algorithm using unsupervised clustering for accurate cardinality counting across multiple datasets, addressing privacy, data quality, and cluster optimization challenges.

Contribution

It presents a new privacy-preserving record linkage method with a novel approach to determine the optimal number of clusters, improving accuracy over existing fuzzy matching techniques.

Findings

Error rate less than 0.1 with privacy budget ε=1.0

Significantly better accuracy than state-of-the-art methods

Effective on both real and synthetic datasets

Abstract

Several applications require counting the number of distinct items in the data, which is known as the cardinality counting problem. Example applications include health applications, such as counting rare disease patients to support adequate awareness and funding and counting the cases of a new disease for outbreak detection; marketing applications, such as counting the visibility reached for a new product; and cybersecurity applications, such as tracking the number of unique views of social media posts. The data needed for the counting is, however, often personal and sensitive, and needs to be processed using privacy-preserving techniques. The quality of data in different databases, for example typos, errors and variations, poses additional challenges for accurate cardinality estimation. While privacy-preserving cardinality counting has gained much attention in recent times and a few…
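
To illustrate why data quality matters for cardinality counting, the following minimal sketch compares an exact distinct count with a similarity-based count over two small, made-up name lists. It is not the paper's algorithm: it works on plaintext with no privacy protection, and the greedy grouping, the `count_distinct_fuzzy` helper, the 0.8 threshold, and the sample records are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.8):
    """Return True when two strings are at least `threshold` similar."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def count_distinct_fuzzy(records, threshold=0.8):
    """Greedy grouping: a record starts a new group only if it is not
    similar to the representative of any existing group.
    (Hypothetical helper, not the clustering method from the paper.)"""
    representatives = []
    for rec in records:
        if not any(similar(rec, rep, threshold) for rep in representatives):
            representatives.append(rec)
    return len(representatives)


# Made-up records from two databases; "jon smith" is a typo of "john smith".
db_a = ["john smith", "mary jones", "peter brown"]
db_b = ["jon smith", "mary jones", "alice white"]
records = db_a + db_b

exact_count = len(set(records))              # treats the typo as a new person
fuzzy_count = count_distinct_fuzzy(records)  # merges the typo with its match

print("exact distinct count:", exact_count)
print("fuzzy distinct count:", fuzzy_count)
```

In this toy input the exact count overstates the number of individuals because the misspelled record is counted separately, while the similarity-based count merges it. A privacy-preserving approach would have to reach a comparable result without comparing the raw values directly.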

Taxonomy

Topics: Data Quality and Management · Privacy-Preserving Technologies in Data