Privacy-preserving record linkage using local sensitive hash and private set intersection
Allon Adir, Ehud Aharoni, Nir Drucker, Eyal Kushnir, Ramy Masalha, Michael Mirkin, Omri Soceanu

TL;DR
This paper introduces a new privacy-preserving record linkage protocol combining private set intersection and local sensitive hashing, enabling efficient and practical linking of large datasets while maintaining privacy.
Contribution
The paper proposes a novel PPRL protocol that integrates PSI and LSH, achieving linear runtime and practical performance for large datasets.
Findings
Runs in linear time for large datasets
Links datasets with up to 2^20 records in under an hour
Provides formal privacy guarantees
Abstract
The amount of data stored in data repositories increases every year. This makes it challenging to link records between different datasets across companies and even internally, while adhering to privacy regulations. Address or name changes, and even different spelling used for entity data, can prevent companies from using private deduplication or record-linking solutions such as private set intersection (PSI). To this end, we propose a new and efficient privacy-preserving record linkage (PPRL) protocol that combines PSI and local sensitive hash (LSH) functions, and runs in linear time. We explain the privacy guarantees that our protocol provides and demonstrate its practicality by executing the protocol over two datasets with records each, in minutes, depending on network settings.
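To make the combination of LSH and set intersection concrete, here is a minimal sketch. It is not the authors' protocol. It assumes MinHash over character bigrams as the LSH family, and it uses a plain, non-private set lookup as a stand-in where a real deployment would run PSI. All function names and parameters (qgrams, minhash_signature, lsh_band_keys, link, num_hashes, bands) are hypothetical and chosen for illustration only.

```python
import hashlib


def qgrams(text, q=2):
    """Return the set of character q-grams of a lowercased, whitespace-free string."""
    s = "".join(text.lower().split())
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def minhash_signature(tokens, num_hashes=32):
    """MinHash signature: for each seeded hash, keep the minimum value over all tokens.

    Assumes tokens is non-empty (the input string has at least q characters).
    """
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(hashlib.sha256(f"{seed}:{t}".encode()).digest()[:8], "big")
            for t in tokens
        ))
    return signature


def lsh_band_keys(signature, bands=16):
    """Split the signature into bands and hash each band to one bucket key."""
    rows = len(signature) // bands
    keys = set()
    for b in range(bands):
        chunk = signature[b * rows:(b + 1) * rows]
        keys.add(hashlib.sha256(f"{b}:{chunk}".encode()).hexdigest())
    return keys


def link(records_a, records_b):
    """Return (id_a, id_b) pairs whose records share at least one band key.

    The dictionary lookup below is NOT private. It only marks the place
    where a PSI step over the band keys would go (an assumption of this sketch).
    """
    index_b = {}
    for rid, text in records_b.items():
        for key in lsh_band_keys(minhash_signature(qgrams(text))):
            index_b.setdefault(key, set()).add(rid)

    matches = set()
    for rid, text in records_a.items():
        for key in lsh_band_keys(minhash_signature(qgrams(text))):
            for other in index_b.get(key, ()):
                matches.add((rid, other))
    return matches
```

Calling link({"a1": "John Smith"}, {"b1": "john  smith", "b2": "Zoe Quark"}) links a1 to b1, because both strings normalize to the same bigram set. A near-duplicate such as "Jon Smith" is likely, though not guaranteed, to share a band with "John Smith". Each record produces a fixed number of band keys, so the work in this sketch grows roughly linearly with the number of records as long as buckets stay small.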
Taxonomy
Topics: Data Quality and Management · Cloud Data Security Solutions · Privacy-Preserving Technologies in Data
