Accurate and Efficient Suffix Tree Based Privacy-Preserving String Matching
Sirintra Vaiwsri, Thilina Ranbaduge, Peter Christen, and Kee Siong Ng

TL;DR
This paper introduces a privacy-preserving string matching method based on encoded suffix trees with chained hashing. It allows string similarities to be calculated accurately without revealing the strings themselves, and applies to sensitive data such as bank account numbers or phone numbers.
Contribution
The paper presents a suffix tree-based privacy-preserving string matching technique that improves on the accuracy and privacy of existing set-based methods, which do not take the positions of sub-strings into account.
Findings
Effective in preserving privacy against frequency attacks
Accurately identifies longest common substrings
Suitable for sensitive data such as bank account numbers and phone numbers
Abstract
The task of calculating similarities between strings held by different organizations without revealing these strings is an increasingly important problem in areas such as health informatics, national censuses, genomics, and fraud detection. Most existing privacy-preserving string comparison functions are based on comparing sets of encoded character q-grams, allow only exact matching of encrypted strings, or are aimed at long genomic sequences that have a small alphabet. The set-based privacy-preserving similarity functions commonly used to compare name and address strings in the context of privacy-preserving record linkage do not take the positions of sub-strings into account. As a result, two very different strings can potentially be considered an exact match, leading to wrongly linked records. Existing set-based techniques also cannot identify the length of the longest…
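The limitation of set-based comparison described in the abstract can be illustrated with a small unencoded sketch. This is not the paper's method, which operates on encoded suffix trees; it only shows, on plaintext, why ignoring sub-string positions can produce a false exact match. The function names, the choice of q = 2, the Dice coefficient as the set similarity, and the two example digit strings are all assumptions made for illustration.

```python
def qgram_set(s, q=2):
    """Return the set of character q-grams of s (positions are discarded)."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def dice_similarity(a, b, q=2):
    """Dice coefficient over q-gram sets, a typical set-based similarity."""
    x, y = qgram_set(a, q), qgram_set(b, q)
    if not x and not y:
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))


def longest_common_substring_length(a, b):
    """Length of the longest contiguous substring shared by a and b.

    Plain dynamic programming on the unencoded strings; cur[j] holds the
    length of the common suffix of the prefixes ending at the current
    character of a and the j-th character of b.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


# Two different digit strings whose bigram sets are identical.
a, b = "12312", "23123"
print(dice_similarity(a, b))                  # 1.0, i.e. a false "exact match"
print(longest_common_substring_length(a, b))  # 4, shorter than either string
```

Both strings yield the bigram set {12, 23, 31}, so the set-based score is 1.0 even though the strings differ, while the longest common substring ("2312") has length 4 out of 5 and exposes the difference.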
Taxonomy
Topics: Algorithms and Data Compression · Cryptography and Data Security · Data Quality and Management
