Pattern Masking for Dictionary Matching

Panagiotis Charalampopoulos; Huiping Chen; Peter Christen; Grigorios Loukides; Nadia Pisanti; Solon P. Pissis; Jakub Radoszewski

arXiv:2006.16137·cs.DS·March 11, 2024

TL;DR

This paper introduces the Pattern Masking for Dictionary Matching (PMDM) problem, proves that its decision version is NP-complete, and gives both exact and approximation algorithms with theoretical guarantees, motivated by record linkage and query term dropping in large-scale systems.

Contribution

The paper formalizes PMDM, proves its NP-completeness, and provides novel data structures and algorithms for exact and approximate solutions, including practical approaches for small parameters.

Findings

01

NP-completeness of the decision version of PMDM, even for strings over a binary alphabet

02

Efficient data structure with query time $\tilde{O}(2^{\frac{\ell}{2}})$

03

Polynomial-time approximation algorithm with ratio $O(d^{1/4+\epsilon})$

Abstract

In the Pattern Masking for Dictionary Matching (PMDM) problem, we are given a dictionary $D$ of $d$ strings, each of length $\ell$, a query string $q$ of length $\ell$, and a positive integer $z$, and we are asked to compute a smallest set $K \subseteq \{1, \dots, \ell\}$, so that if $q[i]$, for all $i \in K$, is replaced by a wildcard, then $q$ matches at least $z$ strings from $D$. The PMDM problem lies at the heart of two important applications featured in large-scale real-world systems: record linkage of databases that contain sensitive information, and query term dropping. In both applications, solving PMDM allows for providing data utility guarantees as opposed to existing approaches. We first show, through a reduction from the well-known $k$-Clique problem, that a decision version of the PMDM problem is NP-complete, even for strings over a binary alphabet. We…
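
To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is not one of the paper's algorithms: it simply tries position sets in order of increasing size, so it takes time exponential in $\ell$ and is only meant to illustrate the definition. The function names `matches` and `pmdm_brute_force` and the toy dictionary are assumptions made for this illustration.

```python
from itertools import combinations


def matches(q, s, masked):
    """Return True if q matches s once the 0-based positions in `masked`
    are treated as wildcards."""
    return all(i in masked or q[i] == s[i] for i in range(len(q)))


def pmdm_brute_force(dictionary, q, z):
    """Illustrative exhaustive search for PMDM (hypothetical helper).

    Returns a smallest set K of 1-based positions of q such that masking
    them makes q match at least z strings of `dictionary`, or None if the
    dictionary holds fewer than z strings.
    """
    ell = len(q)
    # Try mask sizes 0, 1, ..., ell so the first hit is a smallest set.
    for k in range(ell + 1):
        for positions in combinations(range(ell), k):
            masked = set(positions)
            hits = sum(matches(q, s, masked) for s in dictionary)
            if hits >= z:
                # Report positions 1-based, as in the problem statement.
                return {i + 1 for i in positions}
    return None


# Toy instance (made up for illustration): d = 4 strings of length 3.
D = ["aab", "abb", "bbb", "aaa"]
print(pmdm_brute_force(D, "aab", 2))
print(pmdm_brute_force(D, "aab", 3))
```

On this toy instance, masking position 2 of `aab` yields `a*b`, which matches `aab` and `abb`, so a single wildcard suffices for $z = 2$; reaching $z = 3$ requires two wildcards.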

Code & Models

Repositories

https://bitbucket.org/pattern-masking/pmdm
Official
