TL;DR
This paper introduces the Pattern Masking for Dictionary Matching (PMDM) problem, proves its NP-completeness, and offers both exact and approximate algorithms with theoretical guarantees for solving it in large-scale data systems.
Contribution
The paper formalizes PMDM, proves its NP-completeness, and provides novel data structures and algorithms for exact and approximate solutions, including practical approaches for small parameters.
Findings
NP-completeness of PMDM, even over a binary alphabet
Efficient data structure with query time $\tilde{O}(2^{\ell/2})$
Polynomial-time approximation algorithm with approximation ratio $O(d^{1/4+\epsilon})$
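For contrast with the data-structure bound above, here is a sketch of the naive exact baseline. It is not the paper's method. It assumes the PMDM definition from the abstract (a dictionary of $d$ strings of length $\ell$, a query of length $\ell$, and a threshold $z$) and returns a smallest set of masked positions. It tries every subset of positions in order of increasing size, which costs $O(2^{\ell} \cdot d \cdot \ell)$ in the worst case. The function names `count_matches` and `pmdm_brute_force` and the 0-based positions are hypothetical choices made for illustration.

```python
from itertools import combinations


def count_matches(dictionary, query, masked):
    """Count dictionary strings that agree with `query` on every unmasked position.

    Masked positions act as wildcards, so they are skipped in the comparison.
    """
    return sum(
        all(s[i] == query[i] for i in range(len(query)) if i not in masked)
        for s in dictionary
    )


def pmdm_brute_force(dictionary, query, z):
    """Hypothetical naive PMDM solver (not the paper's algorithm).

    Returns a smallest set of 0-based positions whose masking makes `query`
    match at least z strings of `dictionary`, or None if even masking every
    position is not enough (i.e. the dictionary has fewer than z strings).
    """
    ell = len(query)
    for size in range(ell + 1):  # mask sizes in increasing order => first hit is minimum
        for positions in combinations(range(ell), size):
            masked = set(positions)
            if count_matches(dictionary, query, masked) >= z:
                return masked
    return None


if __name__ == "__main__":
    D = ["aab", "abb", "bab"]
    print(pmdm_brute_force(D, "aaa", 2))  # {0, 2}: "?a?" matches "aab" and "bab"
```

Masking positions 0 and 2 of `"aaa"` leaves only the middle `a` to compare, which both `"aab"` and `"bab"` satisfy. No single mask position achieves two matches.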
Abstract
In the Pattern Masking for Dictionary Matching (PMDM) problem, we are given a dictionary $\mathcal{D}$ of $d$ strings, each of length $\ell$, a query string $q$ of length $\ell$, and a positive integer $z$, and we are asked to compute a smallest set $K \subseteq \{1, \ldots, \ell\}$, so that if $q[i]$, for all $i \in K$, is replaced by a wildcard, then $q$ matches at least $z$ strings from $\mathcal{D}$. The PMDM problem lies at the heart of two important applications featured in large-scale real-world systems: record linkage of databases that contain sensitive information, and query term dropping. In both applications, solving PMDM allows for providing data utility guarantees as opposed to existing approaches. We first show, through a reduction from the well-known $k$-Clique problem, that a decision version of the PMDM problem is NP-complete, even for strings over a binary alphabet. We…
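The abstract does not spell out the $k$-Clique reduction. The sketch below shows one natural encoding, assumed here for illustration and not necessarily the paper's construction. Take a graph on vertices $0, \ldots, n-1$. Set $\ell = n$ and $q = 1^n$, and for every edge $\{u, v\}$ add a binary string with `0` exactly at positions $u$ and $v$. Under a mask $K$, $q$ matches an edge string iff both endpoints lie in $K$, so the match count equals the number of edges inside $K$. That count is at most $\binom{|K|}{2} \le \binom{k}{2}$. With $z = \binom{k}{2}$ and budget $|K| \le k$, the threshold is reached iff $K$ is a $k$-clique. The names `clique_to_pmdm` and `has_small_mask` are hypothetical, and the decision check is brute force.

```python
from itertools import combinations


def clique_to_pmdm(n, edges, k):
    """Hypothetical encoding of a k-Clique instance (vertices 0..n-1) as a
    binary PMDM decision instance: (dictionary, query, z, budget)."""
    query = "1" * n
    dictionary = []
    for u, v in edges:
        s = ["1"] * n
        s[u] = s[v] = "0"  # '0' exactly at the two endpoints of the edge
        dictionary.append("".join(s))
    z = k * (k - 1) // 2  # number of edges in a k-clique
    return dictionary, query, z, k


def has_small_mask(dictionary, query, z, budget):
    """Brute-force decision version: is there a mask of at most `budget`
    positions under which `query` matches at least z dictionary strings?"""
    ell = len(query)
    for size in range(budget + 1):
        for positions in combinations(range(ell), size):
            masked = set(positions)
            hits = sum(
                all(s[i] == query[i] for i in range(ell) if i not in masked)
                for s in dictionary
            )
            if hits >= z:
                return True
    return False


if __name__ == "__main__":
    triangle_plus_pendant = [(0, 1), (1, 2), (0, 2), (2, 3)]
    path = [(0, 1), (1, 2), (2, 3)]
    print(has_small_mask(*clique_to_pmdm(4, triangle_plus_pendant, 3)))  # True
    print(has_small_mask(*clique_to_pmdm(4, path, 3)))  # False
```

The triangle $\{0, 1, 2\}$ yields a mask of size 3 that matches all three of its edge strings. On the path, any 3 vertices span at most 2 edges, so the threshold $z = 3$ is never reached.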
