String Indexing for Patterns with Wildcards
Philip Bille, Inge Li Goertz, Hjalte Wedel Vildhøj, Søren Vind

TL;DR
This paper introduces new indexes for string pattern matching with wildcards that improve on the query time and space bounds of previous approaches, and extends them to patterns with variable length gaps.
Contribution
The paper presents indexes for wildcard pattern matching with faster query times and better space bounds than prior work: a linear space index, a faster-query index that uses more space, and a time-space trade-off that generalizes the index by Cole et al. [STOC 2004].
Findings
Linear space index with $O(m+\sigma^j \log \log n + occ)$ query time
Index with $O(m+j+occ)$ query time using $O(\sigma^{k^2} n \log^k \log n)$ space
Generalization to patterns with variable length gaps
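To make the notion of a variable length gap concrete, here is a minimal brute-force sketch in Python. It is not the paper's index. It assumes a gap is given as a pair `(lo, hi)` that matches any substring of length between `lo` and `hi`, and it reports start positions, which is one possible reporting convention. The function name `gap_pattern_starts` and its parameters are hypothetical; the matching is delegated to Python's `re` module.

```python
import re


def gap_pattern_starts(t, parts, gaps):
    """Start positions in t where parts[0] gap parts[1] gap ... matches.

    parts: non-empty list of literal strings (hypothetical input format).
    gaps:  list of (lo, hi) pairs, one between each pair of consecutive
           parts; each gap matches any lo..hi characters.
    """
    if not parts or len(gaps) != len(parts) - 1:
        raise ValueError("need exactly one gap between consecutive parts")
    pieces = [re.escape(parts[0])]
    for (lo, hi), part in zip(gaps, parts[1:]):
        pieces.append(".{%d,%d}" % (lo, hi))  # the variable length gap
        pieces.append(re.escape(part))
    # A zero-width lookahead lets overlapping occurrences be reported too.
    regex = re.compile("(?=" + "".join(pieces) + ")", re.DOTALL)
    return [match.start() for match in regex.finditer(t)]


if __name__ == "__main__":
    print(gap_pattern_starts("abxxcdabcd", ["ab", "cd"], [(0, 2)]))
```

This scans the whole text on every query, which is exactly the cost an index is meant to avoid; it only pins down what counts as an occurrence.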
Abstract
We consider the problem of indexing a string $t$ of length $n$ to report the occurrences of a query pattern $p$ containing $m$ characters and $j$ wildcards. Let $occ$ be the number of occurrences of $p$ in $t$, and $\sigma$ the size of the alphabet. We obtain the following results.
- A linear space index with query time $O(m+\sigma^j \log \log n + occ)$. This significantly improves the previously best known linear space index by Lam et al. [ISAAC 2007], which requires query time $\Theta(jn)$ in the worst case.
- An index with query time $O(m+j+occ)$ using space $O(\sigma^{k^2} n \log^k \log n)$, where $k$ is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time.
- A time-space trade-off, generalizing the index by Cole et al. [STOC 2004].
We also show that these indexes can be generalized to allow variable length gaps in…
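The problem statement in the abstract can be illustrated with a brute-force baseline. The sketch below is not one of the paper's indexes: it builds no data structure and simply checks every alignment, taking time proportional to the text length times the pattern length per query. The function name `wildcard_occurrences` and the choice of `*` as the wildcard symbol are assumptions made for illustration.

```python
def wildcard_occurrences(t, p, wildcard="*"):
    """Return every start position i such that p matches t[i:i+len(p)].

    A wildcard in p matches any single character of t; every other
    character of p must match exactly.
    """
    n, length = len(t), len(p)
    hits = []
    for i in range(n - length + 1):
        window = t[i:i + length]
        if all(pc == wildcard or pc == tc for pc, tc in zip(p, window)):
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Pattern with 2 ordinary characters and 1 wildcard.
    print(wildcard_occurrences("abcabdabc", "ab*"))
```

The length of the returned list corresponds to the number of occurrences that appears in the query bounds.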
Taxonomy
Topics: Algorithms and Data Compression · DNA and Biological Computing · Network Packet Processing and Optimization
