Efficient pattern matching in degenerate strings with the Burrows-Wheeler transform
Jacqueline W. Daykin, Richard Groult, Yannick Guesnet, Thierry Lecroq, Arnaud Lefebvre, Martine Léonard, Laurent Mouchard, Élise Prieur-Gaston, and Bruce Watson

TL;DR
This paper introduces a hybrid pattern-matching method, based on the Burrows-Wheeler transform, for searching degenerate strings. The method works on both regular and degenerate (indeterminate) strings and performs well in practice.
Contribution
It presents a novel BWT-based approach to degenerate pattern matching, with an improved time bound for conservative strings, whose number of non-solid letters is bounded by a constant.
Findings
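The method searches for a degenerate pattern of length m in a degenerate string of length n in O(mn) time on a constant-size alphabet.
For conservative strings (at most q non-solid letters), the search time reduces to O(qm^2).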
Experimental results demonstrate practical efficiency.
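To make these definitions concrete, here is a minimal sketch. It assumes the usual matching rule for degenerate strings (two positions match when their letter sets intersect) and represents a degenerate string as a Python list of sets; the function names are hypothetical. The naive scan is only a baseline for comparison, not the paper's BWT-based method. Each intersection test costs O(|Σ|), which is O(1) on a constant-size alphabet, so the scan runs in O(mn) time.

```python
from typing import List, Set

# A degenerate string is modelled as a list of non-empty sets of letters.
Degenerate = List[Set[str]]


def non_solid_count(s: Degenerate) -> int:
    """Count positions holding more than one letter (non-solid letters)."""
    return sum(1 for letters in s if len(letters) > 1)


def is_conservative(s: Degenerate, q: int) -> bool:
    """True if s has at most q non-solid letters."""
    return non_solid_count(s) <= q


def naive_match(text: Degenerate, pattern: Degenerate) -> List[int]:
    """Return start positions i where pattern[j] and text[i + j]
    share at least one letter for every j (baseline scan, not BWT-based)."""
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):
        # A non-empty intersection is truthy, an empty one is falsy.
        if all(pattern[j] & text[i + j] for j in range(m)):
            hits.append(i)
    return hits


if __name__ == "__main__":
    text = [{"a"}, {"a", "c"}, {"g"}, {"t"}, {"a", "g"}]
    pattern = [{"a"}, {"c", "g"}]
    print(naive_match(text, pattern))  # [0, 1]
    print(non_solid_count(text), is_conservative(text, 2))  # 2 True
```

Here the text has two non-solid letters ({a, c} and {a, g}), so it is conservative for any q ≥ 2.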
Abstract
A degenerate or indeterminate string on an alphabet $\Sigma$ is a sequence of non-empty subsets of $\Sigma$. Given a degenerate string $t$ of length $n$, we present a new method based on the Burrows-Wheeler transform for searching for a degenerate pattern of length $m$ in $t$ running in $O(mn)$ time on a constant size alphabet $\Sigma$. Furthermore, it is a hybrid pattern-matching technique that works on both regular and degenerate strings. A degenerate string is said to be conservative if its number of non-solid letters is upper-bounded by a fixed positive constant $q$; in this case we show that the search complexity time is $O(qm^2)$. Experimental results show that our method performs well in practice.
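The abstract does not describe the index in detail, so the following is only a generic illustration of how a BWT can be queried with a degenerate pattern. It is standard FM-index backward search in which, at each pattern position (processed right to left), every live suffix-array interval is extended by each letter in that position's set. It assumes a solid (regular) text, a sentinel `$` smaller than every text letter, and a naive suffix-array construction; the names `build_index` and `degenerate_search` are hypothetical. This is not the authors' method, which also handles degenerate texts.

```python
from typing import Dict, List, Set, Tuple

SENTINEL = "$"  # assumed absent from the text and smaller than every letter


def suffix_array(s: str) -> List[int]:
    # Naive construction by sorting suffixes; adequate for a sketch.
    return sorted(range(len(s)), key=lambda i: s[i:])


def build_index(text: str):
    """Build the suffix array, C table and occurrence table of text + '$'."""
    s = text + SENTINEL
    sa = suffix_array(s)
    bwt = "".join(s[i - 1] for i in sa)  # s[-1] is the sentinel when i == 0
    alphabet = sorted(set(s))
    # C[c]: number of characters of s strictly smaller than c.
    C: Dict[str, int] = {}
    total = 0
    for c in alphabet:
        C[c] = total
        total += s.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ: Dict[str, List[int]] = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ


def degenerate_search(text: str, pattern: List[Set[str]]) -> List[int]:
    """Positions i such that text[i + j] is in pattern[j] for every j."""
    sa, C, occ = build_index(text)
    intervals: Set[Tuple[int, int]] = {(0, len(sa))}  # half-open [lo, hi)
    for letters in reversed(pattern):
        nxt: Set[Tuple[int, int]] = set()
        for lo, hi in intervals:
            for c in letters:
                if c not in C:
                    continue  # letter never occurs in the text
                new_lo = C[c] + occ[c][lo]
                new_hi = C[c] + occ[c][hi]
                if new_lo < new_hi:
                    nxt.add((new_lo, new_hi))
        intervals = nxt
        if not intervals:
            return []
    return sorted(sa[k] for lo, hi in intervals for k in range(lo, hi))


if __name__ == "__main__":
    print(degenerate_search("abracadabra", [{"a"}, {"b", "c"}]))  # [0, 3, 7]
```

Each live interval corresponds to a distinct solid string of the same length, so the intervals stay pairwise disjoint and their number never exceeds the number of BWT rows.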
Taxonomy
Topics: Algorithms and Data Compression · Natural Language Processing Techniques · Network Packet Processing and Optimization
