The Fine-Grained Complexity of Episode Matching
Philip Bille, Inge Li Gørtz, Shay Mozes, Teresa Anna Steiner, Oren Weimann

TL;DR
This paper establishes the computational hardness of the Episode Matching problem, proves tight bounds for data structures solving it, and introduces faster algorithms for specific cases, advancing understanding of its complexity.
Contribution
The paper proves SETH-based lower bounds for Episode Matching, provides near-optimal data structures for indexing, and offers a faster solution for the case when pattern length is two.
Findings
No strongly subquadratic algorithm exists, even for binary strings, unless SETH is false.
A space-efficient data structure answers queries in near-logarithmic time.
Faster algorithms are developed for patterns of length two.
Abstract
Given two strings $S$ and $P$, the Episode Matching problem is to find the shortest substring of $S$ that contains $P$ as a subsequence. The best known upper bound for this problem is $\tilde{O}(nm)$ by Das et al. (1997), where $n, m$ are the lengths of $S$ and $P$, respectively. Although the problem is well studied and has many applications in data mining, this bound has never been improved. In this paper we show why this is the case by proving that no $O((nm)^{1-\epsilon})$ algorithm (even for binary strings) exists, unless the Strong Exponential Time Hypothesis (SETH) is false. We then consider the indexing version of the problem, where $S$ is preprocessed into a data structure for answering episode matching queries $P$. We show that for any $\tau$, there is a data structure using $O(n + (n/\tau)^k)$ space that answers episode matching queries for any $P$ of length…
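To make the problem statement concrete, here is a minimal Python sketch of a straightforward quadratic-time dynamic program for Episode Matching. It is an illustration only, not the algorithm of Das et al. or any construction from this paper; the function name `shortest_episode` and the array `start` are hypothetical names chosen for the example. The idea is to track, for every prefix of the pattern, the latest position in the text at which a window ending at the current character can start while still containing that prefix as a subsequence.

```python
def shortest_episode(s, p):
    """Return the shortest substring of s containing p as a subsequence,
    or None if no such substring exists. Runs in O(len(s) * len(p)) time."""
    if not p:
        return ""
    m = len(p)
    # start[j] = largest index i such that p[:j+1] is a subsequence of
    # s[i..t] for the current position t, or -1 if there is none.
    start = [-1] * m
    best = None  # (begin, end) half-open window of the best match so far
    for t, c in enumerate(s):
        # Go right to left so start[j - 1] still refers to s[..t-1].
        for j in range(m - 1, -1, -1):
            if p[j] == c:
                start[j] = t if j == 0 else start[j - 1]
        i = start[m - 1]
        if i != -1 and (best is None or t + 1 - i < best[1] - best[0]):
            best = (i, t + 1)
    return None if best is None else s[best[0]:best[1]]


print(shortest_episode("axbxcab", "ab"))  # ab
print(shortest_episode("abab", "aa"))     # aba
print(shortest_episode("aaa", "b"))       # None
```

The nested loops touch every (text position, pattern position) pair once, which is the roughly $nm$ behaviour that the lower bound in the abstract says cannot be substantially improved under SETH.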
Taxonomy
Topics: Algorithms and Data Compression · Data Mining Algorithms and Applications · Machine Learning and Algorithms
