Separating Sets of Strings by Finding Matching Patterns is Almost Always Hard
Giuseppe Lancia, Luke Mathieson, Pablo Moscato

TL;DR
This paper investigates the computational complexity of finding pattern sets that distinguish two string sets, revealing NP-completeness and providing a detailed parameterized complexity analysis with some tractable cases.
Contribution
It establishes the NP-completeness of the pattern separation problem and offers a comprehensive parameterized complexity analysis identifying tractable variants.
Findings
The problem is NP-complete, and it is W[2]-hard when parameterized by the size of the pattern set.
Certain combined parameterizations, such as pattern set size together with the number of strings, or alphabet size together with the number of strings, yield fixed-parameter tractability.
The problem is APX-hard, so it admits no polynomial-time approximation scheme unless P = NP.
Abstract
We study the complexity of the problem of searching for a set of patterns that separate two given sets of strings. This problem has applications in a wide variety of areas, most notably in data mining, computational biology, and in understanding the complexity of genetic algorithms. We show that the basic problem of finding a small set of patterns that match one set of strings but do not match any string in a second set is difficult (NP-complete, W[2]-hard when parameterized by the size of the pattern set, and APX-hard). We then perform a detailed parameterized analysis of the problem, separating tractable and intractable variants. In particular, we show that parameterizing by the size of the pattern set and the number of strings, or by the size of the alphabet and the number of strings, gives FPT results, amongst others.
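To make the separation problem concrete, here is a minimal Python sketch. The summary above does not define what a pattern is, so the sketch assumes one plausible model: all strings have the same length, and a pattern is a string of that length over the alphabet plus a single-character wildcard `*`. The function names (`matches`, `separates`, `smallest_separating_set`) and the `max_k` bound are hypothetical, introduced only for illustration. The search is plain exhaustive enumeration, not an algorithm from the paper, and its exponential running time is consistent with the hardness results stated above.

```python
from itertools import combinations, product

WILDCARD = "*"  # assumed wildcard symbol; the summary does not specify one


def matches(pattern: str, s: str) -> bool:
    """Return True if `pattern` matches `s` position by position."""
    return len(pattern) == len(s) and all(
        p == WILDCARD or p == c for p, c in zip(pattern, s)
    )


def separates(patterns, good, bad) -> bool:
    """Check that every string in `good` is matched by some pattern
    and that no string in `bad` is matched by any pattern."""
    covers_good = all(any(matches(p, g) for p in patterns) for g in good)
    avoids_bad = not any(matches(p, b) for p in patterns for b in bad)
    return covers_good and avoids_bad


def smallest_separating_set(good, bad, alphabet, max_k):
    """Exhaustive search over pattern sets of size 1..max_k.
    Exponential time; intended for tiny illustrative inputs only."""
    n = len(good[0])
    candidates = ["".join(t) for t in product(alphabet + WILDCARD, repeat=n)]
    # A pattern that matches any bad string can never be used, so drop it early.
    usable = [p for p in candidates if not any(matches(p, b) for b in bad)]
    for k in range(1, max_k + 1):
        for subset in combinations(usable, k):
            if separates(subset, good, bad):
                return list(subset)
    return None  # no separating set of size <= max_k exists
```

For example, under these assumptions `smallest_separating_set(["00", "01"], ["10", "11"], "01", 2)` returns `["0*"]`, since one pattern covers both strings of the first set and neither string of the second. With `good = ["00", "11"]` and `bad = ["01", "10"]`, no single pattern works and two patterns are required.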
