On Approximating String Selection Problems with Outliers
Christina Boucher, Gad M. Landau, Avivit Levy, David Pritchard, and Oren Weimann

TL;DR
This paper studies the approximability of string selection problems with outliers. It proves that several of these problems admit no polynomial-time approximation scheme (PTAS) under standard complexity assumptions, highlighting their intrinsic difficulty.
Contribution
It corrects a decade-old mistake by proving that Close to Most Strings has no PTAS unless ZPP=NP, and it analyzes the hardness of related string problems from bioinformatics.
Findings
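Several string selection problems with outliers cannot be approximated arbitrarily well in polynomial time under standard complexity assumptions.
Close to Most Strings has no PTAS unless ZPP=NP; Most Strings with Few Bad Columns has no PTAS unless P=NP.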
Closest to k Strings has no EPTAS unless W[1]=FPT.
Abstract
Many problems in bioinformatics are about finding strings that approximately represent a collection of given strings. We look at more general problems where some input strings can be classified as outliers. The Close to Most Strings problem is, given a set S of same-length strings, and a parameter d, find a string x that maximizes the number of "non-outliers" within Hamming distance d of x. We prove this problem has no PTAS unless ZPP=NP, correcting a decade-old mistake. The Most Strings with Few Bad Columns problem is to find a maximum-size subset of input strings so that the number of non-identical positions is at most k; we show it has no PTAS unless P=NP. We also observe Closest to k Strings has no EPTAS unless W[1]=FPT. In sum, outliers help model problems associated with using biological data, but we show the problem of finding an approximate solution is computationally difficult.
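To make the two objectives in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's method: the function names (`hamming`, `close_count`, `close_to_most_strings_bruteforce`, `bad_columns`) and the example data are hypothetical, and the exhaustive search over all candidate strings takes exponential time, so it is only usable on toy inputs.

```python
from itertools import product


def hamming(a, b):
    """Number of positions where two same-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def close_count(strings, x, d):
    """Close to Most Strings objective: how many inputs lie within distance d of x."""
    return sum(hamming(s, x) <= d for s in strings)


def close_to_most_strings_bruteforce(strings, d, alphabet):
    """Try every string over `alphabet` (exponential time; toy inputs only)."""
    length = len(strings[0])
    best_x, best = None, -1
    for cand in product(alphabet, repeat=length):
        x = "".join(cand)
        count = close_count(strings, x, d)
        if count > best:
            best_x, best = x, count
    return best_x, best


def bad_columns(subset):
    """Most Strings with Few Bad Columns: positions where the subset is not identical."""
    return sum(len(set(column)) > 1 for column in zip(*subset))


# Hypothetical example data.
S = ["0000", "0001", "0011", "1111"]
x, count = close_to_most_strings_bruteforce(S, d=1, alphabet="01")
print(x, count)
print(bad_columns(["0001", "0011"]))
```

In this toy instance no string can be within distance 1 of both "0000" and "1111", so at least one input must be treated as an outlier. The hardness results say that, in general, even approximating the best achievable count arbitrarily well is unlikely to be possible in polynomial time.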
Taxonomy
Topics: Machine Learning and Algorithms · Algorithms and Data Compression · Software Testing and Debugging Techniques
