All instantiations of the greedy algorithm for the shortest superstring problem are equivalent
Maksim Nikolaev

TL;DR
This paper proves that all instantiations of the greedy algorithm for the shortest superstring problem have the same approximation factor, regardless of the tie-breaking rule, and relates the problem to a variant that minimizes the number of occurrences of a given symbol.
Contribution
It shows that greedy instantiations with different tie-breaking rules have equal approximation factors, and it links the Shortest Common Superstring problem (SCS) to the problem of finding a superstring that minimizes the number of occurrences of a given symbol.
Findings
All greedy algorithm variations have identical approximation factors.
A set of strings can be transformed so that all overlaps are distinct while their ratios stay roughly the same.
The symbol occurrence minimization problem is equivalent to the original SCS.
Abstract
In the Shortest Common Superstring problem (SCS), one needs to find the shortest superstring for a set of strings. While SCS is NP-hard and MAX-SNP-hard, the Greedy Algorithm "choose two strings with the largest overlap; merge them; repeat" achieves a constant factor approximation that is known to be at most 3.5 and conjectured to be equal to 2. The Greedy Algorithm is not deterministic, so its instantiations with different tie-breaking rules may have different approximation factors. In this paper, we show that it is not the case: all factors are equal. To prove this, we show how to transform a set of strings so that all overlaps are different whereas their ratios stay roughly the same. We also reveal connections between the original version of SCS and the following one: find a superstring minimizing the number of occurrences of a given symbol. It turns out that the latter problem is…
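To make the "choose two strings with the largest overlap; merge them; repeat" loop concrete, here is a minimal Python sketch of one greedy instantiation. It is an illustration only, not the paper's construction: the function names and the `tie_break` parameter are hypothetical, overlaps are computed by brute force, and strings contained in other strings are dropped up front.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings, tie_break=min):
    """Greedy merging; `tie_break` (hypothetical) picks one pair among ties.

    `tie_break` receives the list of ordered pairs (a, b) that share the
    maximum overlap and returns the pair to merge.
    """
    unique = set(strings)
    # Drop strings that occur inside another input string.
    items = sorted(s for s in unique
                   if not any(s != t and s in t for t in unique))
    while len(items) > 1:
        pairs = [(overlap(a, b), a, b) for a, b in permutations(items, 2)]
        best = max(k for k, _, _ in pairs)
        candidates = [(a, b) for k, a, b in pairs if k == best]
        a, b = tie_break(candidates)
        items.remove(a)
        items.remove(b)
        items.append(a + b[best:])  # merge a and b along their overlap
    return items[0] if items else ""


inputs = ["cabab", "baba", "ababc"]
result_min = greedy_superstring(inputs, tie_break=min)
result_max = greedy_superstring(inputs, tie_break=max)
print(result_min, len(result_min))
print(result_max, len(result_max))
```

On this input the first merge is forced (the overlap of length 4 between `cabab` and `ababc` is unique), while the second step is a tie between two zero-overlap merges. Both tie-breaking choices give a superstring of length 10, whereas `cabababc` is a superstring of length 8, so the greedy result is not optimal here.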
Taxonomy
Topics: Algorithms and Data Compression · DNA and Biological Computing · Advanced Image and Video Retrieval Techniques
