Greedy Conjecture for the Shortest Common Superstring Problem and its Strengthenings
Maksim Nikolaev

TL;DR
This paper investigates the Greedy Conjecture for the Shortest Common Superstring problem, proposing strengthened greedy algorithms and analyzing their approximation bounds, revealing limitations and insights into the conjecture's validity.
Contribution
The paper introduces the Locally Greedy Algorithm and a symbol-occurrence metric, providing new approximation bounds and limitations for greedy heuristics in the superstring problem.
Findings
LGA is a uniform 4-approximation, where "uniform" refers to the symbol-occurrence metric rather than to superstring length.
As lower bounds, the approximation ratio of LGA is at least 3, and that of the Greedy Algorithm is at least 2.5.
Strengthened greedy heuristics do not necessarily prove the Greedy Conjecture.
Abstract
In the Shortest Common Superstring problem, one needs to find the shortest superstring for a set of strings. This problem is APX-hard, and many approximation algorithms have been proposed, with the current best approximation factor of 2.466. Whereas these algorithms are technically involved, the Greedy Conjecture has remained unsolved for more than thirty years: it states that the Greedy Algorithm "take two strings with the maximum overlap; merge them; repeat" is a 2-approximation. One way to approach the conjecture is to consider a stronger version of it, which may make the proof easier due to the stronger premise or provide insights from its refutation. In this paper, we propose two directions to strengthen the conjecture. First, we introduce the Locally Greedy Algorithm (LGA), which selects a pair of strings not with the largest overlap but with the \emph{locally…
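The Greedy Algorithm quoted in the abstract ("take two strings with the maximum overlap; merge them; repeat") can be illustrated with a minimal Python sketch. This is not the paper's code: the function names `overlap` and `greedy_superstring`, the removal of strings contained in other strings, and the tie-breaking rule (first maximal pair found) are all assumptions made for illustration.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings: list[str]) -> str:
    """Hypothetical helper: repeatedly merge the pair with maximum overlap."""
    # Assumption: drop duplicates and strings contained in another input,
    # since any superstring of the rest already covers them.
    items = list(dict.fromkeys(strings))
    items = [s for s in items if not any(s != t and s in t for t in items)]

    while len(items) > 1:
        best = (-1, 0, 1)  # (overlap length, index of left, index of right)
        for i, a in enumerate(items):
            for j, b in enumerate(items):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:  # assumption: ties keep the first pair found
                        best = (k, i, j)
        k, i, j = best
        merged = items[i] + items[j][k:]
        items = [s for idx, s in enumerate(items) if idx not in (i, j)]
        items.append(merged)

    return items[0] if items else ""


print(greedy_superstring(["abc", "bcd", "cde"]))              # abcde
print(len(greedy_superstring(["cabab", "baba", "ababc"])))    # 10
```

The second call uses the well-known family c(ab)^k, (ba)^k, (ab)^k c with k = 2. Greedy first merges the two outer strings (overlap 4) and is then left with no overlap for "baba", giving length 10, whereas "cabababc" has length 8. As k grows, this ratio approaches 2, which is why 2 is the best factor the conjecture could claim.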
Taxonomy
Topics: Algorithms and Data Compression · Caching and Content Delivery · Optimization and Search Problems
