A note on the shortest common superstring of NGS reads
Tristan Braquelaire, Marie Gasparoux, Mathieu Raffinot, and Raluca Uricaru

TL;DR
This paper demonstrates that the approximation ratio for the Shortest Superstring Problem can be improved specifically for NGS reads by leveraging their unique characteristics, which are verified on large datasets.
Contribution
The paper introduces an improved approximation approach for SSP tailored to NGS reads, exploiting their specific properties verified experimentally.
Findings
Improved approximation ratio for SSP on NGS data.
Experimental verification of NGS-specific characteristics.
Enhanced practical applicability of SSP algorithms.
Abstract
The Shortest Superstring Problem (SSP) consists, for a set of strings S = {s_1,...,s_n}, in finding a minimum-length string that contains every s_i, 1 <= i <= n, as a substring. This problem is known to be NP-complete and APX-hard. Guaranteed approximation algorithms have been proposed, the current best ratio being 2+11/23, which was achieved after a long and difficult quest. However, SSP is widely used in practice on next generation sequencing (NGS) data, which plays an increasingly important role in sequencing. In this note, we show that the SSP approximation ratio can be improved on NGS reads by assuming specific characteristics of NGS data that are experimentally verified on a very large sampling set.
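To make the problem statement concrete, the following is a minimal sketch of the classical greedy merge heuristic for SSP: repeatedly merge the two strings with the largest suffix-prefix overlap. It is not the algorithm proposed in this note, and it does not use the NGS-specific assumptions the abstract mentions. The function names `overlap` and `greedy_superstring` and the toy reads are hypothetical, chosen for illustration only.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(reads):
    """Return a common superstring of `reads` (not necessarily a shortest one)."""
    # Drop duplicates, then drop reads already contained in another read.
    uniq = list(dict.fromkeys(reads))
    strings = [s for s in uniq if not any(s != t and s in t for t in uniq)]

    while len(strings) > 1:
        # Find the ordered pair (i, j) with the largest overlap.
        best_k, best_i, best_j = -1, 0, 1
        for i, j in permutations(range(len(strings)), 2):
            k = overlap(strings[i], strings[j])
            if k > best_k:
                best_k, best_i, best_j = k, i, j

        # Merge the pair, writing the overlapping part only once.
        merged = strings[best_i] + strings[best_j][best_k:]
        strings = [s for idx, s in enumerate(strings)
                   if idx not in (best_i, best_j)]
        strings.append(merged)

    return strings[0] if strings else ""


# Toy input (assumed for illustration; real NGS reads are far longer).
reads = ["ACGT", "CGTA", "GTAC"]
result = greedy_superstring(reads)
print(result, len(result))
```

The pairwise search is quadratic in the number of strings at each step, so this sketch only suits small inputs. It returns a valid superstring of the input, but the heuristic carries no guarantee of optimality.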
Taxonomy
Topics: Algorithms and Data Compression · Oral and gingival health research · Genome Rearrangement Algorithms
