A note on the shortest common superstring of NGS reads

Tristan Braquelaire; Marie Gasparoux; Mathieu Raffinot; Raluca Uricaru

arXiv:1605.05542 · cs.DM · May 19, 2016

TL;DR

This paper shows that the approximation ratio for the Shortest Superstring Problem (SSP) can be improved on NGS reads by assuming specific characteristics of NGS data, which the authors verify experimentally on a very large sampling set.

Contribution

The paper introduces an improved approximation approach for SSP tailored to NGS reads, exploiting specific properties of these reads that are verified experimentally.

Findings

01

Improved approximation ratio for SSP on NGS data.

02

Experimental verification of NGS-specific characteristics.

03

Enhanced practical applicability of SSP algorithms.

Abstract

The Shortest Superstring Problem (SSP) consists, for a set of strings S = {s_1, ..., s_n}, in finding a minimum-length string that contains every s_i, 1 <= i <= n, as a substring. This problem has been proved to be NP-complete and APX-hard. Guaranteed approximation algorithms have been proposed, the current best ratio being 2 + 11/23, which was achieved after a long and difficult quest. However, SSP is heavily used in practice on next-generation sequencing (NGS) data, which plays an increasingly important role in sequencing. In this note, we show that the SSP approximation ratio can be improved on NGS reads by assuming specific characteristics of NGS data that are experimentally verified on a very large sampling set.
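To make the problem statement concrete, here is a minimal Python sketch of SSP on a toy set of reads. It is an illustration added for this page, not the authors' method and not the 2 + 11/23 algorithm: `greedy_superstring` implements the classic greedy merge heuristic (repeatedly join the pair of strings with the largest suffix/prefix overlap), and `shortest_superstring_bruteforce` computes an exact answer by trying every read order, which is only feasible for a handful of reads. All function names are hypothetical. Both first apply the standard preprocessing step of discarding reads contained in other reads.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def remove_substrings(strings: list[str]) -> list[str]:
    """Drop duplicates and any string contained in another (standard SSP preprocessing)."""
    kept: list[str] = []
    # Longest first, so every containing string is seen before its substrings.
    for s in sorted(set(strings), key=lambda x: (-len(x), x)):
        if not any(s in t for t in kept):
            kept.append(s)
    return kept


def greedy_superstring(strings: list[str]) -> str:
    """Greedy heuristic: repeatedly merge the pair with the largest overlap."""
    reads = remove_substrings(strings)
    if not reads:
        return ""
    while len(reads) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Merging keeps both strings as substrings of the result.
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads[0]


def shortest_superstring_bruteforce(strings: list[str]) -> str:
    """Exact SSP by trying every read order (O(n!) time; tiny inputs only)."""
    reads = remove_substrings(strings)
    if not reads:
        return ""
    best = None
    for order in permutations(reads):
        # For a fixed order, merging consecutive reads with maximal overlap
        # gives the shortest superstring consistent with that order.
        s = order[0]
        for r in order[1:]:
            s += r[overlap(s, r):]
        if best is None or len(s) < len(best):
            best = s
    return best


reads = ["cde", "abc", "eab"]
print(greedy_superstring(reads))               # cdeabc
print(shortest_superstring_bruteforce(reads))  # cdeabc
```

On this toy input both functions return "cdeabc" (length 6). On general inputs the greedy result can be longer than the exact optimum, which is the gap that guaranteed approximation ratios such as the one discussed in this note bound.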

Taxonomy

Topics: Algorithms and Data Compression · Genome Rearrangement Algorithms