Highly Scalable Algorithms for Robust String Barcoding

Bhaskar DasGupta; Kishori M. Konwar; Ion I. Mandoiu; Alex A. Shvartsman

arXiv:cs/0502065 · cs.DS · August 31, 2016

TL;DR

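This paper presents highly scalable algorithms for robust string barcoding, a technique for identifying microorganisms from their genomes. The algorithms select distinguishers from the whole genomic sequences of hundreds of microorganisms on a well-equipped workstation, and the number of distinguishers selected nearly matches the information theoretic lower bounds.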

Contribution

It describes the engineering of scalable, easily parallelized algorithms for robust string barcoding that select distinguishers from whole genomic sequences and use a number of distinguishers close to the information theoretic lower bounds.

Findings

01

The algorithms select distinguishers from the whole genomic sequences of hundreds of microorganisms of up to bacterial size on a well-equipped workstation.

02

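Whole-genome based selection yields a number of distinguishers nearly matching the information theoretic lower bounds, on both randomly generated and NCBI genomic data.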

03

The methods can be easily parallelized, extending the applicable range to thousands of bacterial size genomes.

Abstract

String barcoding is a recently introduced technique for genomic-based identification of microorganisms. In this paper we describe the engineering of highly scalable algorithms for robust string barcoding. Our methods enable distinguisher selection based on whole genomic sequences of hundreds of microorganisms of up to bacterial size on a well-equipped workstation, and can be easily parallelized to further extend the applicability range to thousands of bacterial size genomes. Experimental results on both randomly generated and NCBI genomic data show that whole-genome based selection results in a number of distinguishers nearly matching the information theoretic lower bounds for the problem.
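To make the terms in the abstract concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes a distinguisher is a substring whose presence or absence in each genome gives one bit of that genome's barcode. It also assumes a simple greedy selection rule, which the abstract does not specify. The function names (`substrings`, `lower_bound`, `greedy_barcode`, `barcodes`), the `max_len` parameter, and the toy sequences are all hypothetical. The lower bound used here is the standard counting argument: each distinguisher answers a yes/no question, so telling n genomes apart takes at least ceil(log2 n) of them.

```python
from itertools import combinations
from math import ceil, log2


def substrings(seq, max_len):
    """Return all substrings of seq with length 1..max_len."""
    return {seq[i:i + k]
            for k in range(1, max_len + 1)
            for i in range(len(seq) - k + 1)}


def lower_bound(n):
    """Each distinguisher is a yes/no test, so n genomes need >= ceil(log2 n)."""
    return ceil(log2(n)) if n > 1 else 0


def greedy_barcode(genomes, max_len=3):
    """Greedily pick substrings until every pair of genomes is separated.

    Assumption: this greedy rule is illustrative only; it is not taken
    from the paper.
    """
    candidates = set().union(*(substrings(g, max_len) for g in genomes))
    unresolved = set(combinations(range(len(genomes)), 2))
    chosen = []
    while unresolved:
        def newly_separated(s):
            # Pairs where s occurs in exactly one of the two genomes.
            return {(i, j) for (i, j) in unresolved
                    if (s in genomes[i]) != (s in genomes[j])}

        # sorted() makes tie-breaking deterministic.
        best = max(sorted(candidates), key=lambda s: len(newly_separated(s)))
        gained = newly_separated(best)
        if not gained:
            raise ValueError("remaining genomes cannot be separated")
        chosen.append(best)
        unresolved -= gained
    return chosen


def barcodes(genomes, distinguishers):
    """Barcode of a genome: one presence/absence bit per distinguisher."""
    return ["".join("1" if d in g else "0" for d in distinguishers)
            for g in genomes]


if __name__ == "__main__":
    toy_genomes = ["ACGT", "ACGA", "TTGA", "CCGT"]  # hypothetical toy data
    picked = greedy_barcode(toy_genomes)
    print("distinguishers:", picked)
    print("barcodes:", barcodes(toy_genomes, picked))
    print("lower bound:", lower_bound(len(toy_genomes)))
```

This brute-force sketch enumerates every candidate substring and rescans every genome on each round, so it is only practical for toy inputs. Handling whole bacterial genomes, as the abstract describes, would need far more careful engineering.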

Taxonomy

Topics: Algorithms and Data Compression · Genomics and Phylogenetic Studies · Gene expression and cancer classification