A resource-frugal probabilistic dictionary and applications in (meta)genomics

Camille Marchet; Antoine Limasset; Lucie Bittner; Pierre Peterlongo

arXiv:1605.08319·cs.DS·May 27, 2016·5 cites

Open Access

TL;DR

This paper introduces a scalable, resource-efficient probabilistic dictionary for indexing billions of genomic sequences, enabling new applications in genomics and metagenomics that outperform existing solutions.

Contribution

The paper presents a novel, scalable probabilistic indexing structure that handles billions of elements, with two applications demonstrating its effectiveness in genomics and metagenomics.

Findings

01. Successfully indexes billions of genomic sequences

02. Enables new scalable applications in genomics and metagenomics

03. Outperforms existing indexing solutions in scalability

Abstract

Genomic and metagenomic fields, generating huge sets of short genomic sequences, brought their own share of high-performance problems. To extract relevant pieces of information from the huge data sets generated by current sequencing techniques, one must rely on extremely scalable methods and solutions. Indexing billions of objects is a task considered too expensive while being a fundamental need in this field. In this paper we propose a straightforward indexing structure that scales to billions of elements, and we propose two direct applications in genomics and metagenomics. We show that our proposal solves problem instances for which no other known solution scales up. We believe that many tools and applications could benefit from either the fundamental data structure we provide or from the applications developed from this structure.

Taxonomy

Topics: Algorithms and Data Compression · Evolutionary Algorithms and Applications · Metaheuristic Optimization Algorithms Research