KMC 3: counting and manipulating k-mer statistics

Marek Kokot; Maciej Długosz; Sebastian Deorowicz

arXiv:1701.08022·q-bio.GN·January 30, 2017·5 cites

Open Access

TL;DR

KMC 3 is an improved algorithm and toolset for efficient counting and manipulation of k-mer statistics in bioinformatics, enabling faster processing of large datasets.

Contribution

The paper introduces KMC 3, a significantly enhanced version of KMC 2, with new tools for handling k-mer databases in bioinformatics applications.

Findings

01 Demonstrates usefulness on real bioinformatics problems

02 Provides faster k-mer counting performance

03 Offers freely available tools for the community

Abstract

Summary: Counting all k-mers in a given dataset is a standard procedure in many bioinformatics applications. We introduce KMC3, a significant improvement of the former KMC2 algorithm together with KMC tools for manipulating k-mer databases. Usefulness of the tools is shown on a few real problems.

Availability: Program is freely available at http://sun.aei.polsl.pl/REFRESH/kmc.

Contact: [email protected]
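To make concrete what "counting all k-mers" and "manipulating k-mer databases" mean, here is a minimal in-memory sketch. It is not the KMC 3 algorithm, which the abstract does not describe; it is a naive illustration only. The function names (`canonical`, `count_kmers`, `intersect`) are hypothetical, and treating a k-mer and its reverse complement as the same key is an assumption made here for the example.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k: int) -> Counter:
    """Count canonical k-mers over an iterable of read strings (naive, in memory)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip windows containing ambiguous symbols such as N.
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def intersect(a: Counter, b: Counter) -> dict:
    """Keep k-mers present in both count tables, with the smaller of the two counts."""
    return {kmer: min(a[kmer], b[kmer]) for kmer in a.keys() & b.keys()}


if __name__ == "__main__":
    sample_a = count_kmers(["ACGT"], k=2)
    sample_b = count_kmers(["AAC"], k=2)
    print(sample_a)
    print(intersect(sample_a, sample_b))
```

A dictionary-based counter like this holds every distinct k-mer in RAM, which is exactly what stops scaling on large sequencing datasets; dedicated counters exist to avoid that limitation.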

Taxonomy

Topics: Gene expression and cancer classification · Algorithms and Data Compression · Genomics and Phylogenetic Studies