Cnidaria: fast, reference-free clustering of raw and assembled genome and transcriptome NGS data

Saulo Alves Aflitos; Edouard Severing; Gabino Sanchez-Perez; Sander Peters; Hans de Jong; Dick de Ridder

arXiv:1511.05530 · q-bio.GN · November 18, 2015 · BMC Bioinform.

TL;DR

Cnidaria is a scalable, reference-free clustering tool for genomic and transcriptomic data that accurately identifies specimens across large genomes and phylogenetic distances, facilitating diverse biological analyses.

Contribution

The paper introduces Cnidaria, a method for clustering large-scale genomic and transcriptomic datasets without a reference and with no limit on genome size or phylogenetic distance.

Findings

01. Achieved 100% identification accuracy at supra-species level

02. Achieved 78% identification accuracy at species level

03. Simultaneously clustered 169 genomic and transcriptomic datasets from 4 kingdoms

Abstract

Background: Identification of biological specimens is a major requirement for a range of applications. Reference-free methods analyse unprocessed sequencing data without relying on prior knowledge, but generally do not scale to arbitrarily large genomes and arbitrarily large phylogenetic distances.

Results: We present Cnidaria, a practical tool for clustering genomic and transcriptomic data with no limitation on genome size or phylogenetic distance. We simultaneously clustered 169 genomic and transcriptomic datasets from 4 kingdoms, achieving 100% identification accuracy at supra-species level and 78% accuracy at species level.

Discussion: Cnidaria allows for fast, resource-efficient comparison and identification of both raw and assembled genome and transcriptome data. This can help answer both fundamental (e.g. in phylogeny, ecological diversity analysis) and practical…
