Using Sequence Ensembles for Seeding Alignments of MinION Sequencing Data

Rastislav Rabatin; Broňa Brejová; Tomáš Vinař

arXiv:1606.08719 · cs.DS · June 29, 2016

Open Access

TL;DR

This paper proposes an ensemble-based seeding method for aligning high-error MinION sequencing reads. Compared with using a single base-called sequence per read, it can improve both the sensitivity and the false positive rate of seeding.

Contribution

It proposes representing each MinION read by an ensemble of sequences sampled from the probabilistic model used for base calling (typically a hidden Markov model), rather than by a single representative base call sequence.

Findings

01

Ensemble approach increases seeding sensitivity.

02

Ensemble approach can reduce the false positive rate of alignment seeding.

03

Approach targets high-error MinION reads, whose error rates reach up to 30% and consist mostly of insertions and deletions.

Abstract

The Oxford Nanopore MinION sequencer is currently the smallest sequencing device available. While being able to produce very long reads (reads of up to 100 kbp were reported), it is prone to high sequencing error rates of up to 30%. Since most of these errors are insertions or deletions, it is very difficult to adapt popular seed-based algorithms designed for aligning data sets with much lower error rates. Base calling of MinION reads is typically done using hidden Markov models. In this paper, we propose to represent each sequencing read by an ensemble of sequences sampled from such a probabilistic model. This approach can improve the sensitivity and false positive rate of seeding an alignment compared to using a single representative base call sequence for each read.
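
The idea of seeding with an ensemble instead of a single base call can be illustrated with a toy sketch. It is not the authors' implementation. As a simplifying assumption, the hidden Markov model is replaced by an independent categorical distribution over A, C, G, T at each read position, which ignores insertions and deletions. The function names (`sample_ensemble`, `kmers`, `ensemble_seeds`) and the `min_support` threshold are hypothetical and chosen only for illustration.

```python
import random

BASES = "ACGT"

def sample_ensemble(position_probs, n_samples, rng):
    """Draw n_samples sequences from per-position base distributions.

    ASSUMPTION: positions are sampled independently. This is a stand-in
    for sampling from an HMM and does not model insertions or deletions.
    """
    return [
        "".join(rng.choices(BASES, weights=probs, k=1)[0] for probs in position_probs)
        for _ in range(n_samples)
    ]

def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def ensemble_seeds(ensemble, reference, k, min_support):
    """Return reference k-mers found in at least min_support sampled sequences.

    ASSUMPTION: exact k-mer matches with a support threshold are used as
    seeds here; the paper may define seeds and their filtering differently.
    """
    ref_kmers = kmers(reference, k)
    support = {}
    for seq in ensemble:
        # Each sampled sequence votes at most once for each k-mer.
        for kmer in kmers(seq, k) & ref_kmers:
            support[kmer] = support.get(kmer, 0) + 1
    return {kmer for kmer, count in support.items() if count >= min_support}

if __name__ == "__main__":
    rng = random.Random(0)
    reference = "ACGTACGGTCA"
    # Hypothetical read: each position favors the reference base with weight 0.7.
    probs = [[0.7 if b == ref_base else 0.1 for b in BASES] for ref_base in reference]
    ensemble = sample_ensemble(probs, n_samples=20, rng=rng)
    print(sorted(ensemble_seeds(ensemble, reference, k=4, min_support=3)))
```

Requiring a k-mer to appear in several samples is one simple way to trade sensitivity against false positives: a lower `min_support` admits more seeds, a higher one keeps only k-mers that the model supports consistently.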

Taxonomy

Topics: Genomics and Phylogenetic Studies · Algorithms and Data Compression · RNA and protein synthesis mechanisms