Using Sequence Ensembles for Seeding Alignments of MinION Sequencing Data
Rastislav Rabatin, Broňa Brejová, Tomáš Vinař

TL;DR
This paper introduces an ensemble-based seeding method for aligning high-error MinION sequencing reads, which can increase seeding sensitivity and reduce false positives compared with using a single base-called sequence per read.
Contribution
It proposes representing each MinION read by an ensemble of sequences sampled from a probabilistic base-calling model, in order to improve the seeding step of alignment.
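To make the sampling idea concrete, here is a minimal Python sketch under a deliberately simplified assumption. Instead of the hidden Markov model a base caller would use, each read is described by a hypothetical position-wise model (`ReadModel`) that gives probabilities for the string emitted at each position, where "" stands for a deletion and a two-letter string for an insertion. The functions `sample_sequence`, `sample_ensemble`, and `most_likely_sequence` are illustrative names, not taken from the paper.

```python
import random

# Hypothetical stand-in for a base-calling model (NOT the HMM from the paper):
# each position maps emitted strings to probabilities. "" models a deletion,
# a two-letter string such as "GG" models an insertion.
ReadModel = list[dict[str, float]]


def sample_sequence(model: ReadModel, rng: random.Random) -> str:
    """Draw one candidate sequence, sampling each position independently."""
    parts = []
    for dist in model:
        options = list(dist)
        weights = [dist[o] for o in options]
        parts.append(rng.choices(options, weights=weights, k=1)[0])
    return "".join(parts)


def sample_ensemble(model: ReadModel, size: int, seed: int = 0) -> list[str]:
    """Represent one read by `size` sequences sampled from its model."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return [sample_sequence(model, rng) for _ in range(size)]


def most_likely_sequence(model: ReadModel) -> str:
    """Single representative call: the most probable option at every position."""
    return "".join(max(dist, key=dist.get) for dist in model)


# Toy read with four positions (made-up probabilities).
read_model: ReadModel = [
    {"A": 0.9, "": 0.1},
    {"C": 0.6, "T": 0.3, "": 0.1},
    {"G": 0.8, "GG": 0.2},
    {"T": 0.7, "": 0.3},
]
ensemble = sample_ensemble(read_model, size=5)
print(most_likely_sequence(read_model))  # ACGT
print(ensemble)
```

Sampling positions independently ignores the dependencies that an HMM captures; a sampler built on a real HMM would instead draw whole state paths, for example by stochastic traceback after the forward algorithm. The point of the sketch is only that one read yields several plausible sequences rather than a single call.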
Findings
The ensemble approach can increase seeding sensitivity.
It can reduce the false positive rate of seeding.
It targets high-error nanopore sequencing data, where most errors are insertions or deletions.
Abstract
The Oxford Nanopore MinION sequencer is currently the smallest sequencing device available. While it can produce very long reads (reads of up to 100 kbp have been reported), it is prone to high sequencing error rates of up to 30%. Since most of these errors are insertions or deletions, it is very difficult to adapt popular seed-based algorithms designed for aligning data sets with much lower error rates. Base calling of MinION reads is typically done using hidden Markov models. In this paper, we propose to represent each sequencing read by an ensemble of sequences sampled from such a probabilistic model. This approach can improve the sensitivity and false positive rate of seeding an alignment compared to using a single representative base-call sequence for each read.
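The seeding effect can be illustrated with exact k-mer seeds. The Python sketch below is an assumed, simplified scheme rather than the paper's method: it indexes the reference k-mers, collects the reference positions hit by each ensemble member, and keeps positions supported by at least `min_support` members. The names `kmer_index`, `seed_hits`, and `ensemble_seeds`, the value of `k`, and the toy sequences are all hypothetical.

```python
from collections import defaultdict


def kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map each k-mer of the reference to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def seed_hits(sequence: str, index: dict[str, list[int]], k: int) -> set[int]:
    """Reference start positions of exact k-mer matches with one sequence."""
    hits: set[int] = set()
    for j in range(len(sequence) - k + 1):
        hits.update(index.get(sequence[j:j + k], ()))
    return hits


def ensemble_seeds(ensemble: list[str], index: dict[str, list[int]],
                   k: int, min_support: int) -> set[int]:
    """Keep reference positions hit by at least `min_support` ensemble members."""
    support: dict[int, int] = defaultdict(int)
    for sequence in ensemble:
        for pos in seed_hits(sequence, index, k):
            support[pos] += 1
    return {pos for pos, count in support.items() if count >= min_support}


# Toy data (hypothetical): member 0 lost a T, member 1 lost a G,
# member 2 happens to be error-free.
k = 4
reference = "GATTACAGGCT"
index = kmer_index(reference, k)
ensemble = ["GATACAGGCT", "GATTACAGCT", "GATTACAGGCT"]

print(sorted(seed_hits(ensemble[0], index, k)))        # [3, 4, 5, 6, 7]
print(sorted(ensemble_seeds(ensemble, index, k, 2)))   # [0, 1, 2, 3, 4, 5, 6, 7]
print(sorted(ensemble_seeds(ensemble, index, k, 3)))   # [3, 4]
```

Used alone, the sequence `GATACAGGCT` yields no seed over reference positions 0 to 2 because of its deletion, while the other ensemble members cover them. Raising `min_support` keeps only positions confirmed by several members, which is one conceivable way an ensemble could also filter spurious seeds; the paper's own seed scoring may differ.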
Taxonomy
Topics: Genomics and Phylogenetic Studies · Algorithms and Data Compression · RNA and protein synthesis mechanisms
