A frame-based representation of genomic sequences for removing errors and rare variant detection in NGS data
Raunaq Malhotra, Manjari Mukhopadhyay, Mary Poss, Raj Acharya

TL;DR
This paper introduces MultiRes, a novel frame-based method for error correction and rare variant detection in NGS data, significantly reducing false positives and improving SNP and variant detection accuracy in viral populations.
Contribution
The paper presents a new frame-based genome representation and a classifier that outperforms existing methods in error correction and rare variant detection.
Findings
MultiRes reduces false positive k-mer predictions by a factor of 4 to 500.
It achieves over 95% recall for SNP detection.
MultiRes detects more rare variants than existing methods.
Abstract
We propose a frame-based representation of k-mers for detecting sequencing errors and rare variants in next generation sequencing data obtained from populations of closely related genomes. Frames are sets of non-orthogonal basis functions, traditionally used in signal processing for noise removal. We define a frame for genomes and sequenced reads to consist of discrete spatial signals of every k-mer of a given size. We show that each k-mer in the sequenced data can be projected onto multiple frames and these projections are maximized for spatial signals corresponding to the k-mer's substrings. Our proposed classifier, MultiRes, is trained on the projections of k-mers as features used for marking k-mers as erroneous or true variations in the genome. We evaluate MultiRes on simulated and real viral population datasets and compare it to other error correction methods known in the…
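To make the idea of describing a k-mer through its substrings more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes, purely for illustration, that the projection of a k-mer onto a frame of a given size can be approximated by how often the k-mer's substrings of that size occur in the reads. The function names `kmer_counts` and `multires_features`, the toy reads, and the choice of the minimum count as the summary statistic are all hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def multires_features(kmer, tables):
    """Return one feature per resolution in `tables`.

    For each substring size k (in increasing order), the feature is the
    minimum read count among the length-k substrings of `kmer`.
    This is an illustrative stand-in for a frame projection, not the
    definition used in the paper.
    """
    features = []
    for k in sorted(tables):
        substrings = [kmer[i:i + k] for i in range(len(kmer) - k + 1)]
        features.append(min(tables[k][s] for s in substrings))
    return features


# Toy data: three identical reads plus one read with a single substitution.
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTTCGT"]
tables = {k: kmer_counts(reads, k) for k in (3, 5)}

well_supported = multires_features("ACGTA", tables)
poorly_supported = multires_features("ACGTT", tables)
print(well_supported, poorly_supported)
```

In this toy setting, the k-mer that appears only in the divergent read gets low counts at both resolutions, while the k-mer shared by most reads gets higher ones. A classifier such as the one described in the abstract would be trained on feature vectors of this general kind to separate sequencing errors from true rare variants.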
Taxonomy
Topics: Genomics and Phylogenetic Studies · Chromosomal and Genetic Variations · RNA and protein synthesis mechanisms
