A Simple Data-Adaptive Probabilistic Variant Calling Model
Steve Hoffmann, Peter F. Stadler, Korbinian Strimmer

TL;DR
This paper presents a simple, data-adaptive probabilistic model for variant calling in sequencing data that adjusts to experiment-specific noise factors, achieving competitive sensitivity and specificity, especially at low allele frequencies.
Contribution
The paper introduces a straightforward probabilistic model that automatically adapts to experiment-specific sequencing noise factors, achieving variant calling accuracy competitive with more complex existing methods.
Findings
The model is competitive with more complex algorithms in sensitivity and specificity.
It performs well at low allele frequencies.
It effectively captures data-specific noise influences.
Abstract
Background: Several sources of noise obfuscate the identification of single nucleotide variation (SNV) in next-generation sequencing data. For instance, errors may be introduced during library construction and sequencing steps. In addition, the reference genome and the algorithms used for the alignment of the reads are further critical factors determining the efficacy of variant calling methods. It is crucial to account for these factors in individual sequencing experiments. Results: We introduce a simple data-adaptive model for variant calling. This model automatically adjusts to specific factors such as alignment errors. To achieve this, several characteristics are sampled from sites with low mismatch rates, and these are used to estimate empirical log-likelihoods. These likelihoods are then combined into a score that typically gives rise to a mixture distribution. From these we…
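The scoring idea in the abstract (estimate empirical log-likelihoods from low-mismatch sites, then combine them into a per-site score) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the histogram binning, the pseudocount smoothing, the choice of summation as the way to combine log-likelihoods, the two example features (base quality and relative read position), and all function names are assumptions made purely for illustration.

```python
import math
from collections import Counter


def empirical_loglik(values, n_bins, lo, hi, pseudocount=1.0):
    """Build a smoothed histogram log-likelihood from background values.

    Hypothetical helper: `values` stands in for a characteristic sampled
    at sites with low mismatch rates. Binning and pseudocounts are
    assumptions, not details taken from the paper.
    """
    width = (hi - lo) / n_bins

    def bin_of(x):
        # Clamp out-of-range values into the first or last bin.
        idx = int((x - lo) / width)
        return min(max(idx, 0), n_bins - 1)

    counts = Counter(bin_of(v) for v in values)
    total = len(values) + pseudocount * n_bins
    table = [
        math.log((counts.get(b, 0) + pseudocount) / total)
        for b in range(n_bins)
    ]
    return lambda x: table[bin_of(x)]


def site_score(features, logliks):
    """Combine per-characteristic log-likelihoods by summation (an assumption)."""
    return sum(ll(x) for x, ll in zip(features, logliks))


# Toy background sample standing in for low-mismatch sites (made-up numbers).
background_quality = [38, 40, 39, 40, 37, 40, 38, 39]
background_relpos = [0.5, 0.4, 0.6, 0.3, 0.7, 0.5, 0.45, 0.55]

ll_quality = empirical_loglik(background_quality, n_bins=6, lo=0, hi=42)
ll_relpos = empirical_loglik(background_relpos, n_bins=5, lo=0.0, hi=1.0)

# A mismatch that looks like the background scores higher than one with
# low base quality near the read end.
typical = site_score((39, 0.5), (ll_quality, ll_relpos))
atypical = site_score((10, 0.02), (ll_quality, ll_relpos))
print(typical, atypical)
```

In this sketch a higher score means the observation resembles the background sampled from the experiment itself, which is what makes the approach data-adaptive: the likelihood tables are re-estimated for each sequencing run rather than fixed in advance. How the resulting score distribution is used to call variants is not recoverable from the truncated abstract and is left out here.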
Taxonomy
Topics: Genomics and Phylogenetic Studies · Genomics and Rare Diseases · Molecular Biology Techniques and Applications
