Detecting mutations in mixed sample sequencing data using empirical Bayes
Omkar Muralidharan, Georges Natsoulis, John Bell, Hanlee Ji, Nancy R. Zhang

TL;DR
This paper introduces an empirical Bayes method combined with a randomization technique to improve detection of low-prevalence DNA mutations in sequencing data, effectively addressing challenges posed by discrete data and variable error rates.
Contribution
The paper presents an empirical Bayes approach, paired with a randomization technique, for detecting mutations in discrete sequencing data; it outperforms existing methods on the example datasets.
Findings
Outperforms existing mutation detection methods on example datasets.
Effectively estimates false discovery rates for discrete test statistics.
Addresses low-prevalence mutation detection with variable error rates.
Abstract
We develop statistically based methods to detect single nucleotide DNA mutations in next generation sequencing data. Sequencing generates counts of the number of times each base was observed at hundreds of thousands to billions of genome positions in each sample. Using these counts to detect mutations is challenging because mutations may have very low prevalence and sequencing error rates vary dramatically by genome position. The discreteness of sequencing data also creates a difficult multiple testing problem: current false discovery rate methods are designed for continuous data, and work poorly, if at all, on discrete data. We show that a simple randomization technique lets us use continuous false discovery rate methods on discrete data. Our approach is a useful way to estimate false discovery rates for any collection of discrete test statistics, and is hence not limited to sequencing…
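The randomization idea mentioned in the abstract can be illustrated with a generic randomized p-value, a standard device for discrete tests. The sketch below is not the authors' procedure: it assumes a known per-position null error rate `p0` and a simple binomial model for the number of non-reference reads, whereas the paper estimates error rates with empirical Bayes. The function names and the binomial setup are hypothetical and chosen only for illustration. Adding a uniform draw scaled by the probability of the observed count makes the p-value continuous and exactly uniform under the null, so a continuous method such as Benjamini-Hochberg can then be applied.

```python
import math
import random


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def randomized_p_value(x, n, p0, rng):
    """Upper-tail randomized p-value: P(X > x) + U * P(X = x), U ~ Uniform(0, 1).

    Assumes (for illustration only) X ~ Binomial(n, p0) under the null,
    where n is the read depth and p0 a known sequencing error rate.
    """
    tail_gt = sum(binom_pmf(k, n, p0) for k in range(x + 1, n + 1))
    return tail_gt + rng.random() * binom_pmf(x, n, p0)


def benjamini_hochberg(p_values, alpha):
    """Return the sorted indices rejected by the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff = rank  # largest rank whose p-value is under its threshold
    return sorted(order[:cutoff])


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: (non-reference count, depth, assumed null error rate).
    positions = [(1, 200, 0.005), (9, 200, 0.005), (0, 150, 0.01), (12, 300, 0.01)]
    p_vals = [randomized_p_value(x, n, p0, rng) for x, n, p0 in positions]
    print(benjamini_hochberg(p_vals, alpha=0.05))
```

Because the p-values are randomized, the set of calls can change from run to run for borderline positions; fixing the seed, as above, keeps a single analysis reproducible.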
