Detecting simultaneous variant intervals in aligned sequences
David Siegmund, Benjamin Yakir, Nancy R. Zhang

TL;DR
This paper introduces a statistical method for detecting short intervals in which the mean changes simultaneously across a subset of multiple aligned sequences, motivated by DNA copy number variant detection. The method generally outperforms single-sample analysis and improves on earlier multi-sample methods when the fraction of carriers is small.
Contribution
The paper proposes a novel scan statistic for detecting intervals of simultaneous change in multiple sequences, together with an analytic approximation to its false positive probability and a demonstration of its robustness.
Findings
The method generally outperforms single-sample analysis.
It improves upon previous multi-sample methods when the fraction of carriers is small.
The analytic approximation to the false positive probability is reasonably accurate in simulations.
Abstract
Given a set of aligned sequences of independent noisy observations, we are concerned with detecting intervals where the mean values of the observations change simultaneously in a subset of the sequences. The intervals of changed means are typically short relative to the length of the sequences, the subset where the change occurs, the "carriers," can be relatively small, and the sizes of the changes can vary from one sequence to another. This problem is motivated by the scientific problem of detecting inherited copy number variants in aligned DNA samples. We suggest a statistic based on the assumption that for any given interval of changed means there is a given fraction of samples that carry the change. We derive an analytic approximation for the false positive error probability of a scan, which is shown by simulations to be reasonably accurate. We show that the new method usually…
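To make the idea of a scan based on an assumed carrier fraction concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not necessarily the authors' exact statistic: it assumes the observations are standardized to mean 0 and variance 1 outside changed intervals, and it scores each candidate interval with a mixture log-likelihood ratio, summing log(1 - p0 + p0 * exp(Z^2 / 2)) over samples, where Z is a sample's standardized sum on the interval. The names `mixture_scan`, `p0` (assumed carrier fraction, strictly between 0 and 1), and `max_width` (longest interval scanned) are hypothetical.

```python
import numpy as np


def mixture_scan(X, p0=0.1, max_width=20):
    """Scan all intervals of width <= max_width and return (best_score, (start, end)).

    X is an (n_samples, n_positions) array, assumed to have mean 0 and
    variance 1 under the null. The returned interval is half-open: [start, end).
    """
    X = np.asarray(X, dtype=float)
    n_samples, n_pos = X.shape
    # Cumulative sums with a leading zero column, so that
    # S[:, t] - S[:, s] is the sum of X[:, s:t].
    S = np.concatenate([np.zeros((n_samples, 1)), np.cumsum(X, axis=1)], axis=1)

    best_score, best_interval = -np.inf, None
    for w in range(1, min(max_width, n_pos) + 1):
        # Z[i, s]: standardized sum of sample i over [s, s + w).
        Z = (S[:, w:] - S[:, :-w]) / np.sqrt(w)
        # Per-sample term log(1 - p0 + p0 * exp(Z^2 / 2)), computed stably.
        g = np.logaddexp(np.log1p(-p0), np.log(p0) + 0.5 * Z**2)
        score = g.sum(axis=0)  # combine evidence across samples
        s = int(np.argmax(score))
        if score[s] > best_score:
            best_score, best_interval = float(score[s]), (s, s + w)
    return best_score, best_interval


# Usage: 20 simulated sequences, 5 of which carry a mean shift on [80, 90).
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 200))
X[:5, 80:90] += 3.0
score, interval = mixture_scan(X, p0=0.25, max_width=20)
print(score, interval)
```

Choosing `p0` near 1 makes every sample contribute roughly Z^2 / 2, which resembles summing single-sample statistics, while a small `p0` damps the contribution of samples with weak evidence so that a few strong carriers dominate. A detection threshold for the score would still have to be calibrated separately, for instance by simulation under the null.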
