Subsampling Methods for Genomic Inference
Peter J. Bickel, Nathan Boley, James B. Brown, Haiyan Huang, Nancy R. Zhang

TL;DR
This paper introduces a subsampling method based on a piecewise stationary model to accurately assess statistical significance in large-scale genomic data, addressing limitations of current ad hoc simulation techniques.
Contribution
It proposes a unified subsampling approach for genomic inference that accounts for complex dependencies, improving the reliability of significance testing.
Findings
The subsampling method provides correct significance estimates under the piecewise stationary model.
Simulation studies demonstrate the method's accuracy and robustness.
Application to real data examples shows practical utility in genomic analysis.
Abstract
Large-scale statistical analysis of data sets associated with genome sequences plays an important role in modern biology. A key component of such statistical analyses is the computation of p-values and confidence bounds for statistics defined on the genome. Currently such computation is commonly achieved through ad hoc simulation measures. The method of randomization, which is at the heart of these simulation procedures, can significantly affect the resulting statistical conclusions. Most simulation schemes introduce a variety of hidden assumptions regarding the nature of the randomness in the data, resulting in a failure to capture biologically meaningful relationships. To address the need for a method of assessing the significance of observations within large scale genomic studies, where there often exists a complex dependency structure between observations, we propose a unified…
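The abstract is cut off before the procedure itself is described, so the following is only a rough illustration of the general idea of subsampling for dependent data, not the authors' method. It is a minimal Python sketch of plain block subsampling for the mean of a sequence, assuming a single stationary segment and a user-chosen block length; it does not implement the piecewise stationary segmentation. The function names `block_subsample_stat` and `subsample_ci` and all of their parameters are hypothetical.

```python
import numpy as np


def block_subsample_stat(x, block_len, n_draws, stat=np.mean, rng=None):
    """Evaluate `stat` on randomly placed contiguous blocks of `x`.

    Contiguous blocks keep the local dependence between neighbouring
    positions, which independent shuffling of positions would destroy.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x)
    n = x.shape[0]
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(x)")
    # Valid block start positions are 0 .. n - block_len (upper bound exclusive).
    starts = rng.integers(0, n - block_len + 1, size=n_draws)
    return np.array([stat(x[s:s + block_len]) for s in starts])


def subsample_ci(x, block_len, n_draws=2000, alpha=0.05, rng=None):
    """Approximate (1 - alpha) confidence interval for the mean of `x`.

    Assumption (illustrative only): the sequence is one stationary
    segment and the mean converges at the usual sqrt(n) rate.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    theta_hat = x.mean()
    sub = block_subsample_stat(x, block_len, n_draws, np.mean, rng)
    # Rescaled block deviations stand in for the sampling distribution
    # of sqrt(n) * (theta_hat - theta).
    roots = np.sqrt(block_len) * (sub - theta_hat)
    lo_q, hi_q = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    return theta_hat - hi_q / np.sqrt(n), theta_hat - lo_q / np.sqrt(n)
```

For example, calling `subsample_ci(signal, block_len=200, rng=0)` on a one-dimensional array `signal` of per-position values returns a lower and an upper bound for its mean. The block length trades off bias against variance: it should be long relative to the range of dependence in the data but short relative to the full sequence.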
