Monte Carlo Null Models for Genomic Data
Egil Ferkingstad, Lars Holden, Geir Kjetil Sandve

TL;DR
This paper discusses Monte Carlo null models for genomic data, emphasizing the importance of selecting appropriate null models based on data characteristics to ensure accurate p-value estimation.
Contribution
It introduces the null complexity principle, which guides the choice of null model by how much of the observed data's structure the model preserves, with the aim of improving hypothesis-testing accuracy.
Findings
When null models are ordered by how much of the data they preserve, those that preserve more tend to produce higher p-values.
The null complexity principle helps in selecting more appropriate null models.
The paper offers guidance for choosing better null models in genomic data analysis.
Abstract
As increasingly complex hypothesis-testing scenarios are considered in many scientific fields, analytic derivation of null distributions is often out of reach. To the rescue comes Monte Carlo testing, which may appear deceptively simple: as long as you can sample test statistics under the null hypothesis, the p-value is just the proportion of sampled test statistics that exceed the observed test statistic. Sampling test statistics is often simple once you have a Monte Carlo null model for your data, and defining some form of randomization procedure is also, in many cases, relatively straightforward. However, there may be several possible choices of a randomization null model for the data and no clear-cut criteria for choosing among them. Obviously, different null models may lead to very different p-values, and a very low p-value may thus occur due to the inadequacy of the chosen…
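The Monte Carlo p-value described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`monte_carlo_p_value`, `overlap_count`, `make_uniform_null`), the toy overlap statistic, and the uniform-position null model are all assumptions introduced here, and ties are counted as exceedances (`>=`), which is one common convention.

```python
import random

def monte_carlo_p_value(observed, sample_null_statistic, n_samples=1000, seed=0):
    """Proportion of null-sampled statistics that are >= the observed one.

    `sample_null_statistic` is a hypothetical callable: it takes a
    random.Random instance and returns one test statistic drawn under
    the chosen null model.
    """
    rng = random.Random(seed)
    exceed = sum(
        1 for _ in range(n_samples) if sample_null_statistic(rng) >= observed
    )
    return exceed / n_samples

def overlap_count(track_a, track_b):
    """Toy test statistic: number of positions shared by two tracks."""
    return len(set(track_a) & set(track_b))

def make_uniform_null(track_a, n_points, genome_length):
    """Assumed null model: keep track_a fixed and redraw the second track
    as n_points distinct positions placed uniformly on the genome."""
    def sample(rng):
        randomized_b = rng.sample(range(genome_length), n_points)
        return overlap_count(track_a, randomized_b)
    return sample

# Toy data (made up for illustration only).
track_a = [3, 10, 11, 25, 40, 41, 77]
track_b = [10, 11, 30, 41, 90]
observed = overlap_count(track_a, track_b)
null_sampler = make_uniform_null(track_a, len(track_b), genome_length=100)
p_value = monte_carlo_p_value(observed, null_sampler, n_samples=2000)
```

Swapping `make_uniform_null` for a sampler that preserves more of the data (for example, one that keeps the spacing between points) changes only the sampler argument, which makes it easy to compare the p-values that different null models give for the same observed statistic.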
