Identifying Graphical Models
Maya Shevlyakova, Stephan Morgenthaler

TL;DR
This paper examines the challenges of reliably identifying gene interactions in high-dimensional data, highlighting limitations of classical methods and proposing an information-theoretic perspective.
Contribution
It introduces an analysis based on Kullback-Leibler divergence to assess the detectability of gene interactions, revealing limitations in typical study sizes.
Findings
Commonly sized studies cannot reliably detect moderately strong links.
Classical statistical approaches may be insufficient for high-dimensional gene expression data.
Information-theoretic analysis provides new insights into the detectability of effects.
Abstract
The ability to reliably identify a positive or negative partial correlation between the expression levels of two genes is influenced by the number of genes, the number of analyzed samples, and the statistical properties of the measurements. Classical statistical theory teaches that the crucial quantity is the product of the square root of the sample size and the size of the partial correlation. But this has to be combined with some adjustment for multiplicity depending on the number of genes, which makes the classical analysis somewhat arbitrary. We investigate this problem through the lens of the Kullback-Leibler divergence, which is a measure of the average information for detecting an effect. We conclude that commonly sized studies in genetical epidemiology are not able to reliably detect moderately strong links.
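The classical reasoning in the abstract can be illustrated with a rough sketch. It is not the paper's analysis. It assumes a normal approximation in which the test statistic behaves like sqrt(n) times the partial correlation, a two-sided test, and a Bonferroni adjustment over all gene pairs; it also ignores the degrees of freedom lost by conditioning on the other genes. The function names `required_samples` and `kl_per_sample` are hypothetical, and the Kullback-Leibler formula shown is the standard one for a bivariate normal with correlation rho versus independence, which may differ from the divergence the authors compute.

```python
from math import ceil, log
from statistics import NormalDist


def required_samples(rho, p, alpha=0.05, power=0.8):
    """Rough sample size n at which sqrt(n) * |rho| clears a Bonferroni threshold.

    Assumption: the statistic is approximately normal with mean sqrt(n) * rho
    and unit variance; this is a heuristic, not the paper's method.
    """
    m = p * (p - 1) // 2                       # number of gene pairs tested
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / (2 * m))  # two-sided, Bonferroni-adjusted
    z_power = std_normal.inv_cdf(power)        # quantile for the desired power
    return ceil(((z_alpha + z_power) / abs(rho)) ** 2)


def kl_per_sample(rho):
    """KL divergence of a bivariate normal with correlation rho from independence.

    Standard closed form: -0.5 * log(1 - rho^2), roughly rho^2 / 2 for small rho,
    so n samples carry about (sqrt(n) * rho)^2 / 2 units of information.
    """
    return -0.5 * log(1 - rho ** 2)


if __name__ == "__main__":
    # Illustrative inputs only; they are not taken from the paper.
    for p in (10, 100, 1000):
        n = required_samples(rho=0.2, p=p)
        print(f"p={p}: n ~ {n}, information per sample = {kl_per_sample(0.2):.4f}")
```

Under these assumptions the required sample size grows with the number of genes only through the multiplicity threshold, which is the adjustment the abstract describes as somewhat arbitrary.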
Taxonomy
Topics: Probability and Risk Models · Bayesian Methods and Mixture Models · Stochastic processes and statistical mechanics
