Sparse-limit approximation for t-statistics
Micol Tresoldi, Daniel Xiang, Peter McCullagh

TL;DR
This paper develops a sparse-mixture approximation to the non-null density of t-statistics when site-specific variances are unknown and degrees of freedom are low, and uses it to quantify the evidence for sparse signals in genomic data.
Contribution
It introduces a sparse-mixture approximation to the non-null density of the t-statistic that accounts for unknown site-specific variances and low degrees of freedom.
Findings
Demonstrates the impact of low degrees of freedom on Bayes factors.
Provides a new approximation formula for non-null t-statistics.
Illustrates differences using HIV gene-expression data.
Abstract
In a range of genomic applications, it is of interest to quantify the evidence that the signal at site i is active given conditionally independent replicate observations summarized by the sample mean and variance at each site. We study the version of the problem in which the signal distribution is sparse, and the error distribution has an unknown site-specific variance so that the null distribution of the standardized statistic is Student-t rather than Gaussian. The main contribution of this paper is a sparse-mixture approximation to the non-null density of the t-ratio. This formula demonstrates the effect of low degrees of freedom on the Bayes factor, or the conditional probability that the site is active. We illustrate some differences on an HIV gene-expression dataset previously analyzed by Efron (2012).
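The abstract's point that the null is Student-t rather than Gaussian can be illustrated with a minimal sketch. It is not the paper's sparse-mixture approximation, which is not reproduced on this page; it only compares the two null densities at a fixed value of the statistic. The function names (`student_t_pdf`, `gaussian_pdf`, `null_density_ratio`) and the chosen value 4.0 are illustrative assumptions.

```python
import math

def student_t_pdf(t, df):
    """Density of the Student-t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def gaussian_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def null_density_ratio(t, df):
    """Ratio of the Student-t null density to the N(0, 1) density at t."""
    return student_t_pdf(t, df) / gaussian_pdf(t)

# Hypothetical statistic value; the degrees of freedom are chosen for illustration.
for df in (3, 10, 100):
    print(df, null_density_ratio(4.0, df))
```

The ratio grows as the degrees of freedom shrink: a statistic of a given size is far less surprising under a low-df Student-t null than under a Gaussian null, which is why treating the statistic as Gaussian can misstate the evidence at a site.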
Taxonomy
Topics: Gene expression and cancer classification · Bayesian Methods and Mixture Models · Bioinformatics and Genomic Networks
