Sparse-limit approximation for t-statistics

Micol Tresoldi; Daniel Xiang; Peter McCullagh

arXiv:2307.01395 · math.ST · December 20, 2023

TL;DR

This paper develops a sparse-mixture approximation to the non-null density of the t-statistic for genomic data with unknown site-specific variances and low degrees of freedom, and uses it to quantify the evidence that a site is active when signals are sparse.

Contribution

It introduces a sparse-mixture approximation to the non-null density of the t-ratio that accounts for unknown site-specific variances and the effect of low degrees of freedom.

Findings

01

Demonstrates the impact of low degrees of freedom on Bayes factors.

02

Provides a new approximation formula for non-null t-statistics.

03

Illustrates some differences on an HIV gene-expression dataset previously analyzed by Efron (2012).

Abstract

In a range of genomic applications, it is of interest to quantify the evidence that the signal at site $i$ is active given conditionally independent replicate observations summarized by the sample mean and variance $(\bar{Y}, s^{2})$ at each site. We study the version of the problem in which the signal distribution is sparse, and the error distribution has an unknown site-specific variance so that the null distribution of the standardized statistic is Student-$t$ rather than Gaussian. The main contribution of this paper is a sparse-mixture approximation to the non-null density of the $t$-ratio. This formula demonstrates the effect of low degrees of freedom on the Bayes factor, or the conditional probability that the site is active. We illustrate some differences on an HIV dataset for gene-expression data previously analyzed by Efron (2012).
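
As a rough illustration of why the choice of null distribution matters at low degrees of freedom, the sketch below compares the Student-$t$ null density with the Gaussian density at the same observed ratio. It is not the paper's sparse-mixture formula, which is not reproduced on this page. The standardization $\sqrt{n}\,\bar{Y}/s$ with $n - 1$ degrees of freedom, the function names, and the numerical values for the site are all assumptions made for illustration.

```python
import math

def t_ratio(ybar, s2, n):
    """Standardized statistic sqrt(n) * ybar / s (assumed form, n replicates)."""
    return math.sqrt(n) * ybar / math.sqrt(s2)

def student_t_pdf(t, df):
    """Density of the Student-t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def gaussian_pdf(z):
    """Standard Gaussian density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

# Hypothetical site: 4 replicates, sample mean 1.5, sample variance 1.0.
n = 4
t = t_ratio(ybar=1.5, s2=1.0, n=n)

# Ratio of the Student-t null density to the Gaussian density at the observed t.
# The null density sits in the denominator of a Bayes factor, so a ratio well
# above 1 means a Gaussian null would overstate the evidence for an active site.
ratios = {df: student_t_pdf(t, df) / gaussian_pdf(t) for df in (n - 1, 10, 100)}
for df, r in ratios.items():
    print(f"df={df:>3}: t-null density / Gaussian density = {r:.2f}")
```

The ratio shrinks toward 1 as the degrees of freedom grow, so the two nulls disagree most at sites with few replicates.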

Taxonomy

Topics: Gene expression and cancer classification · Bayesian Methods and Mixture Models · Bioinformatics and Genomic Networks