How to decide whether small samples comply with an equidistribution

Thorsten Poeschel; Jan A. Freund

arXiv:cond-mat/0205225 · cond-mat.dis-nn · May 23, 2007

TL;DR

The paper introduces a new method for assessing whether distributions measured in small samples comply with an equidistribution. The method is especially useful in biostatistics and computational biology, where rare events must be analyzed.

Contribution

It proposes a simple, efficient criterion for frequency-ranked distributions that outperforms standard significance tests at small sample sizes.

Findings

01

Effective differentiation between a true equidistribution and a triangular distribution.

02

Reliable assessment of rare events in small samples.

03

Outperforms chi-squared tests in small sample contexts.
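For reference, the standard baseline named here is Pearson's χ² goodness-of-fit test against equal expected counts. The sketch below is not the authors' method. It computes the χ² statistic and, because the usual asymptotic χ² approximation is unreliable when expected counts are small, estimates the p-value by Monte Carlo sampling under the equidistribution hypothesis. The function names (`chi2_statistic`, `sample_counts`, `mc_chi2_pvalue`) and the simulation size are illustrative assumptions.

```python
import random
from collections import Counter


def chi2_statistic(counts):
    """Pearson chi^2 statistic of observed counts against equal expected counts."""
    n, k = sum(counts), len(counts)
    expected = n / k
    return sum((c - expected) ** 2 / expected for c in counts)


def sample_counts(probs, n, rng):
    """Draw n items from len(probs) categories; return the count per category."""
    k = len(probs)
    tally = Counter(rng.choices(range(k), weights=probs, k=n))
    return [tally.get(i, 0) for i in range(k)]


def mc_chi2_pvalue(counts, n_sim=2000, seed=0):
    """Monte Carlo p-value: share of equidistributed samples whose chi^2 is at
    least the observed one (the +1 terms count the observed sample itself)."""
    rng = random.Random(seed)
    n, k = sum(counts), len(counts)
    observed = chi2_statistic(counts)
    uniform = [1.0 / k] * k
    hits = sum(
        chi2_statistic(sample_counts(uniform, n, rng)) >= observed
        for _ in range(n_sim)
    )
    return (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    print(chi2_statistic([10, 0, 0, 0]))  # 30.0
    print(mc_chi2_pvalue([4, 3, 2, 1]))
```

The Monte Carlo p-value removes the dependence on the asymptotic χ² distribution, but not the limitation the abstract points to: with few counts per category, the χ² statistic itself has little power to separate an equidistribution from a skewed alternative.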

Abstract

The decision whether a measured distribution complies with an equidistribution is a central element of many biostatistical methods. High-throughput differential expression measurements, for instance, require judging whether genes are over-represented. The reliability of this judgement, however, is strongly affected when rarely expressed genes are pooled. We propose a method that can be applied to frequency-ranked distributions and that yields a simple but efficient criterion for assessing the hypothesis of equiprobable expression levels. By applying our technique to surrogate data, we illustrate how the decision criterion can differentiate between a true equidistribution and a triangular distribution. The distinction succeeds even for small sample sizes, where standard tests of significance (e.g. χ²) fail. Our method will have a major impact on several problems of computational…
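The abstract does not spell out the decision criterion itself, so the following is only a generic sketch of the ingredients it names, under stated assumptions. Counts are sorted into a frequency-ranked profile, the profile expected under equidistribution is estimated by simulation, and the distance of the observed profile from it is turned into a Monte Carlo p-value. A hypothetical discrete triangular law (p_i proportional to k - i) supplies surrogate alternative data. None of the names (`rank_profile_test`, `triangular_probs`, `ranked_counts`) come from the paper, and the squared-distance statistic is an assumption of this sketch, not the authors' criterion.

```python
import random
from collections import Counter


def sample_counts(probs, n, rng):
    """Draw n items from len(probs) categories; return the count per category."""
    k = len(probs)
    tally = Counter(rng.choices(range(k), weights=probs, k=n))
    return [tally.get(i, 0) for i in range(k)]


def ranked_counts(counts):
    """Frequency-ranked profile: counts sorted from most to least frequent."""
    return sorted(counts, reverse=True)


def triangular_probs(k):
    """Hypothetical discrete triangular law: p_i proportional to k - i, i = 0..k-1."""
    weights = [k - i for i in range(k)]
    total = sum(weights)
    return [w / total for w in weights]


def rank_profile_test(counts, n_sim=2000, seed=0):
    """Monte Carlo p-value for 'counts are equidistributed', using ranked profiles.

    Statistic (an assumption of this sketch): squared Euclidean distance between
    the observed ranked profile and the mean ranked profile under equidistribution.
    """
    rng = random.Random(seed)
    k, n = len(counts), sum(counts)
    uniform = [1.0 / k] * k

    # Batch 1: estimate the mean ranked profile expected under equidistribution.
    calib = [ranked_counts(sample_counts(uniform, n, rng)) for _ in range(n_sim)]
    mean_profile = [sum(col) / n_sim for col in zip(*calib)]

    def distance(profile):
        return sum((a - b) ** 2 for a, b in zip(profile, mean_profile))

    observed = distance(ranked_counts(counts))
    # Batch 2: independent null samples give the reference distribution of the statistic.
    hits = sum(
        distance(ranked_counts(sample_counts(uniform, n, rng))) >= observed
        for _ in range(n_sim)
    )
    return (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    equi = sample_counts([0.1] * 10, 30, rng)             # surrogate equidistribution
    tri = sample_counts(triangular_probs(10), 30, rng)    # surrogate triangular data
    print(rank_profile_test(equi), rank_profile_test(tri))
```

With only 30 items spread over 10 categories, a single draw need not separate the two cases. Repeating the surrogate draws many times and counting rejections at a fixed level gives a rough power comparison between this ranked-profile statistic and χ².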

Taxonomy

Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks · Fractal and DNA sequence analysis