New tight approximations for Fisher's exact test

Wilhelmiina Hämäläinen

arXiv:1405.1250 · stat.CO · May 7, 2014

TL;DR

This paper introduces new upper-bound approximations for Fisher's exact test that are fast to compute and accurate across data sizes and distributions, improving on the traditional chi-squared approximation, especially for strong dependencies.

Contribution

The author develops a family of upper bounds that approximate Fisher's exact test quickly and accurately, avoiding both the computational cost of the exact test on large data sets and the inaccuracy of the chi-squared approximation.
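The bounds themselves are only summarized on this page. To illustrate the general idea, the following Python sketch computes the exact one-sided Fisher p-value for positive dependency, P(N_XA ≥ m_xa), and a simple geometric-series upper bound on it. The bound is a generic construction chosen for illustration and is an assumption here, not necessarily the formula derived in the paper; the names `n`, `m_x`, `m_a`, `m_xa`, `fisher_p` and `geometric_upper_bound` are hypothetical.

```python
import math


def log_comb(a: int, b: int) -> float:
    """Natural log of the binomial coefficient C(a, b), computed via lgamma."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def log_pmf(k: int, n: int, m_x: int, m_a: int) -> float:
    """log P(N_XA = k) under independence: hypergeometric with n rows,
    m_x rows containing X and m_a rows containing A (assumes a valid table)."""
    return log_comb(m_x, k) + log_comb(n - m_x, m_a - k) - log_comb(n, m_a)


def fisher_p(m_xa: int, n: int, m_x: int, m_a: int) -> float:
    """Exact one-sided Fisher p-value for positive dependency: P(N_XA >= m_xa)."""
    k_max = min(m_x, m_a)
    return sum(math.exp(log_pmf(k, n, m_x, m_a)) for k in range(m_xa, k_max + 1))


def geometric_upper_bound(m_xa: int, n: int, m_x: int, m_a: int) -> float:
    """Upper bound on fisher_p: P(m_xa) / (1 - r) with r = P(m_xa + 1) / P(m_xa).

    Hypothetical illustration. The ratio P(k + 1) / P(k) never increases with k,
    so each tail term is at most P(m_xa) * r**i and the tail is dominated by a
    geometric series with ratio r.
    """
    p0 = math.exp(log_pmf(m_xa, n, m_x, m_a))
    if m_xa >= min(m_x, m_a):
        return p0  # the tail consists of this single term
    # Closed-form ratio of consecutive hypergeometric probabilities.
    r = ((m_x - m_xa) * (m_a - m_xa)) / ((m_xa + 1) * (n - m_x - m_a + m_xa + 1))
    if r >= 1.0:
        return 1.0  # tail not geometrically decreasing (weak dependency): trivial bound
    return min(1.0, p0 / (1.0 - r))


if __name__ == "__main__":
    n, m_x, m_a = 100, 30, 40  # hypothetical margins; expected count of XA is 12
    for m_xa in (16, 20, 25):
        exact = fisher_p(m_xa, n, m_x, m_a)
        bound = geometric_upper_bound(m_xa, n, m_x, m_a)
        print(f"m_xa={m_xa}: exact={exact:.3e} bound={bound:.3e} ratio={bound / exact:.4f}")
```

The bound needs only one probability and one ratio, whereas the exact value sums the whole tail. It becomes tight when r is small, that is, when the observed count lies far above its expected value (a strong dependency), and it degrades to the trivial value 1 as r approaches 1, which mirrors the weak-dependency limitation mentioned in the abstract.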

Findings

1. New upper bounds are computationally efficient.
2. Approximations are accurate for strong dependencies.
3. Method is robust to data size and distribution.

Abstract

Fisher's exact test is often a preferred method for estimating the significance of a statistical dependency. However, in large data sets the test is usually too laborious to apply, especially in an exhaustive search (data mining). The traditional solution is to approximate the significance with the $\chi^2$-measure, but the accuracy is often unacceptable. As a solution, we introduce a family of upper bounds that are fast to calculate and approximate Fisher's $p$-value accurately. In addition, unlike the $\chi^2$-based approximation, the new approximations are not sensitive to the data size, the distribution, or the smallest expected counts. According to both theoretical and experimental analysis, the new approximations produce accurate results for all sufficiently strong dependencies. The basic form of the approximation can fail with weak dependencies, but the general form of the upper bounds can…
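To make the contrast with the chi-squared approach concrete, the following Python sketch compares the exact one-sided Fisher p-value of a 2x2 table with the one-sided p-value from the Pearson chi-squared statistic (1 degree of freedom, no continuity correction), and also reports the smallest expected cell count. This is an illustrative assumption about how the two methods are typically applied, not code from the paper; the function names and the example table are hypothetical.

```python
import math


def log_comb(a: int, b: int) -> float:
    """Natural log of the binomial coefficient C(a, b), computed via lgamma."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """Exact one-sided Fisher p-value P(N_XA >= a) for the 2x2 table
    [[a, b], [c, d]] (rows X / not X, columns A / not A), margins fixed."""
    n = a + b + c + d
    m_x, m_a = a + b, a + c
    log_total = log_comb(n, m_a)
    return sum(
        math.exp(log_comb(m_x, k) + log_comb(n - m_x, m_a - k) - log_total)
        for k in range(a, min(m_x, m_a) + 1)
    )


def chi2_statistic(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared statistic for a 2x2 table with nonzero margins."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi2_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided p-value from the chi-squared approximation with 1 degree of
    freedom, assuming positive dependency (a * d > b * c):
    P(Z > sqrt(x)) = erfc(sqrt(x / 2)) / 2."""
    x = chi2_statistic(a, b, c, d)
    return 0.5 * math.erfc(math.sqrt(x / 2.0))


def min_expected_count(a: int, b: int, c: int, d: int) -> float:
    """Smallest expected cell count under independence: row total * column total / n."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return min(r * s / n for r in rows for s in cols)


if __name__ == "__main__":
    table = (10, 0, 0, 10)  # hypothetical, perfectly dependent table
    print("chi2 statistic:    ", chi2_statistic(*table))
    print("min expected count:", min_expected_count(*table))
    print("Fisher p:          ", fisher_one_sided(*table))
    print("chi2 p:            ", chi2_one_sided(*table))
```

For the table (10, 0, 0, 10) the smallest expected count is 5, which meets the usual rule-of-thumb threshold for the chi-squared test, yet the chi-squared one-sided value is about 3.9 × 10⁻⁶ against the exact 1/184756 ≈ 5.4 × 10⁻⁶, so the approximation overstates the significance.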

Taxonomy

Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Gaussian Processes and Bayesian Inference