# Modifying the Chi-square and the CMH test for population genetic inference: adapting to over-dispersion

**Authors:** Kerstin Spitzer, Marta Pelizzola, Andreas Futschik

arXiv: 1902.08127 · 2019-02-22

## TL;DR

This paper introduces modified chi-square and CMH tests that account for over-dispersion in population genetic data, reducing false positives in genome-wide studies involving allele frequency changes.

## Contribution

The authors develop simple adjusted test statistics that incorporate over-dispersion, improving inference accuracy in evolving populations with noise from genetic drift and sequencing.

## Key findings

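- The adjusted tests reduce the false positive rate relative to the standard chi-square and CMH tests, whose p-values are systematically too small under genetic drift and pool sequencing noise.
- The adapted statistics are applied to real Drosophila data, including the case where information from intermediate generations is available.
- The formulas may also be useful for other over-dispersed data, provided the null variance is known or can be estimated.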

## Abstract

Evolve and resequence studies provide a popular approach to simulate evolution in the lab and explore its genetic basis. In this context, the chi-square test, Fisher's exact test, as well as the Cochran-Mantel-Haenszel test are commonly used to infer genomic positions affected by selection from temporal changes in allele frequency. However, the null model associated with these tests does not match the null hypothesis of actual interest. Indeed, due to genetic drift and possibly other additional noise components such as pool sequencing, the null variance in the data can be substantially larger than accounted for by these common test statistics. This leads to p-values that are systematically too small and therefore a huge number of false positive results. Even if the ranking rather than the actual p-values is of interest, a naive application of the mentioned tests will give misleading results, as the amount of over-dispersion varies from locus to locus. We therefore propose adjusted statistics that take the over-dispersion into account while keeping the formulas simple. This is particularly useful in genome-wide applications, where millions of SNPs can be handled with little computational effort. We then apply the adapted test statistics to real data from Drosophila, and investigate how information from intermediate generations can be included when available. The obtained formulas may also be useful in other situations, provided that the null variance either is known or can be estimated.
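
The inflation described in the abstract can be reproduced in a small simulation. The sketch below is a minimal illustration and not the authors' adjusted statistics. It assumes a neutral Wright-Fisher population with a known effective size, simple binomial read sampling in place of a full pool sequencing model, and illustrative parameter values (`N_E`, `GENERATIONS`, `COVERAGE` and the other constants are hypothetical). It compares the plain Pearson chi-square test with a generic statistic whose denominator uses the textbook drift-plus-sampling null variance.

```python
import numpy as np
from scipy.stats import chi2

# All parameter values are illustrative assumptions, not taken from the paper.
N_LOCI = 20_000     # number of neutral SNPs simulated
P0 = 0.5            # starting allele frequency at every locus
N_E = 250           # diploid effective population size, assumed known
GENERATIONS = 20    # generations of drift between the two time points
COVERAGE = 100      # reads per locus at each time point

rng = np.random.default_rng(1)

# Neutral Wright-Fisher drift: resample 2 * N_E gene copies each generation.
p = np.full(N_LOCI, P0)
for _ in range(GENERATIONS):
    p = rng.binomial(2 * N_E, p) / (2 * N_E)

# Binomial read sampling at the start and end time points.
start = rng.binomial(COVERAGE, P0, size=N_LOCI)
end = rng.binomial(COVERAGE, p)

# Plain Pearson chi-square for the 2x2 allele-count table (1 degree of freedom).
# Assumes both alleles are observed at every locus, so the denominator is nonzero.
total = start + end
plain_stat = 2 * COVERAGE * (start - end) ** 2 / (total * (2 * COVERAGE - total))
plain_p = chi2.sf(plain_stat, df=1)

# Generic variance-inflated statistic (an assumption of this sketch):
# Var(p_hat_end - p_hat_start) = p0 (1 - p0) [1/n + (1 - F)/n + F],
# where F = 1 - (1 - 1/(2 N_E))^t is the drift term and n is the coverage.
f_drift = 1 - (1 - 1 / (2 * N_E)) ** GENERATIONS
p_bar = total / (2 * COVERAGE)  # pooled estimate of p0
null_var = p_bar * (1 - p_bar) * (1 / COVERAGE + (1 - f_drift) / COVERAGE + f_drift)
adjusted_stat = ((end - start) / COVERAGE) ** 2 / null_var
adjusted_p = chi2.sf(adjusted_stat, df=1)

# Every locus is neutral, so each rejection at the 5% level is a false positive.
plain_fpr = np.mean(plain_p < 0.05)
adjusted_fpr = np.mean(adjusted_p < 0.05)
print(f"plain chi-square false positive rate:  {plain_fpr:.3f}")
print(f"variance-adjusted false positive rate: {adjusted_fpr:.3f}")
```

Under these assumptions the plain test should reject far more than 5% of neutral loci, while the variance-adjusted statistic should stay close to the nominal level. In practice the effective population size is rarely known exactly and would have to be estimated, which the sketch does not attempt.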

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.08127/full.md

## Figures

53 figures with captions in the complete paper: https://tomesphere.com/paper/1902.08127/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/1902.08127/full.md

---
Source: https://tomesphere.com/paper/1902.08127