Improving population-specific allele frequency estimates by adapting supplemental data: an empirical Bayes approach
Marc Coram, Hua Tang

TL;DR
This paper introduces an empirical Bayes method for estimating allele frequencies that adaptively combines data from related samples, reducing estimation error while limiting bias. It is especially useful in genome-wide association studies.
Contribution
The paper presents a novel empirical Bayes approach that adaptively pools data from related populations to improve allele frequency estimates with reduced bias and variance.
Findings
The estimator achieves lower mean squared error than either full pooling or no pooling.
The method limits bias while keeping the variance of the estimates low.
It is most effective for small sample groups genotyped at many genetic markers.
Abstract
Estimation of the allele frequency at genetic markers is a key ingredient in biological and biomedical research, such as studies of human genetic variation or of the genetic etiology of heritable traits. As genetic data becomes increasingly available, investigators face a dilemma: when should data from other studies and population subgroups be pooled with the primary data? Pooling additional samples will generally reduce the variance of the frequency estimates; however, used inappropriately, pooled estimates can be severely biased due to population stratification. Because of this potential bias, most investigators avoid pooling, even for samples with the same ethnic background and residing on the same continent. Here, we propose an empirical Bayes approach for estimating allele frequencies of single nucleotide polymorphisms. This procedure adaptively incorporates genotypes from related…
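The abstract is truncated before it describes the model, so the following is only a minimal sketch of one common empirical Bayes construction, not necessarily the authors' formulation. It assumes a beta-binomial model: each SNP's primary-population frequency p_j has a Beta prior centred on the supplemental frequency q_j, with a concentration κ shared across all SNPs and chosen by maximizing the marginal likelihood over a grid. The function names (`eb_allele_freqs`, `marginal_loglik`) and the simulation settings are hypothetical. The posterior mean (x_j + κq_j)/(n_j + κ) interpolates between no pooling (κ → 0, giving x_j/n_j) and relying entirely on the supplemental data (κ → ∞); naively pooling a supplemental sample of m alleles corresponds to fixing κ = m.

```python
import math
import random
from typing import List, Sequence, Tuple


def _log_beta(a: float, b: float) -> float:
    """log B(a, b) computed via log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def marginal_loglik(kappa: float, counts: Sequence[int], sizes: Sequence[int],
                    prior_freqs: Sequence[float]) -> float:
    """Beta-binomial marginal log-likelihood summed over SNPs.

    Assumed (hypothetical) model: p_j ~ Beta(kappa*q_j, kappa*(1-q_j)) and
    x_j | p_j ~ Binomial(n_j, p_j). The binomial coefficient is dropped
    because it does not depend on kappa.
    """
    total = 0.0
    for x, n, q in zip(counts, sizes, prior_freqs):
        a, b = kappa * q, kappa * (1.0 - q)
        total += _log_beta(x + a, n - x + b) - _log_beta(a, b)
    return total


def eb_allele_freqs(counts: Sequence[int], sizes: Sequence[int],
                    prior_freqs: Sequence[float],
                    eps: float = 1e-3) -> Tuple[float, List[float]]:
    """Return (kappa_hat, shrunken allele-frequency estimates)."""
    # Keep supplemental frequencies away from 0 and 1 so the prior is proper.
    q = [min(max(v, eps), 1.0 - eps) for v in prior_freqs]
    # Grid search over log10(kappa) from -2 to 5. A single kappa is shared by
    # all SNPs, so it is estimated by borrowing strength across many markers.
    grid = [10 ** (k / 10) for k in range(-20, 51)]
    kappa = max(grid, key=lambda k: marginal_loglik(k, counts, sizes, q))
    # Posterior mean: a convex combination of x_j/n_j and q_j.
    est = [(x + kappa * qj) / (n + kappa) for x, n, qj in zip(counts, sizes, q)]
    return kappa, est


if __name__ == "__main__":
    rng = random.Random(1)
    n, n_snps = 40, 2000  # e.g. 20 diploid individuals in the primary sample
    q = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
    # Simulated truth: primary-population frequencies scattered around q.
    p = [rng.betavariate(200 * v, 200 * (1 - v)) for v in q]
    x = [sum(rng.random() < pj for _ in range(n)) for pj in p]
    kappa, est = eb_allele_freqs(x, [n] * n_snps, q)
    mse_raw = sum((xi / n - pj) ** 2 for xi, pj in zip(x, p)) / n_snps
    mse_eb = sum((e - pj) ** 2 for e, pj in zip(est, p)) / n_snps
    print(f"kappa={kappa:.1f}  MSE no pooling={mse_raw:.5f}  MSE EB={mse_eb:.5f}")
```

Running the script compares the mean squared error of the unpooled estimate x_j/n_j with the shrunken estimate on simulated data. Because κ is estimated from the data, a strongly differentiated supplemental population pushes κ toward small values and the estimator falls back toward no pooling, which is one way to realize the adaptive behavior the abstract describes.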
