Using linear predictors to impute allele frequencies from summary or pooled genotype data
Xiaoquan Wen, Matthew Stephens

TL;DR
This paper introduces a fast, accurate linear-predictor method for imputing allele frequencies from summary or pooled genotype data, for settings where privacy constraints or pooled study designs mean individual-level genotypes are unavailable.
Contribution
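The paper presents a novel regularized linear predictor for allele frequency imputation that works directly with summary or pooled data. It infers frequencies at untyped variants and improves noisy frequency estimates at typed variants in pooling experiments, while remaining computationally fast.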
Findings
Imputation accuracy is comparable to that of state-of-the-art methods that use individual-level data.
The method is fast, flexible, and suitable for privacy-sensitive or pooled data scenarios.
Regularization of covariance estimates enhances prediction stability.
Abstract
Recently-developed genotype imputation methods are a powerful tool for detecting untyped genetic variants that affect disease susceptibility in genetic association studies. However, existing imputation methods require individual-level genotype data, whereas, in practice, it is often the case that only summary data are available. For example, this may occur because, for reasons of privacy or politics, only summary data are made available to the research community at large; or because only summary data are collected, as in DNA pooling experiments. In this article we introduce a new statistical method that can accurately infer the frequencies of untyped genetic variants in these settings, and indeed substantially improve frequency estimates at typed variants in pooling experiments where observations are noisy. Our approach, which predicts each allele frequency using a linear combination of…
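The idea of predicting an allele frequency as a linear combination of observed frequencies can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes a small reference panel of 0/1 haplotypes, estimates means and covariances from that panel, and shrinks the off-diagonal covariances by a hypothetical factor `lam` as a simple stand-in for the paper's regularization scheme. The function names, the `lam` parameter, and the toy panel are all assumptions made for this example, and every typed variant is assumed to be polymorphic in the panel.

```python
def column_means(panel):
    """Allele frequency of each variant in the reference panel."""
    n = len(panel)
    return [sum(row[j] for row in panel) / n for j in range(len(panel[0]))]


def shrunk_covariance(panel, lam):
    """Panel means and covariance, with off-diagonals scaled by (1 - lam).

    This diagonal shrinkage is an illustrative assumption, not the
    regularization used in the paper.
    """
    n, p = len(panel), len(panel[0])
    mu = column_means(panel)
    cov = [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in panel) / n
            for j in range(p)] for i in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                cov[i][j] *= 1.0 - lam
    return mu, cov


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def impute_frequency(panel, typed, untyped, observed, lam=0.1):
    """Linear prediction of one untyped variant's frequency.

    Computes mu_u + w . (observed - mu_T), where the weights w solve
    Sigma_TT w = Sigma_Tu, then clips the result to [0, 1].
    """
    mu, cov = shrunk_covariance(panel, lam)
    sigma_tt = [[cov[i][j] for j in typed] for i in typed]
    sigma_tu = [cov[j][untyped] for j in typed]
    weights = solve(sigma_tt, sigma_tu)
    residual = [observed[k] - mu[t] for k, t in enumerate(typed)]
    estimate = mu[untyped] + sum(w * r for w, r in zip(weights, residual))
    return min(1.0, max(0.0, estimate))


# Toy panel: rows are haplotypes, columns are variants 0, 1, 2.
toy_panel = [
    [0, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
]
# Variants 0 and 2 are typed in the study sample; variant 1 is untyped.
print(impute_frequency(toy_panel, typed=[0, 2], untyped=1, observed=[0.7, 0.4]))
```

In this sketch a larger `lam` pulls the prediction toward the panel frequency of the untyped variant, which is one simple way to see why regularizing the covariance estimate stabilizes the predictor when the panel is small.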
