FastImpute: A Baseline for Open-source, Reference-Free Genotype Imputation Methods -- A Case Study in PRS313
Aaron Ge, Jeya Balasubramanian, Xueyao Wu, Peter Kraft, and Jonas S. Almeida

TL;DR
This paper introduces FastImpute, a lightweight, reference-free genotype imputation method that enables client-side processing, improving privacy and accessibility for polygenic risk score calculations, demonstrated with breast cancer risk prediction.
Contribution
The study presents a simple linear regression-based imputation pipeline that is generalizable, efficient, and suitable for client-side deployment, addressing limitations of existing deep learning models.
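To make the idea of regression-based imputation concrete, here is a minimal sketch, not the authors' pipeline. It assumes genotypes are encoded as allele dosages in [0, 2], that a set of typed SNPs is used to predict untyped SNPs by ordinary least squares, and that predictions are clipped back to the valid dosage range. The function names `fit_linear_imputer` and `impute`, and the synthetic data, are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_linear_imputer(x_typed, y_untyped):
    """Fit one least-squares model per untyped SNP (with an intercept).

    x_typed:   (n_samples, n_typed) dosage matrix of genotyped SNPs
    y_untyped: (n_samples, n_untyped) dosage matrix of SNPs to predict
    Returns a (n_typed + 1, n_untyped) coefficient matrix.
    """
    design = np.column_stack([np.ones(len(x_typed)), x_typed])
    coef, *_ = np.linalg.lstsq(design, y_untyped, rcond=None)
    return coef

def impute(coef, x_typed):
    """Predict untyped dosages and clip them to the valid range [0, 2]."""
    design = np.column_stack([np.ones(len(x_typed)), x_typed])
    return np.clip(design @ coef, 0.0, 2.0)

# Synthetic example (assumption): two untyped SNPs that are exact linear
# functions of typed SNPs, mimicking perfect linkage disequilibrium.
rng = np.random.default_rng(0)
typed = rng.integers(0, 3, size=(200, 5)).astype(float)
untyped = np.column_stack([typed[:, 0], 2.0 - typed[:, 1]])

coef = fit_linear_imputer(typed, untyped)
predicted = impute(coef, typed)
```

Because the fitted model is just a coefficient matrix, inference reduces to a single matrix product, which is the kind of workload that can plausibly run on a consumer device.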
Findings
Linear regression improved PRS313 score accuracy with R^2 of 0.86.
Imputation outperformed simple methods, increasing predictive power.
Model enables personalized genetic insights on consumer devices.
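As a rough illustration of how a score-level R^2 such as the one above could be computed, the sketch below treats a polygenic risk score as a weighted sum of allele dosages and compares scores from true and imputed genotypes. It assumes R^2 means the coefficient of determination; the summary does not say which definition the authors used. The weights and dosages are made-up toy values, and `polygenic_score` and `r_squared` are hypothetical helper names.

```python
import numpy as np

def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages, one score per individual."""
    return dosages @ weights

def r_squared(y_true, y_pred):
    """Coefficient of determination between reference and predicted scores."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy data (assumption): 4 individuals, 3 SNPs, illustrative effect sizes.
true_dosages = np.array([[0.0, 1.0, 2.0],
                         [1.0, 1.0, 0.0],
                         [2.0, 0.0, 1.0],
                         [0.0, 2.0, 2.0]])
weights = np.array([0.1, -0.2, 0.3])

true_scores = polygenic_score(true_dosages, weights)

# A trivial baseline that predicts the mean score for everyone gives R^2 = 0,
# while a perfect imputation gives R^2 = 1.
mean_baseline = np.full_like(true_scores, true_scores.mean())
r2_perfect = r_squared(true_scores, true_scores)
r2_baseline = r_squared(true_scores, mean_baseline)
```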
Abstract
Genotype imputation enhances genetic data by predicting missing SNPs using reference haplotype information. Traditional methods leverage linkage disequilibrium (LD) to infer untyped SNP genotypes, relying on the similarity of LD structures between genotyped target sets and fully sequenced reference panels. Recently, reference-free deep learning-based methods have emerged, offering a promising alternative by predicting missing genotypes without external databases, thereby enhancing privacy and accessibility. However, these methods often produce models with tens of millions of parameters, leading to challenges such as the need for substantial computational resources to train and inefficiency for client-side deployment. Our study addresses these limitations by introducing a baseline for a novel genotype imputation pipeline that supports client-side imputation models generalizable across…
Taxonomy
Topics: Gene expression and cancer classification
Methods: Focus · Linear Regression
