Using Genetic Distance to Infer the Accuracy of Genomic Prediction
Marco Scutari, Ian Mackay, David Balding

TL;DR
This paper investigates how genetic distance affects the accuracy of genomic predictions, revealing a roughly linear decay in prediction correlation as genetic divergence increases, with implications for breeding and medical applications.
Contribution
The paper introduces a clustering and resampling approach that quantifies how predictive accuracy declines as the genetic distance between training and target populations increases, which helps in choosing a suitable training population.
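The clustering and resampling idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the simulated genotypes, the plain k-means clustering, the ridge regression predictor, the mean squared allele-frequency difference used as a distance, and all function names (`simulate`, `kmeans`, `ridge_fit`, `distance_vs_accuracy`) are hypothetical choices made for this example.

```python
import numpy as np

def simulate(n_per_pop=50, n_pops=4, p=300, seed=1):
    """Toy structured population: each subpopulation gets perturbed allele frequencies."""
    rng = np.random.default_rng(seed)
    base = rng.uniform(0.2, 0.8, size=p)
    parts = []
    for _ in range(n_pops):
        freq = np.clip(base + rng.normal(0, 0.15, size=p), 0.05, 0.95)
        parts.append(rng.binomial(2, freq, size=(n_per_pop, p)))
    X = np.vstack(parts).astype(float)           # genotypes coded 0/1/2
    beta = rng.normal(0, 1, size=p) * (rng.random(p) < 0.1)  # sparse marker effects
    g = X @ beta
    y = g + rng.normal(0, g.std(), size=len(g))  # heritability of about 0.5
    return X, y

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns one cluster label per individual."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def ridge_fit(X, y, lam):
    """Ridge regression in dual form (n x n system), suited to markers >> individuals."""
    mu, ybar = X.mean(axis=0), y.mean()
    Xc = X - mu
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(len(X)), y - ybar)
    return mu, ybar, Xc.T @ alpha

def distance_vs_accuracy(X, y, k=4, n_splits=20, lam=10.0, seed=0):
    """Resample cluster-based training/target splits; return (distance, correlation) rows."""
    rng = np.random.default_rng(seed)
    labels = kmeans(X, k, rng)
    used = np.unique(labels)
    if len(used) < 2:
        raise ValueError("need at least two non-empty clusters")
    rows = []
    for _ in range(n_splits):
        # Pick a random subset of clusters as target, keeping at least one for training.
        n_target = rng.integers(1, len(used))
        target = np.isin(labels, rng.choice(used, size=n_target, replace=False))
        if target.sum() < 3:
            continue  # too few target individuals for a meaningful correlation
        mu, ybar, beta = ridge_fit(X[~target], y[~target], lam)
        pred = (X[target] - mu) @ beta + ybar
        # Assumed distance: mean squared difference in allele frequencies.
        dist = np.mean((X[~target].mean(axis=0) / 2 - X[target].mean(axis=0) / 2) ** 2)
        rows.append((dist, np.corrcoef(pred, y[target])[0, 1]))
    return np.array(rows).reshape(-1, 2)

X, y = simulate()
res = distance_vs_accuracy(X, y)
```

Each row of `res` pairs one split's genetic distance with the predictive correlation it achieved, so plotting the second column against the first gives a rough picture of how accuracy changes with divergence in this toy setting.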
Findings
Prediction accuracy decays approximately linearly as the genetic distance between training and target populations increases.
Both simulated and real data confirm this linear decay.
The paper offers guidelines for recalibrating models according to genetic divergence.
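A linear decay of this kind can be summarized with an ordinary least-squares line, which also gives a rough estimate of the distance at which accuracy would fall below a chosen level. The sketch below assumes paired observations of genetic distance and predictive correlation are already available; the numbers are made up for illustration and are not results from the paper, and the helper names `fit_decay` and `distance_at` are hypothetical.

```python
import numpy as np

def fit_decay(distance, correlation):
    """Least-squares line: correlation ~ intercept + slope * distance."""
    slope, intercept = np.polyfit(distance, correlation, deg=1)
    return intercept, slope

def distance_at(threshold, intercept, slope):
    """Distance at which the fitted line reaches the given correlation."""
    return (threshold - intercept) / slope

# Illustrative values only: an exactly linear decay with intercept 0.6 and slope -3.
distance = np.array([0.00, 0.02, 0.04, 0.06, 0.08])
correlation = 0.6 - 3.0 * distance

intercept, slope = fit_decay(distance, correlation)
cutoff = distance_at(0.45, intercept, slope)
```

Under the linear assumption, `cutoff` is the divergence beyond which retraining or recalibrating the model might be worth considering; how to set the threshold is a judgment call that this sketch does not address.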
Abstract
The prediction of phenotypic traits using high-density genomic data has many applications such as the selection of plants and animals of commercial interest, and it is expected to play an increasing role in medical diagnostics. Statistical models used for this task are usually tested using cross-validation, which implicitly assumes that new individuals (whose phenotypes we would like to predict) originate from the same population the genomic prediction model is trained on. In this paper we propose an approach based on clustering and resampling to investigate the effect of increasing genetic distance between training and target populations when predicting quantitative traits. This is important for plant and animal genetics, where genomic selection programs rely on the precision of predictions in future rounds of breeding. Therefore, estimating how quickly predictive accuracy decays is…
