Prediction and replication from case-control sequencing studies using custom genotyping and additional sequencing
C. Ryan King, Paul J. Rathouz, Dan L. Nicolae

TL;DR
This paper investigates how allele-count (AC) burdens of rare SNPs discovered in a case-control sequencing study behave when used for prediction or validation in an external study. It shows that SNP selection interacts with the distribution of SNP effects to bias AC associations, especially in rare diseases.
Contribution
It provides a theoretical explanation for why allele-count associations differ between the primary study and a replication study, and highlights the implications for sequencing studies of rare diseases.
Findings
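Genotyping only the SNPs that were polymorphic in the primary sequence data tends to give a larger phenotype-AC correlation in the replication data than in the primary study.
Sequencing the replication sample instead shows that ACs of SNPs novel to the replication tend to have much smaller or opposite-signed associations.
Biases are amplified under heavy-tailed effect-size distributions and in rare diseases.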
Abstract
We present two results about using allele-count (AC) burdens of rare SNPs discovered in a case-control sequencing study for prediction or validation in an external prospective study. When genotyping only the SNPs polymorphic in the sequence data, the phenotype-to-AC correlation tends to be larger in the replication data than in the primary study. Conversely, if the replication sample is sequenced, ACs of SNPs which are novel in the replication tend to have much smaller or opposite-signed associations. We explain this by first deriving the AC-phenotype association implied by a model of diverse SNP effects, and second accounting for the shifted distribution of SNP effects when using a case-control study as a filter for SNP inclusion. In rare diseases, the case population is depleted of protective SNPs and enriched for deleterious SNPs, creating the above difference in AC associations. This…
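The filtering effect described in the abstract can be illustrated with a small calculation. The sketch below is not the authors' derivation. It assumes a single population allele frequency `p` for every SNP, a symmetric normal distribution of log-odds effects, independent chromosomes, and a disease rare enough that the odds ratio approximates the relative risk and controls resemble the general population. All function names and parameter values are hypothetical.

```python
import math
import random

def case_allele_freq(p, beta):
    """Allele frequency among cases, assuming a rare disease so that the
    odds ratio exp(beta) approximates the relative risk (sketch assumption)."""
    rr = math.exp(beta)
    return p * rr / (1.0 - p + p * rr)

def prob_discovered(p, beta, n_cases, n_controls):
    """Probability the SNP is seen at least once among 2*n_cases case
    chromosomes and 2*n_controls control chromosomes."""
    p_case = case_allele_freq(p, beta)
    p_ctrl = p  # assumption: controls look like the general population
    miss = (1.0 - p_case) ** (2 * n_cases) * (1.0 - p_ctrl) ** (2 * n_controls)
    return 1.0 - miss

def mean_effects_by_discovery(betas, p, n_cases, n_controls):
    """Expected mean effect among discovered SNPs and among SNPs that the
    case-control sample misses (those would be novel in a replication)."""
    w = [prob_discovered(p, b, n_cases, n_controls) for b in betas]
    found = sum(b * wi for b, wi in zip(betas, w)) / sum(w)
    missed = sum(b * (1.0 - wi) for b, wi in zip(betas, w)) / sum(1.0 - wi for wi in w)
    return found, missed

# Hypothetical effect distribution: exactly symmetric around zero, so any
# shift in the means below comes from the discovery filter alone.
random.seed(0)
half = [abs(random.gauss(0.0, 1.0)) for _ in range(5000)]
betas = half + [-b for b in half]

found, missed = mean_effects_by_discovery(betas, p=1e-4, n_cases=1000, n_controls=1000)
print(f"mean effect, discovered SNPs:   {found:+.3f}")
print(f"mean effect, undiscovered SNPs: {missed:+.3f}")
```

Under these assumptions, deleterious SNPs are over-represented among cases and are therefore more likely to be polymorphic in the primary sample, so the discovered set has a positive mean effect while the SNPs left undiscovered have a negative one. This matches the direction of the difference the abstract describes for SNPs that are novel in a sequenced replication sample.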
Taxonomy
Topics: Genetic Associations and Epidemiology · Genomics and Rare Diseases · Genomic variations and chromosomal abnormalities
