Bayesian semiparametric analysis for two-phase studies of gene-environment interaction
Jaeil Ahn, Bhramar Mukherjee, Stephen B. Gruber, Malay Ghosh

TL;DR
This paper develops a Bayesian semiparametric framework for analyzing gene-environment interactions in two-phase studies. It handles high-dimensional models through variable selection and uses independence assumptions to improve estimation efficiency.
Contribution
It introduces a Bayesian approach combining variable selection, hierarchical priors, and nonparametric modeling for complex gene-environment interaction analysis in two-phase studies.
Findings
Effective variable selection for high-dimensional models.
Improved estimation of interaction parameters using independence assumptions.
Flexible modeling of joint distributions with nonparametric Bayes methods.
Abstract
The two-phase sampling design is a cost-efficient way of collecting expensive covariate information on a judiciously selected subsample. It is natural to apply such a strategy for collecting genetic data in a subsample enriched for exposure to environmental factors for gene-environment interaction (G x E) analysis. In this paper, we consider two-phase studies of G x E interaction where phase I data are available on exposure, covariates and disease status. Stratified sampling is done to prioritize individuals for genotyping at phase II conditional on disease and exposure. We consider a Bayesian analysis based on the joint retrospective likelihood of phases I and II data. We address several important statistical issues: (i) we consider a model with multiple genes, environmental factors and their pairwise interactions. We employ a Bayesian variable selection algorithm to reduce the…
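To make the sampling design in the abstract concrete, here is a minimal Python sketch of a two-phase study: phase I records disease status and exposure for everyone, and phase II draws a stratified subsample for genotyping within each (disease, exposure) cell. It illustrates only the design, not the paper's Bayesian analysis. The function names, the prevalence values, the logistic coefficients, and the fixed per-stratum quota are all hypothetical choices made for illustration, and the genotype is simulated independently of exposure purely as a simplifying assumption.

```python
import math
import random
from collections import defaultdict


def simulate_phase_one(n, seed=0):
    """Simulate a cohort with disease status and exposure (phase I data).

    The genotype is generated too, but should be treated as unobserved
    until a person is selected at phase II. All numeric values below are
    hypothetical and are not taken from the paper.
    """
    rng = random.Random(seed)
    cohort = []
    for i in range(n):
        exposure = int(rng.random() < 0.3)
        # Assumption: genotype is drawn independently of exposure.
        genotype = int(rng.random() < 0.2)
        # Hypothetical logistic disease model with a G x E interaction term.
        logit = -2.0 + 0.4 * genotype + 0.5 * exposure + 0.7 * genotype * exposure
        disease = int(rng.random() < 1.0 / (1.0 + math.exp(-logit)))
        cohort.append(
            {"id": i, "disease": disease, "exposure": exposure, "genotype": genotype}
        )
    return cohort


def select_phase_two(cohort, per_stratum, seed=1):
    """Stratified phase II sampling conditional on disease and exposure.

    Draws up to `per_stratum` people without replacement from each
    (disease, exposure) cell; only these people would be genotyped.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in cohort:
        strata[(person["disease"], person["exposure"])].append(person)
    selected = {}
    for key, members in strata.items():
        k = min(per_stratum, len(members))
        selected[key] = rng.sample(members, k)
    return selected


cohort = simulate_phase_one(2000)
phase_two = select_phase_two(cohort, per_stratum=100)
for (disease, exposure), members in sorted(phase_two.items()):
    stratum_size = sum(
        1 for p in cohort if p["disease"] == disease and p["exposure"] == exposure
    )
    print(
        f"disease={disease} exposure={exposure}: "
        f"genotyped {len(members)} of {stratum_size}"
    )
```

Because rare cells such as exposed cases are sampled at a much higher fraction than common cells, the genotyped subsample is enriched for the combinations that carry the most information about interaction. An analysis of such data has to account for these unequal selection probabilities, which is why the abstract bases inference on the joint likelihood of phase I and phase II data.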
