# A Non-Gaussian, Nonparametric Structure for Gene-Gene and Gene-Environment Interactions in Case-Control Studies Based on Hierarchies of Dirichlet Processes

**Authors:** Durba Bhattacharya, Sourabh Bhattacharya

arXiv: 1704.07349 · 2020-05-04

## TL;DR

This paper proposes a nonparametric Bayesian model for case-control genotype data, built on hierarchies of Dirichlet processes, that captures gene-gene and gene-environment interactions through a more flexible dependence structure than the matrix-normal structure of the authors' earlier model.

## Contribution

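It develops a nonparametric Bayesian framework for the dependence between genes induced by environmental variables, together with a highly parallelisable MCMC algorithm that combines Gibbs sampling steps, retrospective sampling, and Transformation-based Markov Chain Monte Carlo (TMCMC). Bayesian hypothesis testing procedures are then used to detect the roles of genes and environment.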

## Key findings

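- Bayesian hypothesis testing procedures detect the roles of genes and environment in case-control studies
- Encouraging results on each of 5 biologically realistic simulated case-control genotype datasets, generated under distinct set-ups
- On a real myocardial infarction dataset, results on gene-gene and gene-environment interaction that broadly agree with the literature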

## Abstract

It is becoming increasingly clear that complex interactions among genes and environmental factors play crucial roles in triggering complex diseases. Thus, understanding such interactions is vital, which is possible only through statistical models that adequately account for such intricate, albeit unknown, dependence structures. Bhattacharya & Bhattacharya (2016b) attempt such modeling, relating finite mixtures composed of Dirichlet processes that represent an unknown number of genetic sub-populations through a hierarchical matrix-normal structure that incorporates gene-gene interactions, and possible mutations, induced by environmental variables. However, the product dependence structure implied by their matrix-normal model seems to be too simple to be appropriate for general complex, realistic situations. In this article, we propose and develop a novel nonparametric Bayesian model for case-control genotype data using hierarchies of Dirichlet processes that offers a more realistic and nonparametric dependence structure between the genes, induced by the environmental variables. In this regard, we propose a novel and highly parallelisable MCMC algorithm that is rendered quite efficient by the combination of modern parallel computing technology, effective Gibbs sampling steps, retrospective sampling and Transformation-based Markov Chain Monte Carlo (TMCMC). We use appropriate Bayesian hypothesis testing procedures to detect the roles of genes and environment in case-control studies. We apply our ideas to 5 biologically realistic case-control genotype datasets simulated under distinct set-ups, and obtain encouraging results in each case. We finally apply our ideas to a real myocardial infarction dataset, and obtain interesting results on gene-gene and gene-environment interaction, while broadly agreeing with the results reported in the literature.
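The abstract names TMCMC as one ingredient of the sampler. As a rough illustration of the general idea only, the following minimal sketch shows an additive TMCMC update on a generic target: a single positive scalar step is drawn and added to or subtracted from every coordinate with equal probability, so all coordinates move at once using one random draw. The function name `additive_tmcmc`, the half-normal step distribution, the `scale` parameter, and the toy Gaussian target in the usage note are assumptions made for this example; this is not the paper's algorithm, which also involves Gibbs steps, retrospective sampling, and parallelisation.

```python
import numpy as np


def additive_tmcmc(log_target, x0, n_iter=5000, scale=0.5, seed=0):
    """Illustrative additive TMCMC sampler (a sketch, not the paper's sampler).

    log_target : callable returning the log of an unnormalised density
    x0         : starting point (1-D array-like)
    scale      : std. dev. of the half-normal step size (assumed choice)
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    logp = log_target(x)
    samples = np.empty((n_iter, x.size))
    accepted = 0
    for t in range(n_iter):
        # One positive scalar step shared by every coordinate.
        eps = abs(rng.normal(0.0, scale))
        # Each coordinate moves forward or backward with probability 1/2.
        signs = rng.choice([-1.0, 1.0], size=x.size)
        proposal = x + signs * eps
        logp_new = log_target(proposal)
        # The additive move is symmetric with unit Jacobian, so the
        # acceptance ratio reduces to the ratio of target densities.
        if np.log(rng.uniform()) < logp_new - logp:
            x, logp = proposal, logp_new
            accepted += 1
        samples[t] = x
    return samples, accepted / n_iter
```

For instance, calling `additive_tmcmc(lambda x: -0.5 * np.sum(x**2), np.zeros(3))` draws from a three-dimensional standard normal, a toy target chosen here only to exercise the update.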

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1704.07349/full.md

## Figures

61 figures with captions in the complete paper: https://tomesphere.com/paper/1704.07349/full.md

## References

45 references — full list in the complete paper: https://tomesphere.com/paper/1704.07349/full.md

---
Source: https://tomesphere.com/paper/1704.07349