# Bayesian network imputation methods applied to multi-omics data identify putative causal relationships in a type 2 diabetes dataset containing incomplete data: An IMI DIRECT Study

**Authors:** Richard Howey, Jonathan Adam, Jerzy Adamski, Natalie N. Atabaki, Søren Brunak, Piotr Jaroslaw Chmura, Federico De Masi, Emmanouil T. Dermitzakis, Juan J. Fernandez-Tajes, Ian M. Forgie, Paul W. Franks, Giuseppe N. Giordano, Mark Haid, Torben Hansen, Tue H. Hansen, Peter P. Harms, Andrew T. Hattersley, Mun-gwan Hong, Ulrik Plesner Jacobsen, Angus G. Jones, Robert W. Koivula, Tarja Kokkola, Anubha Mahajan, Andrea Mari, Mark I. McCarthy, Timothy J. McDonald, Petra B. Musholt, Imre Pavo, Ewan R. Pearson, Oluf Pedersen, Hartmut Ruetten, Femke Rutters, Jochen M. Schwenk, Sapna Sharma, Leen M. ’t Hart, Henrik Vestergaard, Mark Walker, Ana Viñuela, Heather J. Cordell

PMC · DOI: 10.1371/journal.pgen.1011776 · PLOS Genetics · 2025-07-15

## TL;DR

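This study fits Bayesian networks to incomplete multi-omics and clinical data from 3029 individuals in a type 2 diabetes study, confirming previously reported associations and highlighting possible mediating proteins and genes that have not been widely reported.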

## Contribution

The study applies the recently proposed imputation method implemented in BayesNetty to fit an average Bayesian network to incomplete multi-omics data, from which putative causal relationships can be inferred.

## Key findings

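- Confirmed many previous findings on type 2 diabetes, including possible mediating proteins and genes, some of which have not been widely reported.
- Confirmed potential causal relationships with liver fat that an earlier IMI DIRECT study had identified using a smaller subset of individuals and variables (those with complete data at pre-defined variables of interest).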
- Demonstrated the utility of BayesNetty's imputation method for analyzing incomplete multi-omics data.

## Abstract

Here we report the results from exploratory analysis using a Bayesian network approach of data originally derived from a large North European study of type 2 diabetes (T2D) conducted by the IMI DIRECT consortium. A total of 3029 individuals (795 with T2D and 2234 without) within 7 different study centres provided data comprising genotypes, proteins, metabolites, gene expression measurements and many different clinical variables. The main aim of the current study was to demonstrate the utility of our previously developed method to fit Bayesian networks by performing exploratory analysis of this dataset to identify possible causal relationships between these variables. The data were analysed using the BayesNetty software package, which can handle mixed discrete/continuous data with missing values. The original dataset consisted of over 16,000 variables, which were filtered down to 260 variables for analysis. Even with this reduction, no individual had complete data for all variables, making it impossible to analyse using standard Bayesian network methodology. However, using the recently proposed novel imputation method implemented in BayesNetty we computed a large average Bayesian network from which we could infer possible associations and causal relationships between variables of interest. Our results confirmed many previous findings in connection with T2D, including possible mediating proteins and genes, some of which have not been widely reported. We also confirmed potential causal relationships with liver fat that were identified in an earlier study that used the IMI DIRECT dataset but was limited to a smaller subset of individuals and variables (namely individuals with complete data at pre-defined variables of interest). In addition to providing valuable confirmation, our analyses thus demonstrate a proof-of-principle of the utility of the method implemented within BayesNetty. The full final average Bayesian network generated from our analysis is freely available and can be easily interrogated further to address specific focussed scientific questions of interest.
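
The abstract notes that no individual had complete data across the 260 retained variables, which rules out a standard complete-case analysis. The minimal Python sketch below illustrates that problem on a hypothetical toy table; the variable names and values are invented for illustration and are not IMI DIRECT data, and the code is not part of BayesNetty.

```python
# Hypothetical toy records (NOT IMI DIRECT data); None marks a missing value.
records = [
    {"bmi": 27.1, "liver_fat": None, "protein_x": 0.8},
    {"bmi": None, "liver_fat": 4.2, "protein_x": 1.1},
    {"bmi": 31.5, "liver_fat": 6.0, "protein_x": None},
]


def complete_cases(rows):
    """Return only the rows in which every variable is observed."""
    return [r for r in rows if all(v is not None for v in r.values())]


def missing_fraction(rows, variable):
    """Fraction of rows in which `variable` is missing."""
    return sum(r[variable] is None for r in rows) / len(rows)


# Each variable is only one-third missing, yet no row is fully observed,
# so a method that requires complete rows has nothing left to fit.
n_complete = len(complete_cases(records))
per_variable = {name: missing_fraction(records, name) for name in records[0]}
```

Here every variable is mostly observed, but the missing entries fall in different rows, so the complete-case subset is empty. This is the situation that motivates fitting the network with imputation rather than discarding incomplete individuals.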

Bayesian network analysis can be used to identify putative causal relationships between measured variables, including clinical measurements and measurements of genetic and genomic factors. Here we report the results from Bayesian network analysis of data originally derived from a large North European study of type 2 diabetes (T2D). Data were collected for 3029 individuals within 7 different study centres, with the data comprising genotypes, proteins, metabolites, gene expression measurements and many different clinical variables. The original dataset consisted of over 16,000 variables, which were then filtered down to 260 variables for analysis. Even with this reduction, not one individual had complete data for all variables. Standard methodology for fitting a Bayesian network requires complete data, so it could not be applied here. However, using the novel imputation method implemented in the BayesNetty software package, we were able to compute a large average Bayesian network from which we could infer possible associations and causal relationships between variables of interest.
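
As a rough illustration of what an "average" Bayesian network can look like, the sketch below assumes that several networks have been fitted (for example to resampled versions of the data) and that each directed edge is summarised by the fraction of networks in which it appears. This is an assumption made for illustration: the summary does not describe how BayesNetty computes its average network, and the function name, the `threshold` parameter and the toy edges are all hypothetical.

```python
from collections import Counter


def average_network(networks, threshold=0.5):
    """Summarise fitted networks by directed-edge frequency.

    `networks` is a list of edge lists, each edge a (parent, child) tuple.
    Returns a dict mapping each edge to the fraction of networks containing
    it, keeping only edges at or above `threshold` (a hypothetical cut-off).
    """
    n = len(networks)
    # set(net) ensures an edge is counted at most once per network.
    counts = Counter(edge for net in networks for edge in set(net))
    return {edge: c / n for edge, c in counts.items() if c / n >= threshold}


# Hypothetical edge lists from four fitted networks (invented variable names).
fitted = [
    [("snp", "protein"), ("protein", "t2d")],
    [("snp", "protein"), ("t2d", "protein")],
    [("snp", "protein"), ("protein", "t2d")],
    [("snp", "protein")],
]

avg = average_network(fitted, threshold=0.5)
```

In this toy case the edge from `snp` to `protein` appears in every network, the edge from `protein` to `t2d` appears in half of them, and the reversed edge falls below the cut-off. Edge frequencies of this kind give a sense of how strongly the data support a relationship and its direction, which is how an average network can be interrogated for specific variables of interest.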

## Linked entities

- **Diseases:** type 2 diabetes (MONDO:0005148)

## Full-text entities

- **Diseases:** T2D (MESH:D003924)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12279144/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12279144/full.md

## References

37 references — full list in the complete paper: https://tomesphere.com/paper/PMC12279144/full.md

---
Source: https://tomesphere.com/paper/PMC12279144