# Comparing Causal Bayesian Networks Estimated from Data

**Authors:** Sisi Ma, Roshan Tourani

PMC · DOI: 10.3390/e26030228 · 2024-03-02

## TL;DR

This paper introduces two resampling-based methods for comparing causal networks estimated from different systems, aimed at improving accuracy when the systems' datasets differ substantially in sample size.

## Contribution

The paper introduces bootstrap estimation and equal sample size resampling estimation, two methods for estimating the difference between causal networks that use resampling to assess the confidence of the estimate.

## Key findings

- The bootstrap and equal sample size resampling methods outperformed the naive method in simulated experiments.
- The new methods showed improved performance on real-world biomedical datasets.
- Performance varied with network structures and sample sizes.

## Abstract

The knowledge of the causal mechanisms underlying one single system may not be sufficient to answer certain questions. One can gain additional insights from comparing and contrasting the causal mechanisms underlying multiple systems and uncovering consistent and distinct causal relationships. For example, discovering common molecular mechanisms among different diseases can lead to drug repurposing. The problem of comparing causal mechanisms among multiple systems is non-trivial, since the causal mechanisms are usually unknown and need to be estimated from data. If we estimate the causal mechanisms from data generated from different systems and directly compare them (the naive method), the result can be sub-optimal. This is especially true if the data generated by the different systems differ substantially with respect to their sample sizes. In this case, the quality of the estimated causal mechanisms for the different systems will differ, which can in turn affect the accuracy of the estimated similarities and differences among the systems via the naive method. To mitigate this problem, we introduced the bootstrap estimation and the equal sample size resampling estimation method for estimating the difference between causal networks. Both of these methods use resampling to assess the confidence of the estimation. We compared these methods with the naive method in a set of systematically simulated experimental conditions with a variety of network structures and sample sizes, and using different performance metrics. We also evaluated these methods on various real-world biomedical datasets covering a wide range of data designs.
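The three strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`estimate_edges`, `naive_difference`, `resampled_difference`), the threshold, and the simulated data are all hypothetical. As a loud assumption, the causal discovery step is replaced by a simple stand-in that thresholds partial correlations to get an undirected graph. Resampling both datasets with replacement down to the smaller sample size is likewise only one possible reading of "equal sample size resampling".

```python
import numpy as np


def estimate_edges(data, threshold=0.2):
    """Hypothetical stand-in learner: undirected edges from thresholded
    partial correlations (NOT a causal discovery algorithm)."""
    precision = np.linalg.pinv(np.cov(data, rowvar=False))
    precision = (precision + precision.T) / 2  # enforce exact symmetry
    scale = np.sqrt(np.diag(precision))
    partial_corr = -precision / np.outer(scale, scale)
    adjacency = np.abs(partial_corr) > threshold
    np.fill_diagonal(adjacency, False)
    return adjacency


def naive_difference(data_a, data_b, threshold=0.2):
    """Naive method: estimate each network once and compare directly."""
    return estimate_edges(data_a, threshold) != estimate_edges(data_b, threshold)


def resampled_difference(data_a, data_b, n_resamples=100, equal_size=False,
                         threshold=0.2, seed=0):
    """Fraction of resamples in which each edge differs between the systems.

    equal_size=False: bootstrap each dataset at its own sample size.
    equal_size=True: draw both resamples at the smaller sample size
    (an assumed reading of equal sample size resampling).
    """
    rng = np.random.default_rng(seed)
    n_a, n_b = len(data_a), len(data_b)
    if equal_size:
        n_a = n_b = min(n_a, n_b)
    n_vars = data_a.shape[1]
    counts = np.zeros((n_vars, n_vars))
    for _ in range(n_resamples):
        sample_a = data_a[rng.integers(0, len(data_a), size=n_a)]
        sample_b = data_b[rng.integers(0, len(data_b), size=n_b)]
        counts += estimate_edges(sample_a, threshold) != estimate_edges(sample_b, threshold)
    return counts / n_resamples


def simulate(n, with_edge_12, rng):
    """Toy chain X0 -> X1 -> X2; the X1 -> X2 edge is optional."""
    x0 = rng.normal(size=n)
    x1 = 0.8 * x0 + rng.normal(size=n)
    x2 = (0.8 * x1 if with_edge_12 else 0.0) + rng.normal(size=n)
    return np.column_stack([x0, x1, x2])


data_rng = np.random.default_rng(1)
data_a = simulate(600, True, data_rng)   # large sample, edge present
data_b = simulate(60, False, data_rng)   # small sample, edge absent

naive = naive_difference(data_a, data_b)
boot = resampled_difference(data_a, data_b, equal_size=False)
equal = resampled_difference(data_a, data_b, equal_size=True)
print(naive.astype(int), boot.round(2), equal.round(2), sep="\n\n")
```

The naive result is a single yes/no matrix, while the two resampling variants return a per-edge frequency in [0, 1] that can be read as a confidence that the edge differs between the systems. Setting `equal_size=True` makes both estimates equally noisy, which is the motivation the abstract gives for handling datasets of very different sizes.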

## Full-text entities

- **Genes:** PC (pyruvate carboxylase) [NCBI Gene 5091] {aka PCB}
- **Diseases:** cancer (MESH:D009369), injury to people or property (MESH:C000719191), Parkinson's disease (MESH:D010300)
- **Chemicals:** FGES (-)

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC10969691/full.md

---
Source: https://tomesphere.com/paper/PMC10969691