Causal datasheet: An approximate guide to practically assess Bayesian networks in the real world
Bradley Butcher, Vincent S. Huang, Jeremy Reffin, Sema K. Sgaier, Grace Charles, Novi Quadrianto

TL;DR
This paper introduces a Causal Datasheet framework that estimates Bayesian Network performance on datasets, aiding validation and interpretation in real-world causal analysis, demonstrated through synthetic data and a maternal health survey.
Contribution
It proposes a novel Causal Datasheet concept with a prototype tool to assess Bayesian Network reliability on specific datasets, enhancing practical causal inference validation.
Findings
Generated over 30,000 synthetic datasets for benchmarking
Automatically populated datasheets with performance expectations
Applied to a maternal health survey in Uttar Pradesh
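The benchmarking idea behind these findings can be illustrated with a minimal sketch: draw data from a DAG whose structure is known, then score a learned structure against that ground truth. This is not the authors' tool. The function names, the binary sampling scheme, and the choice of structural Hamming distance (a commonly used structure-recovery metric) are all assumptions made for illustration.

```python
import random


def random_dag(n_nodes, edge_prob, rng):
    """Hypothetical helper: random DAG as a set of edges (i, j) with i < j.

    Only allowing edges from lower to higher index guarantees acyclicity.
    """
    return {
        (i, j)
        for i in range(n_nodes)
        for j in range(i + 1, n_nodes)
        if rng.random() < edge_prob
    }


def sample_binary(edges, n_nodes, n_samples, rng):
    """Hypothetical helper: ancestral sampling of binary variables.

    Assumption: a node is more likely to be 1 the more of its parents are 1.
    """
    parents = {j: [i for (i, k) in edges if k == j] for j in range(n_nodes)}
    rows = []
    for _ in range(n_samples):
        row = [0] * n_nodes
        # Index order is a valid topological order because edges go i -> j, i < j.
        for j in range(n_nodes):
            active = sum(row[p] for p in parents[j])
            p_one = min(0.9, 0.2 + 0.3 * active)
            row[j] = 1 if rng.random() < p_one else 0
        rows.append(row)
    return rows


def structural_hamming_distance(true_edges, learned_edges):
    """Count missing, extra, and reversed edges; a reversal counts once."""
    true_skeleton = {frozenset(e) for e in true_edges}
    learned_skeleton = {frozenset(e) for e in learned_edges}
    missing_or_extra = len(true_skeleton ^ learned_skeleton)
    shared = true_skeleton & learned_skeleton
    reversed_count = sum(
        1
        for edge in learned_edges
        if frozenset(edge) in shared and edge not in true_edges
    )
    return missing_or_extra + reversed_count


rng = random.Random(0)
true_edges = random_dag(n_nodes=5, edge_prob=0.4, rng=rng)
data = sample_binary(true_edges, n_nodes=5, n_samples=200, rng=rng)
```

In a fuller benchmark, `data` would be passed to a structure-learning algorithm and the result compared with `true_edges`; repeating this over many synthetic datasets with varied properties is what would yield performance expectations of the kind a datasheet reports.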
Abstract
In solving real-world problems like changing healthcare-seeking behaviors, designing interventions to improve downstream outcomes requires an understanding of the causal links within the system. Causal Bayesian Networks (BNs) have been proposed as one such powerful method. In real-world applications, however, confidence in the results of BNs is often moderate at best. This is due in part to the inability to validate against some ground truth, as the true DAG is not available. This is especially problematic if the learned DAG conflicts with pre-existing domain doctrine. At the policy level, one must justify insights generated by such analysis, preferably accompanying them with uncertainty estimation. Here we propose a causal extension to the datasheet concept proposed by Gebru et al. (2018) to include approximate BN performance expectations for any given dataset. To generate the results for a…
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Data Quality and Management
