Improving the Generation and Evaluation of Synthetic Data for Downstream Medical Causal Inference
Harry Amad, Zhaozhi Qian, Dennis Frauen, Julianna Piskorz, Stefan Feuerriegel, Mihaela van der Schaar

TL;DR
This paper introduces STEAM, a new generative model designed to produce synthetic medical data optimized for causal inference tasks, addressing the unique challenges of treatment-related data and improving downstream analysis.
Contribution
The paper proposes a set of desiderata and evaluation metrics for synthetic treatment data and introduces STEAM, a novel generative method that outperforms existing models in complex scenarios.
Findings
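STEAM achieves state-of-the-art performance on the proposed evaluation metrics when compared with existing generative models.
It preserves the covariate distribution, the treatment assignment mechanism, and the outcome generation mechanism.
Its advantage over existing models is most pronounced in complex scenarios.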
Abstract
Causal inference is essential for developing and evaluating medical interventions, yet real-world medical datasets are often difficult to access due to regulatory barriers. This makes synthetic data a potentially valuable asset that enables these medical analyses, along with the development of new inference methods themselves. Generative models can produce synthetic data that closely approximate real data distributions, yet existing methods do not consider the unique challenges that downstream causal inference tasks, and specifically those focused on treatments, pose. We establish a set of desiderata that synthetic data containing treatments should satisfy to maximise downstream utility: preservation of (i) the covariate distribution, (ii) the treatment assignment mechanism, and (iii) the outcome generation mechanism. Based on these desiderata, we propose a set of evaluation metrics to…
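The three desiderata in the abstract can be read as a factorisation of the data into covariates X, treatment W given X, and outcome Y given X and W. The sketch below illustrates that reading only; it is not the STEAM method. It assumes a toy data-generating process, a Gaussian fit for the covariates, logistic regression for treatment assignment, and linear regression for the outcome. All function names (`simulate_real`, `fit_logistic`, `fit_outcome`, `generate_synthetic`) are hypothetical.

```python
import numpy as np


def simulate_real(n, rng):
    """Toy 'real' dataset (an assumption for illustration, not from the paper)."""
    x = rng.normal(size=(n, 2))
    propensity = 1.0 / (1.0 + np.exp(-x[:, 0]))  # treatment depends on x[:, 0]
    w = rng.binomial(1, propensity)
    y = x[:, 1] + 2.0 * w + rng.normal(scale=0.1, size=n)
    return x, w, y


def fit_logistic(x, w, steps=500, lr=0.5):
    """Logistic regression of W on X by plain gradient descent."""
    design = np.column_stack([np.ones(len(x)), x])
    beta = np.zeros(design.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(design @ beta)))
        beta -= lr * design.T @ (p - w) / len(w)
    return beta


def fit_outcome(x, w, y):
    """Linear regression of Y on (1, X, W); returns coefficients and residual sd."""
    design = np.column_stack([np.ones(len(x)), x, w])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid_sd = np.std(y - design @ coef)
    return coef, resid_sd


def generate_synthetic(x, w, y, n, rng):
    """Sample synthetic (X, W, Y) one component at a time."""
    # (i) Covariate distribution: Gaussian stand-in for any generative model.
    xs = rng.multivariate_normal(x.mean(axis=0), np.cov(x, rowvar=False), size=n)
    # (ii) Treatment assignment mechanism: W | X.
    beta = fit_logistic(x, w)
    ps = 1.0 / (1.0 + np.exp(-(np.column_stack([np.ones(n), xs]) @ beta)))
    ws = rng.binomial(1, ps)
    # (iii) Outcome generation mechanism: Y | X, W.
    coef, sd = fit_outcome(x, w, y)
    ys = np.column_stack([np.ones(n), xs, ws]) @ coef + rng.normal(scale=sd, size=n)
    return xs, ws, ys


rng = np.random.default_rng(0)
x, w, y = simulate_real(5000, rng)
xs, ws, ys = generate_synthetic(x, w, y, 5000, rng)

# Crude per-component checks, one for each desideratum.
print("covariate mean gap:", np.abs(x.mean(axis=0) - xs.mean(axis=0)).max())
print("treated fraction gap:", abs(w.mean() - ws.mean()))
print("treatment coefficient gap:", abs(fit_outcome(x, w, y)[0][-1] - fit_outcome(xs, ws, ys)[0][-1]))
```

The three printed gaps are simple stand-ins for per-component evaluation and should not be mistaken for the metrics the paper proposes, which the truncated abstract does not spell out.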
Taxonomy
Topics: Advanced Causal Inference Techniques · Bayesian Modeling and Causal Inference · Explainable Artificial Intelligence (XAI)
