TL;DR
This paper investigates how sample selection bias affects the evaluation of causal models by prediction performance. It shows that such bias can lead to overly optimistic assessments and proposes a less-biased evaluation set.
Contribution
It identifies sample selection bias as a key driver of the measured performance of causal models and proposes a less-biased evaluation set for more accurate performance assessment.
Findings
Sample selection bias inflates causal model performance estimates.
On a less-biased evaluation set, causal models perform similarly to or worse than standard estimators.
Simulations without selection bias show different performance patterns, which can inform future evaluations.
Abstract
Causal models are notoriously difficult to validate because they make untestable assumptions regarding confounding. New scientific experiments offer the possibility of evaluating causal models using prediction performance. Prediction performance measures are typically robust to violations of causal assumptions. However, prediction performance does depend on the selection of training and test sets. Biased training sets can lead to optimistic assessments of model performance. In this work, we revisit the prediction performance of several recently proposed causal models tested on a genetic perturbation data set of Kemmeren. We find that sample selection bias is likely a key driver of model performance. We propose using a less-biased evaluation set for assessing prediction performance and compare models on this new set. In this setting, the causal models have similar or worse performance…
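The abstract's core point is that how samples are selected can make a model look better than it is. The toy simulation below is a hypothetical illustration, not the paper's setup or data. It assumes a model whose predictions equal the true effect plus independent noise, and it compares the Pearson correlation between predictions and true effects on the full set with the correlation on a subset kept only when the true effect is large (`threshold` is an arbitrary illustrative choice). The model is identical in both cases; only the evaluation set changes.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(n=20000, noise_sd=1.0, threshold=1.5, seed=0):
    """Hypothetical toy setup: compare a metric on a full vs. a selected evaluation set."""
    rng = random.Random(seed)
    # Hypothetical "true" effects, standard normal.
    effects = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Hypothetical model: true effect plus independent Gaussian noise.
    preds = [e + rng.gauss(0.0, noise_sd) for e in effects]

    # Full evaluation set: every sample is scored.
    r_full = pearson(preds, effects)

    # Selected evaluation set: keep only samples whose true effect is large.
    kept = [(p, e) for p, e in zip(preds, effects) if abs(e) > threshold]
    sel_preds, sel_effects = zip(*kept)
    r_selected = pearson(sel_preds, sel_effects)

    return r_full, r_selected, len(kept)


if __name__ == "__main__":
    r_full, r_selected, n_kept = simulate()
    print(f"full set: r = {r_full:.3f}")
    print(f"selected set ({n_kept} samples): r = {r_selected:.3f}")
```

With these defaults, the full-set correlation is close to 0.7, while the selected set gives a noticeably higher value. Keeping only extreme true effects widens their spread relative to the prediction noise, so the metric rises even though the model has not improved.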
