Are you doing better than random guessing? A call for using negative controls when evaluating causal discovery algorithms
Anne Helby Petersen

TL;DR
This paper advocates using negative controls, i.e., random guessing, as a baseline when evaluating causal discovery algorithms, and provides exact distributional results and tests that make evaluations easier to compare across studies.
Contribution
It introduces a framework for incorporating negative controls in causal discovery evaluation, including exact distributional results, tests, and a general pipeline for broader application.
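The general idea of a negative control can be sketched as a simulation: compute a metric for the estimated skeleton, then compute the same metric for many random guesses and see where the estimate falls. The sketch below is a minimal illustration, not the paper's pipeline. It assumes that a skeleton is a set of unordered node pairs, and that "random guessing" means drawing the same number of edges as the estimate uniformly from all p(p-1)/2 pairs. The names `random_skeleton`, `precision` and `negative_control` are hypothetical.

```python
import itertools
import random
from typing import Callable, FrozenSet, Set, Tuple

Edge = FrozenSet[int]
Skeleton = Set[Edge]

def random_skeleton(p: int, n_edges: int, rng: random.Random) -> Skeleton:
    """Draw n_edges undirected edges uniformly at random from all p*(p-1)/2 pairs."""
    pairs = [frozenset(e) for e in itertools.combinations(range(p), 2)]
    return set(rng.sample(pairs, n_edges))

def precision(true_skel: Skeleton, est_skel: Skeleton) -> float:
    """Share of estimated edges that are present in the true skeleton."""
    return len(true_skel & est_skel) / len(est_skel) if est_skel else 0.0

def negative_control(metric: Callable[[Skeleton, Skeleton], float],
                     true_skel: Skeleton, est_skel: Skeleton, p: int,
                     n_sim: int = 1000, seed: int = 0) -> Tuple[float, float, float]:
    """Compare an estimate against random guesses with the same number of edges.

    Returns (observed metric, mean metric under random guessing, Monte Carlo p-value).
    """
    rng = random.Random(seed)
    observed = metric(true_skel, est_skel)
    # Metric values for the hypothetical random-guessing baseline.
    null = [metric(true_skel, random_skeleton(p, len(est_skel), rng))
            for _ in range(n_sim)]
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    p_value = (1 + sum(v >= observed for v in null)) / (n_sim + 1)
    return observed, sum(null) / n_sim, p_value
```

For example, with p = 5 nodes and a true chain 0-1-2-3, a perfect estimate has precision 1.0, while random guesses with three edges average a precision close to 3/10. Any metric with the same signature can be plugged in, which is what makes the simulation approach usable beyond skeleton precision.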
Findings
Random guessing can yield high values of common evaluation metrics, such as precision and recall, in certain scenarios.
Negative controls provide a baseline to assess causal discovery performance.
A new exact test of skeleton fit assesses whether an estimated skeleton does better than random guessing.
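One way such an exact test can be constructed is sketched below; it may not coincide with the test derived in the paper. It assumes that under random guessing the estimated skeleton's k edges are drawn uniformly from the m = p(p-1)/2 possible pairs, so the number of true positives follows a hypergeometric distribution, and the p-value is the upper-tail probability of seeing at least the observed number of true positives. The function name `skeleton_fit_pvalue` is hypothetical.

```python
from math import comb

def skeleton_fit_pvalue(p: int, n_true: int, n_est: int, n_tp: int) -> float:
    """One-sided exact p-value: P(TP >= n_tp) under uniform random guessing.

    p      -- number of nodes
    n_true -- number of edges in the true skeleton
    n_est  -- number of edges in the estimated skeleton
    n_tp   -- observed number of estimated edges that are also true edges
    """
    m = comb(p, 2)                 # number of possible undirected edges
    total = comb(m, n_est)         # number of equally likely random guesses
    upper = min(n_true, n_est)     # largest possible number of true positives
    # Hypergeometric upper tail; math.comb returns 0 when k > n.
    tail = sum(comb(n_true, x) * comb(m - n_true, n_est - x)
               for x in range(n_tp, upper + 1))
    return tail / total
```

With 5 nodes, a 3-edge truth and a 3-edge estimate that recovers all three edges, the p-value is 1/120, because only one of the C(10, 3) = 120 equally likely guesses is a perfect match.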
Abstract
New proposals for causal discovery algorithms are typically evaluated using simulations and a few selected real data examples with known data generating mechanisms. However, there does not exist a general guideline for how such evaluation studies should be designed, and therefore, comparing results across different studies can be difficult. In this article, we propose to use negative controls as a common evaluation baseline by posing the question: Are we doing better than random guessing? For the task of graph skeleton estimation, we derive exact distributional results under random guessing for the expected behavior of a range of typical causal discovery evaluation metrics, including precision and recall. We show that these metrics can achieve very favorable values under random guessing in certain scenarios, and hence warn against using them without also reporting negative control…
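To see how random guessing can produce favorable precision, consider a simple random-guessing model (an assumption for illustration, not necessarily the paper's exact model): an estimate with k edges is drawn uniformly from the m = p(p-1)/2 possible pairs, and the true skeleton has t edges. The number of true positives is then hypergeometric with mean kt/m, so the expected precision is t/m and the expected recall is k/m. The hypothetical helper below computes these quantities exactly.

```python
from math import comb

def random_guess_metrics(p: int, n_true: int, n_est: int) -> dict:
    """Exact TP distribution, expected precision and expected recall under
    uniform random guessing of n_est edges among the p*(p-1)/2 pairs.
    Assumes n_true > 0 and n_est > 0."""
    m = comb(p, 2)
    total = comb(m, n_est)
    # Hypergeometric pmf of the number of true positives.
    pmf_tp = {x: comb(n_true, x) * comb(m - n_true, n_est - x) / total
              for x in range(min(n_true, n_est) + 1)}
    expected_tp = sum(x * q for x, q in pmf_tp.items())
    return {
        "pmf_tp": pmf_tp,
        "expected_precision": expected_tp / n_est,  # equals n_true / m
        "expected_recall": expected_tp / n_true,    # equals n_est / m
    }
```

For instance, with p = 10 nodes and a dense truth where 36 of the 45 pairs are adjacent, a random 20-edge guess has expected precision 36/45 = 0.8, which illustrates why precision alone can look good without any real discovery.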
Taxonomy
Topics: Bayesian Modeling and Causal Inference
