Learning Probabilities of Causation with Mask-Augmented Data

Shuai Wang; Yizhou Sun; Judea Pearl; Ang Li

arXiv:2505.17133·stat.ML·February 11, 2026

TL;DR

This paper introduces machine learning models that predict bounds on the probability of necessity and sufficiency (PNS) for all subpopulations after training on a small set of reliable ones, reducing mean absolute error (MAE) by about 80% compared to baselines.

Contribution

The paper presents two models, Exact-MLP and Mask-MLP, which are trained on a small set of reliable subpopulations and predict PNS bounds for all other subpopulations.

Findings

01. Models achieve an MAE of about 0.03 on the main tasks.

02. MAE is reduced by about 80% compared to baselines.

03. Validated across four structural causal models.

Abstract

Probabilities of causation play a central role in modern decision making. Tian and Pearl first introduced formal definitions and derived tight bounds for three binary probabilities of causation, such as the probability of necessity and sufficiency (PNS). However, estimating these probabilities requires both experimental and observational distributions specific to each subpopulation, which are often unreliable or impractical to obtain from limited population-level data. To solve this problem, we propose two machine learning models: Exact-MLP and Mask-MLP, which are trained on a small set of reliable subpopulations and are able to predict PNS bounds for all other subpopulations. We validate our models across four Structural Causal Models (SCMs), each evaluated on population-level data with sample sizes between 100k and 200k. Our models achieve average mean absolute errors (MAEs) of…
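To make the quantity being predicted concrete, the following is a minimal sketch of the Tian-Pearl bounds on PNS for binary treatment X and outcome Y within a single subpopulation. It is not the paper's Exact-MLP or Mask-MLP; it only computes the closed-form bounds that such models would be trained to predict. The function name `pns_bounds`, its argument names, and the example numbers are all hypothetical and chosen for illustration.

```python
def pns_bounds(p_y_do_x, p_y_do_not_x, p_xy, p_x_noty, p_notx_y, p_notx_noty):
    """Tian-Pearl bounds on PNS for binary X and Y (illustrative sketch).

    Experimental inputs:
      p_y_do_x     = P(y_x),    probability of y under do(X = x)
      p_y_do_not_x = P(y_{x'}), probability of y under do(X = x')
    Observational inputs (joint distribution of X and Y):
      p_xy, p_x_noty, p_notx_y, p_notx_noty
    """
    p_y = p_xy + p_notx_y  # marginal P(y)

    lower = max(
        0.0,
        p_y_do_x - p_y_do_not_x,
        p_y - p_y_do_not_x,
        p_y_do_x - p_y,
    )
    upper = min(
        p_y_do_x,
        1.0 - p_y_do_not_x,          # P(y'_{x'})
        p_xy + p_notx_noty,
        p_y_do_x - p_y_do_not_x + p_x_noty + p_notx_y,
    )
    return lower, upper


# Made-up distributions for one subpopulation (assumed values, not from the paper).
lower, upper = pns_bounds(
    p_y_do_x=0.7, p_y_do_not_x=0.3,
    p_xy=0.4, p_x_noty=0.1, p_notx_y=0.1, p_notx_noty=0.4,
)
print(f"PNS in [{lower:.2f}, {upper:.2f}]")  # prints "PNS in [0.40, 0.60]"
```

Every input here is specific to one subpopulation, which is why the bounds become unreliable when a subpopulation has few samples: each of the six probabilities must be estimated from that subpopulation's data alone.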
