Learning Probabilities of Causation from Finite Population Data
Ang Li, Song Jiang, Yizhou Sun, Judea Pearl

TL;DR
This paper introduces a machine learning approach for estimating bounds on the probabilities of causation of subpopulations from finite population data, where estimating each subpopulation's experimental and observational distributions directly is usually impractical.
Contribution
A novel machine learning model is proposed to learn bounds on causation probabilities for subpopulations from limited finite population data.
Findings
Learned the bounds of PNS for 32768 subpopulations.
Required knowing only roughly 500 of those subpopulations from the finite population data.
Demonstrated practical applicability through a simulated study.
Abstract
This paper deals with the problem of learning the probabilities of causation of subpopulations given finite population data. The tight bounds of three basic probabilities of causation, the probability of necessity and sufficiency (PNS), the probability of sufficiency (PS), and the probability of necessity (PN), were derived by Tian and Pearl. However, obtaining the bounds for each subpopulation requires the experimental and observational distributions of that subpopulation, which are usually impractical to estimate given finite population data. We propose a machine learning model that helps to learn the bounds of the probabilities of causation for subpopulations given finite population data. We further show by a simulated study that the machine learning model is able to learn the bounds of PNS for 32768 subpopulations while knowing only roughly 500 of them from the finite population data.
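To make the quantity being learned concrete, the following is a minimal sketch of the Tian and Pearl bounds on PNS for a single subpopulation, assuming binary treatment and outcome. It is not the paper's learning model. The function name `pns_bounds` and its parameter names are hypothetical, chosen here for illustration. The inputs are the experimental probabilities P(y_x) and P(y_x') and the four observational joint probabilities, which are exactly the per-subpopulation distributions the abstract describes as hard to estimate from finite data.

```python
def pns_bounds(p_y_do_x, p_y_do_xp, p_xy, p_xyp, p_xpy, p_xpyp):
    """Tian-Pearl bounds on PNS for one subpopulation (illustrative sketch).

    Experimental inputs:
      p_y_do_x  = P(y_x)    outcome probability under treatment
      p_y_do_xp = P(y_x')   outcome probability under no treatment
    Observational inputs (should sum to 1):
      p_xy = P(x, y), p_xyp = P(x, y'), p_xpy = P(x', y), p_xpyp = P(x', y')
    """
    p_y = p_xy + p_xpy  # marginal P(y)

    lower = max(
        0.0,
        p_y_do_x - p_y_do_xp,
        p_y - p_y_do_xp,
        p_y_do_x - p_y,
    )
    upper = min(
        p_y_do_x,
        1.0 - p_y_do_xp,                      # P(y'_x')
        p_xy + p_xpyp,
        p_y_do_x - p_y_do_xp + p_xyp + p_xpy,
    )
    return lower, upper


# Made-up numbers for one subpopulation, for illustration only.
lower, upper = pns_bounds(0.7, 0.2, 0.3, 0.2, 0.1, 0.4)
print(f"{lower:.2f} <= PNS <= {upper:.2f}")
```

Computing these bounds for every subpopulation would require reliable estimates of all six inputs within each one, which is the estimation burden the proposed model is meant to reduce.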
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Data Quality and Management · Census and Population Estimation
