A Large Scale Benchmark for Individual Treatment Effect Prediction and Uplift Modeling
Eustache Diemert, Artem Betlei, Christophe Renaudin, Massih-Reza Amini, Théophane Gregoir, Thibaud Rahier

TL;DR
This paper introduces a large-scale, publicly available benchmark dataset with 13.9 million samples for advancing individual treatment effect prediction and uplift modeling, enabling more robust evaluation and comparison of causal inference methods.
Contribution
It releases a dataset roughly 210x larger than those previously available for ITE prediction, formalizes the uplift modeling task together with its evaluation metrics, and offers baseline evaluations to facilitate future research.
Findings
Dataset contains 13.9 million samples from several randomized control trials (RCTs)
Baseline methods show significant performance differences
Validation confirms dataset's suitability for causal inference
Abstract
Individual Treatment Effect (ITE) prediction is an important area of research in machine learning which aims at explaining and estimating the causal impact of an action at the granular level. It represents a problem of growing interest in multiple sectors of application such as healthcare, online advertising or socioeconomics. To foster research on this topic we release a publicly available collection of 13.9 million samples collected from several randomized control trials, scaling up previously available datasets by a healthy 210x factor. We provide details on the data collection and perform sanity checks to validate the use of this data for causal inference tasks. First, we formalize the task of uplift modeling (UM) that can be performed with this data, along with the relevant evaluation metrics. Then, we propose synthetic response surfaces and heterogeneous treatment assignment…
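To make the uplift modeling task mentioned in the abstract concrete, here is a minimal sketch in Python. It assumes the standard definition of uplift as the difference between the expected outcome under treatment and under control for a given group, which can be estimated by a simple difference in means when treatment is randomized. The function names, the segment labels, and the simulated response rates are hypothetical and are not taken from the paper or its dataset.

```python
import random
from collections import defaultdict


def simulate_rct(n, seed=0):
    """Generate hypothetical RCT rows as (segment, treatment, outcome).

    Assumption for illustration only: segment "a" responds to treatment
    (+0.05 on a 0.10 base rate), segment "b" does not respond at all.
    """
    rng = random.Random(seed)
    base_rate = {"a": 0.10, "b": 0.10}
    true_uplift = {"a": 0.05, "b": 0.00}
    rows = []
    for _ in range(n):
        segment = rng.choice(["a", "b"])
        treatment = 1 if rng.random() < 0.5 else 0  # randomized assignment
        p = base_rate[segment] + treatment * true_uplift[segment]
        outcome = 1 if rng.random() < p else 0
        rows.append((segment, treatment, outcome))
    return rows


def uplift_by_segment(rows):
    """Estimate uplift per segment as mean(Y | treated) - mean(Y | control)."""
    # Per segment: [sum_y_treated, n_treated, sum_y_control, n_control]
    stats = defaultdict(lambda: [0, 0, 0, 0])
    for segment, treatment, outcome in rows:
        s = stats[segment]
        if treatment == 1:
            s[0] += outcome
            s[1] += 1
        else:
            s[2] += outcome
            s[3] += 1
    # Skip segments that lack either a treated or a control sample.
    return {
        segment: s[0] / s[1] - s[2] / s[3]
        for segment, s in stats.items()
        if s[1] > 0 and s[3] > 0
    }


if __name__ == "__main__":
    estimates = uplift_by_segment(simulate_rct(100_000))
    for segment in sorted(estimates):
        print(segment, round(estimates[segment], 4))
```

With enough samples, the estimate for segment "a" should land near the simulated 0.05 and the estimate for segment "b" near zero. An uplift model generalizes this idea by replacing the hand-picked segments with a learned function of the features.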
Taxonomy
Topics: Advanced Causal Inference Techniques · Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI)
