Doing Great at Estimating CATE? On the Neglected Assumptions in Benchmark Comparisons of Treatment Effect Estimators
Alicia Curth, Mihaela van der Schaar

TL;DR
This paper critically examines how semi-synthetic benchmark datasets such as IHDP and ACIC2016 can bias the evaluation of heterogeneous treatment effect (CATE) estimators, and argues that fair comparison requires understanding the assumptions built into their data-generating mechanisms.
Contribution
It reveals the limitations of current benchmark datasets in treatment effect estimation and discusses how their characteristics influence algorithm performance evaluation.
Findings
Benchmark datasets can favor certain algorithms due to their inherent characteristics.
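Empirical evaluations can be misleading, even under ignorability assumptions, when the assumptions behind a benchmark's data-generating mechanism and their interplay with baseline algorithms are inadequately discussed.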
The paper highlights the need for more transparent and representative benchmarking practices.
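A toy illustration of the first finding above, as a minimal sketch and not the paper's experiments: it does not use IHDP or ACIC2016, and every function name, coefficient, and sample size below is a hypothetical choice made for this example. Two simulated data-generating mechanisms are compared, one with a constant treatment effect and one where the treated and control outcome surfaces have different slopes. A pooled linear model with a treatment indicator (an S-learner-style baseline) and two separate per-arm linear models (a T-learner-style baseline) are scored by the root mean squared error of their CATE estimates.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def simulate(rng, n, d, heterogeneous):
    """Toy data-generating mechanism (hypothetical, not a published benchmark)."""
    X = rng.normal(size=(n, d))
    w = rng.integers(0, 2, size=n)  # randomized treatment, so ignorability holds
    beta0 = np.ones(d)
    # Assumption: in the heterogeneous case the treated surface has different slopes.
    beta1 = np.arange(1, d + 1, dtype=float) if heterogeneous else beta0
    mu0 = X @ beta0
    mu1 = X @ beta1 + 1.0
    y = np.where(w == 1, mu1, mu0) + rng.normal(size=n)
    return X, w, y, mu1 - mu0  # last element is the true CATE


def s_learner_cate(X, w, y, X_test):
    """One pooled linear model with the treatment indicator as an extra feature."""
    coef = fit_linear(np.column_stack([X, w]), y)
    m = len(X_test)
    treated = predict_linear(coef, np.column_stack([X_test, np.ones(m)]))
    control = predict_linear(coef, np.column_stack([X_test, np.zeros(m)]))
    return treated - control


def t_learner_cate(X, w, y, X_test):
    """Two separate linear models, one per treatment arm."""
    coef1 = fit_linear(X[w == 1], y[w == 1])
    coef0 = fit_linear(X[w == 0], y[w == 0])
    return predict_linear(coef1, X_test) - predict_linear(coef0, X_test)


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


def compare(heterogeneous, n=200, d=5, reps=20, seed=0):
    """Average CATE RMSE of both baselines over several simulated datasets."""
    rng = np.random.default_rng(seed)
    s_err, t_err = [], []
    for _ in range(reps):
        X, w, y, _ = simulate(rng, n, d, heterogeneous)
        X_test, _, _, tau_test = simulate(rng, 1000, d, heterogeneous)
        s_err.append(rmse(s_learner_cate(X, w, y, X_test), tau_test))
        t_err.append(rmse(t_learner_cate(X, w, y, X_test), tau_test))
    return float(np.mean(s_err)), float(np.mean(t_err))


if __name__ == "__main__":
    for flag in (False, True):
        s, t = compare(heterogeneous=flag)
        print(f"heterogeneous={flag}: S-learner RMSE={s:.3f}, T-learner RMSE={t:.3f}")
```

Under these assumptions the pooled model tends to score better when the effect is constant, and the per-arm models score better when the slopes differ, so the ranking of the two baselines depends on the simulated mechanism rather than on the estimators alone.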
Abstract
The machine learning toolbox for estimation of heterogeneous treatment effects from observational data is expanding rapidly, yet many of its algorithms have been evaluated only on a very limited set of semi-synthetic benchmark datasets. In this paper, we show that even in arguably the simplest setting -- estimation under ignorability assumptions -- the results of such empirical evaluations can be misleading if (i) the assumptions underlying the data-generating mechanisms in benchmark datasets and (ii) their interplay with baseline algorithms are inadequately discussed. We consider two popular machine learning benchmark datasets for evaluation of heterogeneous treatment effect estimators -- the IHDP and ACIC2016 datasets -- in detail. We identify problems with their current use and highlight that the inherent characteristics of the benchmark datasets favor some algorithms over others --…
Taxonomy
Topics: Advanced Causal Inference Techniques · Health Systems, Economic Evaluations, Quality of Life · Statistical Methods and Inference
