Do Contemporary Causal Inference Models Capture Real-World Heterogeneity? Findings from a Large-Scale Benchmark

Haining Yu; Yizhou Sun

arXiv:2410.07021·stat.ML·February 21, 2025

Open Access

TL;DR

This large-scale benchmark finds that most estimates from modern CATE models have higher MSE than a trivial zero-effect predictor on real-world datasets, pointing to significant challenges and the need for methodological improvements in capturing heterogeneity.

Contribution

The paper introduces a novel observational sampling method and new statistical metrics to evaluate CATE models on real-world data, uncovering their limited effectiveness.

Findings

01

62% of CATE estimates have a higher MSE than a zero-effect predictor

02

In datasets with at least one useful CATE estimate, 80% still have a higher MSE than a constant-effect model

03

Orthogonality-based models outperform others only 30% of the time

Abstract

We present unexpected findings from a large-scale benchmark study evaluating Conditional Average Treatment Effect (CATE) estimation algorithms, i.e., CATE models. By running 16 modern CATE models on 12 datasets and 43,200 sampled variants generated through diverse observational sampling strategies, we find that: (a) 62% of CATE estimates have a higher Mean Squared Error (MSE) than a trivial zero-effect predictor, rendering them ineffective; (b) in datasets with at least one useful CATE estimate, 80% still have higher MSE than a constant-effect model; and (c) Orthogonality-based models outperform other models only 30% of the time, despite widespread optimism about their performance. These findings highlight significant challenges in current CATE models and underscore the need for broader evaluation and methodological improvements. Our findings stem from a novel application of…
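The two baseline comparisons in (a) and (b) can be illustrated with a minimal sketch. It is not the paper's evaluation procedure: it assumes the true per-unit effects are known (as in a simulation), whereas the benchmark works with real-world data where they are not directly observed. The function names `mse` and `compare_to_baselines` are hypothetical, and using the mean true effect as the constant-effect baseline is an assumption made for illustration.

```python
import numpy as np


def mse(estimate, truth):
    """Mean squared error between estimated and true treatment effects."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.mean((estimate - truth) ** 2))


def compare_to_baselines(tau_hat, tau_true):
    """Compare a CATE estimate against two trivial baselines.

    Assumption (illustration only): tau_true is known, and the
    constant-effect baseline predicts the mean of tau_true for every unit.
    """
    tau_true = np.asarray(tau_true, dtype=float)
    zero_pred = np.zeros_like(tau_true)                   # zero-effect predictor
    const_pred = np.full_like(tau_true, tau_true.mean())  # constant-effect model

    model_mse = mse(tau_hat, tau_true)
    zero_mse = mse(zero_pred, tau_true)
    const_mse = mse(const_pred, tau_true)
    return {
        "model_mse": model_mse,
        "zero_mse": zero_mse,
        "constant_mse": const_mse,
        "beats_zero": model_mse < zero_mse,
        "beats_constant": model_mse < const_mse,
    }


if __name__ == "__main__":
    tau_true = [0.0, 1.0, 2.0, 3.0]   # toy heterogeneous effects
    tau_hat = [2.0, 2.0, 2.0, 2.0]    # toy estimate with no heterogeneity
    print(compare_to_baselines(tau_hat, tau_true))
```

In this toy case the estimate beats the zero-effect predictor but not the constant-effect model, which is the situation described in finding (b): an estimate can be "useful" relative to zero while still failing to capture heterogeneity.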

Taxonomy

Topics: Recommender Systems and Techniques