Cross-Fitting and Averaging for Machine Learning Estimation of Heterogeneous Treatment Effects
Daniel Jacob

TL;DR
This paper evaluates how different sample-splitting, cross-fitting, and averaging strategies affect the accuracy of machine learning estimators of heterogeneous treatment effects, and identifies the procedures that perform best in finite samples.
Contribution
It systematically compares twelve estimators across various meta-learners to identify effective data splitting and averaging techniques for treatment effect estimation.
Findings
Cross-fitting plus median averaging yields the best MSE performance.
Excluding Lasso reduces variance and enhances robustness.
Performance heavily depends on the data splitting and averaging procedures.
Abstract
We investigate the finite sample performance of sample splitting, cross-fitting and averaging for the estimation of the conditional average treatment effect. Recently proposed methods, so-called meta-learners, make use of machine learning to estimate different nuisance functions and hence allow for fewer restrictions on the underlying structure of the data. To limit a potential overfitting bias that may result when using machine learning methods, cross-fitting estimators have been proposed. This includes splitting the data into different folds to reduce bias and averaging over folds to restore efficiency. To the best of our knowledge, it is not yet clear how exactly the data should be split and averaged. We employ a Monte Carlo study with different data generation processes and consider twelve different estimators that vary in sample-splitting, cross-fitting and averaging…
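The split-then-average idea described in the abstract can be illustrated with a minimal sketch. It is not one of the paper's twelve estimators: it assumes a T-learner whose two outcome models are one-covariate least-squares lines (a stand-in for the machine learning methods), a simulated data set with known effect 1 + 2x, and median aggregation over repeated random splits. All function names (`fit_line`, `t_learner_predict`, `cross_fit_cate`, `median_cross_fit`, `simulate`) are hypothetical and chosen for illustration only.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single covariate."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def t_learner_predict(train, test_x):
    """Fit separate outcome models for treated and control units on `train`,
    then return the predicted treatment effect for each x in `test_x`."""
    a1, b1 = fit_line(*zip(*[(x, y) for x, d, y in train if d == 1]))
    a0, b0 = fit_line(*zip(*[(x, y) for x, d, y in train if d == 0]))
    return [(a1 + b1 * x) - (a0 + b0 * x) for x in test_x]


def cross_fit_cate(data, n_folds, rng):
    """One random K-fold split: each unit's effect is predicted by models
    trained only on the folds that do not contain that unit."""
    idx = list(range(len(data)))
    rng.shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    tau = [0.0] * len(data)
    for held_out in folds:
        held = set(held_out)
        train = [data[i] for i in idx if i not in held]
        preds = t_learner_predict(train, [data[i][0] for i in held_out])
        for i, p in zip(held_out, preds):
            tau[i] = p
    return tau


def median_cross_fit(data, n_folds=2, n_reps=11, seed=0):
    """Repeat the cross-fitting over several random splits and take the
    per-unit median of the resulting predictions."""
    rng = random.Random(seed)
    runs = [cross_fit_cate(data, n_folds, rng) for _ in range(n_reps)]
    return [statistics.median(col) for col in zip(*runs)]


def simulate(n=400, seed=1):
    """Assumed data generating process: randomized treatment d,
    true conditional effect 1 + 2 * x, Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        d = rng.randint(0, 1)
        y = x + d * (1 + 2 * x) + rng.gauss(0, 0.5)
        data.append((x, d, y))
    return data


if __name__ == "__main__":
    data = simulate()
    tau_hat = median_cross_fit(data)
    mse = statistics.fmean(
        (t - (1 + 2 * x)) ** 2 for t, (x, _, _) in zip(tau_hat, data)
    )
    print(f"MSE of median cross-fitted estimate: {mse:.4f}")
```

Replacing `statistics.median` with `statistics.fmean` in `median_cross_fit` gives a mean-aggregated variant, which is one simple way to compare averaging rules on the same splits.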
Taxonomy
Topics: Advanced Causal Inference Techniques · Statistical Methods and Inference · Statistical Methods in Clinical Trials
