On the use of cross-fitting in causal machine learning with correlated units
Salvador V. Balkus, Hasan Laith, and Nima S. Hejazi

TL;DR
This paper shows that standard cross-fitting, performed as if study units were independent, still eliminates key bias terms in causal machine learning when units are correlated, so bespoke correlation-aware fold designs are typically unnecessary.
Contribution
It proves that cross-fitting performed as if units were independent still eliminates key bias terms when units are correlated, as in spatial, clustered, or time-series data, which simplifies causal inference in these settings.
Findings
Standard cross-fitting still eliminates key bias terms when study units are correlated.
Cross-fitting that ignores correlation achieves the same or improved bias and precision compared to correlation-aware alternatives.
Simulation experiments with various correlation structures support these results.
Abstract
In causal machine learning, the fitting and evaluation of nuisance models are often performed on separate partitions, or folds, of the observed data. This technique, called cross-fitting, eliminates bias introduced by the use of black-box predictive algorithms. When study units may be correlated, such as in spatial, clustered, or time-series data, investigators often design bespoke forms of cross-fitting to minimize correlation between folds. We prove that, perhaps contrary to popular belief, this is typically unnecessary: performing cross-fitting as if study units were independent still eliminates key bias terms even when units may be correlated. In simulation experiments with various correlation structures, we show that causal machine learning estimators achieve the same or improved bias and precision under cross-fitting that ignores correlation compared to techniques striving to…
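To make the idea of cross-fitting "as if study units were independent" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a partially linear model Y = theta*A + g(X) + noise, clustered data with a shared random effect per cluster, and simple least-squares lines standing in for the black-box nuisance learners. The function names (`fit_line`, `cross_fit_plm`, `simulate_clustered`) and all simulation parameters are hypothetical choices made for illustration.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns a prediction function.

    Stand-in for a black-box nuisance learner (assumption for this sketch).
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def cross_fit_plm(x, a, y, n_folds=5, seed=0):
    """Cross-fitted residual-on-residual estimate of theta.

    Folds are assigned by a plain random shuffle of unit indices,
    ignoring any cluster membership.
    """
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]

    res_a = [0.0] * n
    res_y = [0.0] * n
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Fit nuisance models on the training folds only.
        m_hat = fit_line([x[i] for i in train], [a[i] for i in train])
        l_hat = fit_line([x[i] for i in train], [y[i] for i in train])
        # Evaluate them on the held-out fold.
        for i in fold:
            res_a[i] = a[i] - m_hat(x[i])
            res_y[i] = y[i] - l_hat(x[i])

    num = sum(ra * ry for ra, ry in zip(res_a, res_y))
    den = sum(ra * ra for ra in res_a)
    return num / den


def simulate_clustered(n_clusters=200, cluster_size=5, theta=2.0, seed=1):
    """Simulate clustered data; a shared effect u correlates outcomes within a cluster."""
    rng = random.Random(seed)
    x, a, y = [], [], []
    for _ in range(n_clusters):
        u = rng.gauss(0, 1)
        for _ in range(cluster_size):
            xi = rng.gauss(0, 1)
            ai = 0.5 * xi + rng.gauss(0, 1)
            yi = theta * ai + xi + u + rng.gauss(0, 1)
            x.append(xi)
            a.append(ai)
            y.append(yi)
    return x, a, y


x, a, y = simulate_clustered()
theta_hat = cross_fit_plm(x, a, y)
print(f"theta_hat = {theta_hat:.3f} (true theta = 2.0)")
```

A correlation-aware variant would instead assign whole clusters to folds. The sketch deliberately does not, since the abstract's claim is that the simpler random assignment typically suffices.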
