UD-DML: Uniform Design Subsampling for Double Machine Learning over Massive Data
Yuanke Qu, Xiaoya Xu, Hengtao Zhang

TL;DR
UD-DML introduces a geometry-aware subsampling method for double machine learning that reduces computational cost while maintaining statistical validity, and it is especially effective in low-overlap scenarios.
Contribution
The paper proposes a design-based subsampling strategy that builds low-discrepancy skeletons in PCA space, enabling efficient average treatment effect (ATE) estimation with theoretical guarantees.
Findings
Lower RMSE than uniform subsampling.
Narrower confidence intervals and more reliable coverage.
Effective in low-overlap and misspecified regimes.
Abstract
Double machine learning (DML) delivers valid inference on low-dimensional causal parameters while permitting flexible nuisance estimation, but its computational cost becomes prohibitive once cross-fitted learners must be trained on massive observational data. Applying DML to a uniformly drawn subsample alleviates this burden, yet such a reduction disregards the geometry of the covariate space and can exacerbate treated-control imbalance as well as overlap deficiency. We propose Uniform Design Double Machine Learning (UD-DML), a design-based subsampling strategy for average treatment effect (ATE) estimation. UD-DML first constructs a low-discrepancy skeleton in a PCA-rotated covariate space under the mixture-discrepancy criterion, and then assigns, to each skeleton point, the nearest treated and control units via KD-tree search. The resulting matched subsample is, by construction, both…
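The subsampling step described in the abstract (PCA rotation, a low-discrepancy skeleton, then nearest treated and control units via KD-tree) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The function name `ud_subsample`, its parameters, the min-max rescaling of PCA scores to the unit cube, and the use of scrambled Sobol candidates ranked by SciPy's mixture-discrepancy measure as a stand-in for a true uniform design are all assumptions made for this example.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.stats import qmc


def ud_subsample(X, treat, n_skeleton=64, n_components=2,
                 n_candidates=5, seed=0):
    """Hypothetical sketch: match treated/control units to a skeleton.

    Returns (treated_idx, control_idx, mixture_discrepancy).
    """
    X = np.asarray(X, dtype=float)
    treat = np.asarray(treat).astype(bool)

    # PCA rotation via SVD of the centred covariates.
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[:n_components].T

    # Assumption: rescale each component to [0, 1] so that the data
    # and the skeleton live on the same unit cube.
    lo, hi = scores.min(axis=0), scores.max(axis=0)
    unit = (scores - lo) / np.where(hi > lo, hi - lo, 1.0)

    # Stand-in skeleton: keep the scrambled Sobol set with the lowest
    # mixture discrepancy (n_skeleton should be a power of two).
    skeleton, best_md = None, np.inf
    for k in range(n_candidates):
        sampler = qmc.Sobol(d=n_components, scramble=True, seed=seed + k)
        cand = sampler.random(n_skeleton)
        md = qmc.discrepancy(cand, method="MD")
        if md < best_md:
            skeleton, best_md = cand, md

    # Nearest treated and nearest control unit for every skeleton point,
    # using one KD-tree per treatment arm.
    idx_t = np.flatnonzero(treat)
    idx_c = np.flatnonzero(~treat)
    _, nn_t = cKDTree(unit[idx_t]).query(skeleton)
    _, nn_c = cKDTree(unit[idx_c]).query(skeleton)
    return idx_t[nn_t], idx_c[nn_c], best_md
```

In this sketch several skeleton points may select the same unit, so the returned index arrays can contain duplicates; how such ties are handled in UD-DML is not stated in the truncated abstract. The matched units would then be passed to an ordinary cross-fitted DML estimator in place of a uniformly drawn subsample.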
