UD-DML: Uniform Design Subsampling for Double Machine Learning over Massive Data

Yuanke Qu; Xiaoya Xu; Hengtao Zhang

arXiv:2605.05772 · stat.ME · May 8, 2026

TL;DR

UD-DML introduces a geometry-aware subsampling method for double machine learning (DML) that reduces computational cost while preserving statistical validity; it is especially effective in low-overlap scenarios.
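The DML step whose cost UD-DML reduces can be sketched as a cross-fitted augmented inverse-probability-weighting (AIPW) estimator of the ATE, the standard doubly robust score used with DML. This is a minimal NumPy sketch under stated assumptions: per-arm ordinary least squares and a Newton-fitted logistic regression stand in for flexible machine-learning nuisance learners, propensity scores are clipped to [0.01, 0.99], and all function names (`dml_ate` and the underscore helpers) are hypothetical rather than taken from the paper.

```python
import numpy as np

def _ols_fit(X, y):
    # Hypothetical helper: OLS with intercept stands in for a flexible outcome learner.
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta

def _ols_predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

def _logit_fit(X, t, iters=25, ridge=1e-6):
    # Hypothetical helper: logistic regression via Newton-Raphson (tiny ridge for stability)
    # stands in for a flexible propensity learner.
    Xd = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xd.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xd @ w))
        grad = Xd.T @ (t - p) - ridge * w
        hess = (Xd * (p * (1 - p))[:, None]).T @ Xd + ridge * np.eye(len(w))
        w += np.linalg.solve(hess, grad)
    return w

def _logit_predict(w, X):
    return 1.0 / (1.0 + np.exp(-np.column_stack([np.ones(len(X)), X]) @ w))

def dml_ate(X, t, y, n_folds=2, clip=0.01, seed=0):
    """Cross-fitted AIPW estimate of the ATE and its standard error."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds          # balanced random fold labels
    psi = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        Xtr, ttr, ytr = X[train], t[train], y[train]
        # Nuisances are fitted on the other folds only (cross-fitting).
        b1 = _ols_fit(Xtr[ttr == 1], ytr[ttr == 1])
        b0 = _ols_fit(Xtr[ttr == 0], ytr[ttr == 0])
        w = _logit_fit(Xtr, ttr)
        Xte, tte, yte = X[test], t[test], y[test]
        m1, m0 = _ols_predict(b1, Xte), _ols_predict(b0, Xte)
        e = np.clip(_logit_predict(w, Xte), clip, 1 - clip)
        # Neyman-orthogonal (doubly robust) score evaluated on the held-out fold.
        psi[test] = (m1 - m0 + tte * (yte - m1) / e
                     - (1 - tte) * (yte - m0) / (1 - e))
    ate = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(n)
    return ate, se
```

A 95% confidence interval is then `ate ± 1.96 * se`. In a subsampling scheme such as UD-DML, the same estimator would be run on the selected rows only, so every nuisance fit sees a much smaller training set.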

Contribution

The paper proposes a design-based subsampling strategy that builds low-discrepancy skeletons in a PCA-rotated covariate space, enabling efficient average treatment effect (ATE) estimation with theoretical guarantees.
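The mixture-discrepancy criterion used to score a candidate skeleton has a closed form on the unit cube [0, 1]^s. The sketch below computes the squared mixture discrepancy (MD²) of an n × s design and then, purely as an illustrative search rule (an assumption; this excerpt does not describe how the paper optimizes the skeleton), keeps the best of several random Latin hypercube designs. Function names are hypothetical.

```python
import numpy as np

def mixture_discrepancy_sq(D):
    """Squared mixture discrepancy (MD^2) of an n x s design D in [0, 1]^s."""
    D = np.asarray(D, dtype=float)
    n, s = D.shape
    a = np.abs(D - 0.5)                                # |x_ik - 1/2|
    term1 = (19.0 / 12.0) ** s
    term2 = (2.0 / n) * np.prod(5.0 / 3.0 - a / 4.0 - a ** 2 / 4.0, axis=1).sum()
    diff = np.abs(D[:, None, :] - D[None, :, :])       # |x_ik - x_jk|, shape (n, n, s)
    kernel = (15.0 / 8.0 - a[:, None, :] / 4.0 - a[None, :, :] / 4.0
              - 0.75 * diff + 0.5 * diff ** 2)
    term3 = np.prod(kernel, axis=2).sum() / n ** 2
    return term1 - term2 + term3

def best_random_skeleton(n, s, n_candidates=50, seed=0):
    """Hypothetical search rule: keep the random Latin hypercube with the smallest MD^2."""
    rng = np.random.default_rng(seed)
    best, best_md = None, np.inf
    for _ in range(n_candidates):
        # Each column is a random permutation of the cell midpoints (i + 0.5) / n.
        D = (np.argsort(rng.random((n, s)), axis=0) + 0.5) / n
        md = mixture_discrepancy_sq(D)
        if md < best_md:
            best, best_md = D, md
    return best, best_md
```

The pairwise kernel needs O(n² s) memory, which is fine at skeleton sizes but not for the full data set. SciPy also exposes this criterion via `scipy.stats.qmc.discrepancy(..., method="MD")`.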

Findings

01. Lower RMSE than uniform subsampling.

02. Narrower confidence intervals and more reliable coverage.

03. Effective in low-overlap and misspecified regimes.

Abstract

Double machine learning (DML) delivers valid inference on low-dimensional causal parameters while permitting flexible nuisance estimation, but its computational cost becomes prohibitive once cross-fitted learners must be trained on massive observational data. Applying DML to a uniformly drawn subsample alleviates this burden, yet such a reduction disregards the geometry of the covariate space and can exacerbate treated-control imbalance as well as overlap deficiency. We propose Uniform Design Double Machine Learning (UD-DML), a design-based subsampling strategy for average treatment effect (ATE) estimation. UD-DML first constructs a low-discrepancy skeleton in a PCA-rotated covariate space under the mixture-discrepancy criterion, and then assigns, to each skeleton point, the nearest treated and control units via KD-tree search. The resulting matched subsample is, by construction, both…
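The subsample construction described in the abstract (PCA rotation, a skeleton of design points, then the nearest treated and nearest control unit for each skeleton point) can be sketched as follows. Several details are assumptions, not the paper's specification: PCA scores are mapped to [0, 1] by their empirical CDF, a random Latin hypercube is a placeholder for the mixture-discrepancy-optimized skeleton, matching is with replacement and duplicates are dropped, and a brute-force nearest-neighbour search stands in for the KD-tree (at scale, `scipy.spatial.KDTree(...).query` would be the natural replacement). Function names are hypothetical.

```python
import numpy as np

def pca_unit_cube(X, n_components):
    """Rotate standardized covariates onto leading principal axes, then map each
    score to (0, 1) by its empirical CDF (an assumed normalization)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)   # rows of Vt = principal axes
    scores = Z @ Vt[:n_components].T
    ranks = scores.argsort(axis=0).argsort(axis=0)     # per-component ranks 0..n-1
    return (ranks + 0.5) / len(X)

def latin_hypercube(m, s, rng):
    """Placeholder skeleton; UD-DML instead uses a mixture-discrepancy-optimized design."""
    return (np.argsort(rng.random((m, s)), axis=0) + 0.5) / m

def nearest_index(points, pool):
    """Index into pool of each point's nearest neighbour (brute force, KD-tree stand-in)."""
    d2 = ((points[:, None, :] - pool[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def ud_subsample(X, t, m, n_components=2, seed=0):
    """Row indices of a matched subsample built around an m-point skeleton."""
    rng = np.random.default_rng(seed)
    U = pca_unit_cube(X, n_components)
    skeleton = latin_hypercube(m, n_components, rng)
    treated = np.flatnonzero(t == 1)
    control = np.flatnonzero(t == 0)
    # For every skeleton point, take the closest treated and the closest control unit.
    idx_t = treated[nearest_index(skeleton, U[treated])]
    idx_c = control[nearest_index(skeleton, U[control])]
    # Matching is with replacement here; repeated units are kept once (an assumption).
    return np.unique(np.concatenate([idx_t, idx_c]))
```

Each skeleton point contributes at most one treated and one control unit, so the subsample has at most `m` treated and `m` control rows; a DML estimator is then fitted on `X[sub], t[sub], y[sub]` instead of the full data.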
