Neighborhood Stability in Double/Debiased Machine Learning with Dependent Data
Jianfei Cao, Michael P. Leung

TL;DR
This paper extends double/debiased machine learning (DML) to weakly dependent data situated in a general metric space, which accommodates spatial and network data. It introduces neighborhood stability, a condition on the machine learner under which DML remains valid without cross-fitting.
Contribution
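The paper introduces neighborhood stability, a strengthening of the stability condition that Chen et al. (2022) use for i.i.d. data, as a new condition for DML with dependent data. Under it, inference is valid without cross-fitting in spatial and network contexts.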
Findings
Neighborhood stability can be verified for common machine learners.
DML methods remain valid under dependent data with neighborhood stability.
In simulations, cross-fitting that excludes observations close to the evaluation fold can leave exceedingly small training folds, particularly with network data.
Abstract
This paper studies double/debiased machine learning (DML) methods applied to weakly dependent data. We allow observations to be situated in a general metric space that accommodates spatial and network data. Existing work implements cross-fitting by excluding from the training fold observations sufficiently close to the evaluation fold. We find in simulations that this can result in exceedingly small training fold sizes, particularly with network data. We therefore seek to establish the validity of DML without cross-fitting, building on recent work by Chen et al. (2022). They study i.i.d. data and require the machine learner to satisfy a natural stability condition requiring insensitivity to data perturbations that resample a single observation. We extend these results to dependent data by strengthening stability to "neighborhood stability," which requires insensitivity to resampling…
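The shrinking-training-fold issue described in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure or simulation design: the function name `training_fold_sizes`, the exclusion `radius`, and the toy data (12 units on a line with distance |i - j|) are all assumptions made for illustration. The sketch counts, for each evaluation fold, how many observations remain for training once every observation within `radius` of the evaluation fold is dropped.

```python
import numpy as np

def training_fold_sizes(dist, fold_ids, radius):
    """For each fold k, count observations outside fold k whose distance
    to every member of fold k exceeds `radius` (hypothetical helper)."""
    sizes = {}
    for k in np.unique(fold_ids):
        in_eval = fold_ids == k
        # Distance from each observation to its nearest evaluation-fold member.
        nearest = dist[:, in_eval].min(axis=1)
        keep = (~in_eval) & (nearest > radius)
        sizes[int(k)] = int(keep.sum())
    return sizes

# Hypothetical toy data: 12 units on a line, distance = |i - j|.
n = 12
pos = np.arange(n)
dist = np.abs(pos[:, None] - pos[None, :])

# Contiguous folds: {0..3}, {4..7}, {8..11}.
contiguous = np.repeat(np.arange(3), 4)
# Interleaved folds: {0,3,6,9}, {1,4,7,10}, {2,5,8,11}.
interleaved = np.arange(n) % 3

sizes_contiguous = training_fold_sizes(dist, contiguous, radius=1)
sizes_interleaved = training_fold_sizes(dist, interleaved, radius=1)
print(sizes_contiguous)
print(sizes_interleaved)
```

With contiguous folds most observations survive the exclusion rule, whereas with interleaved folds nearly every observation lies within the radius of some evaluation unit and the training folds almost vanish. The interleaved case is only a loose stand-in for settings where most units are close to one another, which is one way the problem can arise.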
Taxonomy
Topics: Advanced Graph Neural Networks · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques
