Spatially Robust Inference with Predicted and Missing at Random Labels
Stephen Salerno, Zhenke Wu, Tyler McCormick

TL;DR
This paper develops a statistical inference method for spatial data in which some labels are missing at random (MAR) and replaced by model predictions. The method accounts for both spatial dependence and MAR labeling to produce asymptotically valid confidence intervals.
Contribution
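It introduces a doubly robust estimator with cross-fit nuisances, paired with a jackknife spatial heteroscedasticity and autocorrelation consistent (HAC) variance correction that separates spatial dependence from the fold-level correlation induced by cross-fitting.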
Findings
Significantly improves finite-sample calibration under MAR and spatial dependence.
Provides asymptotically valid confidence intervals in complex spatial settings.
Demonstrates superior performance over existing methods in simulations and benchmarks.
Abstract
When outcome data are expensive or onerous to collect, scientists increasingly substitute predictions from machine learning and AI models for unlabeled cases, a process which has consequences for downstream statistical inference. While recent methods provide valid uncertainty quantification under independent sampling, real-world applications involve missing at random (MAR) labeling and spatial dependence. For inference in this setting, we propose a doubly robust estimator with cross-fit nuisances. We show that cross-fitting induces fold-level correlation that distorts spatial variance estimators, producing unstable or overly conservative confidence intervals. To address this, we propose a jackknife spatial heteroscedasticity and autocorrelation consistent (HAC) variance correction that separates spatial dependence from fold-induced noise. Under standard identification and dependence…
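The two ingredients named in the abstract, a cross-fit doubly robust estimator and a spatial HAC variance, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a generic augmented inverse probability weighting (AIPW) score for the mean outcome, a linear outcome model, a logistic labeling model, and a Bartlett-type product kernel for the spatial weights. The function names, the propensity clipping at 0.05, the bandwidth, and the simulated data are all assumptions chosen for illustration, and the paper's jackknife correction is not implemented here.

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def crossfit_aipw_scores(X, y, labeled, n_folds=5, seed=0):
    """Cross-fit AIPW scores for E[Y]; y is used only where labeled == 1."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds            # random, balanced fold ids
    Xd = np.column_stack([np.ones(n), X])           # design matrix with intercept
    y0 = np.where(labeled == 1, y, 0.0)             # unlabeled outcomes never enter
    scores = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        lab = train & (labeled == 1)
        # Outcome model: least squares on labeled training rows (illustrative choice).
        beta, *_ = np.linalg.lstsq(Xd[lab], y0[lab], rcond=None)
        m = Xd[test] @ beta
        # Labeling (propensity) model: logistic regression fit by Newton steps.
        gamma = np.zeros(Xd.shape[1])
        for _ in range(25):
            p = _sigmoid(Xd[train] @ gamma)
            H = Xd[train].T @ (Xd[train] * (p * (1 - p))[:, None])
            H += 1e-8 * np.eye(H.shape[0])          # tiny ridge for numerical stability
            gamma += np.linalg.solve(H, Xd[train].T @ (labeled[train] - p))
        pi = np.clip(_sigmoid(Xd[test] @ gamma), 0.05, 1.0)
        # AIPW score: prediction plus inverse-probability-weighted residual.
        scores[test] = m + labeled[test] * (y0[test] - m) / pi
    return scores

def spatial_hac_variance(scores, coords, bandwidth):
    """Spatial HAC variance of the mean score using a Bartlett product kernel."""
    n = scores.shape[0]
    u = scores - scores.mean()
    diff = np.abs(coords[:, None, :] - coords[None, :, :])
    # Weight decays linearly to zero in each coordinate; the product keeps it PSD.
    K = np.clip(1.0 - diff / bandwidth, 0.0, None).prod(axis=-1)
    return float(u @ K @ u) / n**2

# Simulated example (hypothetical data; errors here are independent across space).
rng = np.random.default_rng(1)
n = 400
coords = rng.uniform(size=(n, 2))
X = rng.normal(size=(n, 2))
y = 1.0 + X @ np.array([1.0, -0.5]) + 0.5 * rng.normal(size=n)
labeled = (rng.uniform(size=n) < _sigmoid(0.5 + X[:, 0])).astype(int)  # MAR given X
y_obs = np.where(labeled == 1, y, np.nan)

scores = crossfit_aipw_scores(X, y_obs, labeled)
estimate = scores.mean()
se = np.sqrt(spatial_hac_variance(scores, coords, bandwidth=0.1))
print(f"estimate = {estimate:.3f}, 95% CI half-width = {1.96 * se:.3f}")
```

With a bandwidth close to zero the kernel matrix reduces to the identity, so the HAC variance collapses to the usual independent-sampling variance of the mean score. The abstract's point is that a plain spatial HAC estimate like this one can be distorted by fold-level correlation from cross-fitting, which is what the proposed jackknife correction targets.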
Taxonomy
Topics: Spatial and Panel Data Analysis · Statistical Methods and Inference · Soil Geostatistics and Mapping
