Exploring the Efficacy of Statistical and Deep Learning Methods for Large Spatial Datasets: A Case Study
Arnab Hazra, Pratik Nag, Rishikesh Yadav, Ying Sun

TL;DR
This paper compares statistical and deep learning methods for large spatial datasets, selects the Vecchia approximation for model fitting, and demonstrates the approach in the KAUST Competition on Large Spatial Datasets 2023 and in a real data application.
Contribution
The study introduces new R functions for Gaussian process modeling, compares multiple approaches, and applies them to large spatial datasets, winning two sub-competitions.
Findings
Vecchia approximation outperformed other methods in the competition
Developed R functions to support zero-mean Gaussian processes and uncertainty estimation
Demonstrated the effectiveness of the proposed methods on real satellite data
Abstract
Increasingly large and complex spatial datasets pose massive inferential challenges due to high computational and storage costs. Our study is motivated by the KAUST Competition on Large Spatial Datasets 2023, which tasked participants with estimating spatial covariance-related parameters and predicting values at testing sites, along with uncertainty estimates. We compared various statistical and deep learning approaches through cross-validation and ultimately selected the Vecchia approximation technique for model fitting. To overcome the constraints of the R package GpGp, which lacked support for fitting zero-mean Gaussian processes and for direct uncertainty estimation, both of which the competition required, we developed additional R functions. In addition, we implemented certain subsampling-based approximations and parametric smoothing for skewed sampling distributions of…
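For readers unfamiliar with the Vecchia approximation named in the abstract, the following is a minimal Python/NumPy sketch of the general idea for a zero-mean Gaussian process: the joint likelihood is replaced by a product of univariate conditionals, each conditioning on at most m nearby, previously ordered points. It is not the authors' R code and does not use the GpGp API. The exponential covariance, the parameter values, the use of the row order as the ordering, and all function names (exp_cov, vecchia_loglik, exact_loglik) are assumptions made for illustration.

```python
import numpy as np

def exp_cov(a, b, variance=1.0, range_=0.2):
    """Exponential covariance between two sets of 2-D locations (assumed kernel)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return variance * np.exp(-d / range_)

def vecchia_loglik(y, locs, m, variance=1.0, range_=0.2, nugget=1e-6):
    """Zero-mean Gaussian log-likelihood under a Vecchia approximation.

    Observation i is conditioned on its (at most) m nearest neighbours
    among the points that come earlier in the given row order.
    """
    total = 0.0
    for i in range(len(y)):
        mean_i = 0.0                      # zero-mean process: no trend term
        var_i = variance + nugget
        if i > 0 and m > 0:
            d = np.linalg.norm(locs[:i] - locs[i], axis=1)
            nb = np.argsort(d)[:m]        # indices of the conditioning set
            K_nn = exp_cov(locs[nb], locs[nb], variance, range_)
            K_nn += nugget * np.eye(len(nb))
            k_in = exp_cov(locs[i:i + 1], locs[nb], variance, range_)[0]
            w = np.linalg.solve(K_nn, k_in)
            mean_i = w @ y[nb]            # conditional mean
            var_i = variance + nugget - w @ k_in  # conditional variance
        total += -0.5 * (np.log(2 * np.pi * var_i)
                         + (y[i] - mean_i) ** 2 / var_i)
    return total

def exact_loglik(y, locs, variance=1.0, range_=0.2, nugget=1e-6):
    """Exact zero-mean Gaussian log-likelihood via a dense Cholesky factor."""
    n = len(y)
    K = exp_cov(locs, locs, variance, range_) + nugget * np.eye(n)
    L = np.linalg.cholesky(K)
    z = np.linalg.solve(L, y)
    return -0.5 * (n * np.log(2 * np.pi)
                   + 2 * np.sum(np.log(np.diag(L))) + z @ z)

# Simulated data on the unit square (illustrative only, not competition data).
rng = np.random.default_rng(0)
n = 200
locs = rng.uniform(size=(n, 2))
K = exp_cov(locs, locs) + 1e-6 * np.eye(n)
y = np.linalg.cholesky(K) @ rng.standard_normal(n)

ll_exact = exact_loglik(y, locs)
ll_vecchia = vecchia_loglik(y, locs, m=15)    # small conditioning sets
ll_full = vecchia_loglik(y, locs, m=n - 1)    # conditions on all earlier points
print(ll_exact, ll_vecchia, ll_full)
```

When every point conditions on all earlier points (m = n - 1), the product of conditionals is the exact likelihood by the chain rule, which gives a simple correctness check. The computational saving comes from keeping m small, so that each conditional requires solving only an m-by-m system instead of factorizing the full n-by-n covariance matrix.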
Taxonomy
Topics: Spatial and Panel Data Analysis · Air Quality Monitoring and Forecasting · Soil Geostatistics and Mapping
