Large-scale Environmental Data Science with ExaGeoStatR
Sameh Abdulah, Yuxiao Li, Jian Cao, Hatem Ltaief, David E. Keyes, Marc G. Genton, Ying Sun

TL;DR
This paper introduces ExaGeoStatR, an R package enabling parallel computation of exact Gaussian log-likelihood functions for large-scale environmental datasets, leveraging exascale computing architectures to overcome computational limitations.
Contribution
The paper presents ExaGeoStatR, a novel R package that supports scalable, parallel exact maximum likelihood estimation for large geostatistical datasets across diverse hardware architectures.
Findings
ExaGeoStatR efficiently computes likelihoods for datasets with up to 250K observations.
Performance comparisons show that ExaGeoStatR outperforms the existing R packages geoR and fields.
The package provides tools for synthetic data simulation and real data analysis.
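For scale, here is a back-of-envelope count for the largest case in the findings. It assumes a dense covariance matrix stored in double precision (8 bytes per entry) and the standard Cholesky operation count of about n^3/3 floating-point operations. This is illustrative arithmetic, not a measurement reported in the paper:

```latex
\begin{aligned}
n &= 2.5\times 10^{5},\\
\text{memory} &\approx 8n^{2}\ \text{bytes} = 8 \times 6.25\times 10^{10}\ \text{bytes} = 5\times 10^{11}\ \text{bytes} \approx 500\ \text{GB},\\
\text{flops}_{\text{Cholesky}} &\approx \tfrac{1}{3}n^{3} = \tfrac{1}{3}\times 1.5625\times 10^{16} \approx 5.2\times 10^{15}.
\end{aligned}
```

Storing only the lower triangle halves the memory to roughly 250 GB. That is still beyond most single workstations, and maximum likelihood estimation repeats the factorization at every optimizer step.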
Abstract
Parallel computing in Gaussian process calculations becomes necessary to avoid the computational and memory restrictions associated with large-scale environmental data science applications. Evaluating the Gaussian log-likelihood function requires O(n^2) storage and O(n^3) operations, where n is the number of geographical locations. Thus, computing the log-likelihood function for a large number of locations requires exploiting existing parallel hardware, such as shared-memory systems (possibly equipped with GPUs) and distributed-memory systems, to cope with this computational complexity. In this paper, we advocate the use of ExaGeoStatR, a package for exascale geostatistics in R that supports parallel computation of the exact maximum likelihood function on a wide variety of parallel architectures. Parallelization in ExaGeoStatR depends on breaking down the…
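The O(n^2) and O(n^3) costs come from the dense covariance matrix. Each likelihood evaluation builds an n x n matrix Sigma(theta) and factorizes it. The sketch below is a minimal, serial, pure-Python illustration of what one exact evaluation computes: l(theta) = -(n/2) log(2 pi) - (1/2) log|Sigma(theta)| - (1/2) z^T Sigma(theta)^{-1} z, obtained through a Cholesky factor Sigma = L L^T.

It makes several simplifying assumptions:
- a zero-mean field;
- an exponential covariance sigma2 * exp(-d / beta), which is the Matérn model with smoothness 1/2, in place of a general Matérn kernel (that kernel needs a modified Bessel function);
- hypothetical function names.

It is not ExaGeoStatR's R interface or its parallel implementation. The same Cholesky factor also generates synthetic data as z = L e with e ~ N(0, I).

```python
import math
import random

def exp_cov(locs, sigma2, beta):
    """Dense n x n exponential covariance (Matern with smoothness 1/2): O(n^2) memory."""
    n = len(locs)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        xi, yi = locs[i]
        for j in range(i + 1):
            d = math.hypot(xi - locs[j][0], yi - locs[j][1])
            cov[i][j] = cov[j][i] = sigma2 * math.exp(-d / beta)
    return cov

def cholesky(a):
    """Lower-triangular L with a = L L^T; about n^3/3 flops."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

def forward_solve(L, z):
    """Solve L y = z by forward substitution."""
    y = []
    for i in range(len(z)):
        y.append((z[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def gaussian_loglik(z, locs, sigma2, beta):
    """Exact zero-mean Gaussian log-likelihood under the exponential covariance."""
    n = len(z)
    L = cholesky(exp_cov(locs, sigma2, beta))
    y = forward_solve(L, z)                      # z^T Sigma^{-1} z = y^T y
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))  # log|Sigma|
    quad = sum(v * v for v in y)
    return -0.5 * n * math.log(2.0 * math.pi) - 0.5 * logdet - 0.5 * quad

def simulate(locs, sigma2, beta, rng):
    """Synthetic field z = L e with e ~ N(0, I), so that Cov(z) = Sigma."""
    L = cholesky(exp_cov(locs, sigma2, beta))
    e = [rng.gauss(0.0, 1.0) for _ in locs]
    return [sum(L[i][k] * e[k] for k in range(i + 1)) for i in range(len(locs))]

if __name__ == "__main__":
    rng = random.Random(0)
    locs = [(rng.random(), rng.random()) for _ in range(100)]
    z = simulate(locs, sigma2=1.0, beta=0.1, rng=rng)
    # Crude maximum likelihood: grid search over beta with sigma2 held fixed.
    grid = [0.03, 0.1, 0.3]
    best = max(grid, key=lambda b: gaussian_loglik(z, locs, 1.0, b))
    print("best beta on grid:", best)
```

Each call to `gaussian_loglik` repeats the full factorization, so any optimizer multiplies the O(n^3) cost by its number of likelihood evaluations. The three-point grid here is only a stand-in for a real optimizer. This repeated dense factorization is the work that must be spread across cores, GPUs, or nodes once n reaches the sizes discussed above.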
Taxonomy
Topics: Data Analysis with R
