Parallel Computation of PDFs on Big Spatial Data Using Spark
Ji Liu, Noel Moreno Lemus, Esther Pacitti, Fabio Porto, and Patrick Valduriez

TL;DR
This paper presents a scalable Spark-based method for efficiently computing probability density functions on large 3D spatial datasets, significantly reducing computation time for uncertainty analysis in geological and seismic data.
Contribution
It introduces a novel parallel approach using data grouping, machine learning prediction, and sampling to compute PDFs on big spatial data efficiently.
Findings
Achieves up to 33x speedup over baseline methods.
Scales effectively on datasets from hundreds of GB to several TB.
Reduces computation time from hours or months to seconds or minutes.
Abstract
We consider big spatial data, which is typically produced in scientific areas such as geological or seismic interpretation. The spatial data can be produced by observation (e.g., using sensors or soil instruments) or by numerical simulation programs, and correspond to points that represent a 3D soil cube area. However, errors in signal processing and modeling create some uncertainty, and thus a lack of accuracy in identifying geological or seismic phenomena. Such uncertainty must be carefully analyzed. To analyze uncertainty, the main solution is to compute a Probability Density Function (PDF) of each point in the spatial cube area. However, computing PDFs on big spatial data can be very time-consuming (from several hours to even months on a parallel computer). In this paper, we propose a new solution to efficiently compute such PDFs in parallel using Spark, with three methods: data…
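As an illustration of the per-point PDF computation and the data grouping idea mentioned above, here is a minimal single-machine sketch in plain Python. It is not the paper's implementation: the excerpt does not state the grouping criterion or the candidate distribution types, so this sketch assumes that points with identical value vectors form one group and that a normal distribution is fitted. The names `fit_normal`, `pdfs_with_grouping`, and `observations` are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def fit_normal(values):
    """Fit a normal PDF to the values and return its (mean, std).

    Assumption: the normal distribution stands in for whatever
    distribution types the paper actually considers.
    """
    return fmean(values), pstdev(values)


def pdfs_with_grouping(observations):
    """Compute PDF parameters for every point, fitting once per group.

    observations maps a point (x, y, z) to the tuple of values seen at
    that point across simulations. Assumption: two points belong to the
    same group when their value tuples are identical.
    """
    groups = defaultdict(list)
    for point, values in observations.items():
        groups[tuple(values)].append(point)

    result = {}
    for values, points in groups.items():
        params = fit_normal(values)  # one fit shared by the whole group
        for point in points:
            result[point] = params
    return result, len(groups)


# Three points, two of which share the same simulated values.
observations = {
    (0, 0, 0): (1.0, 2.0, 3.0),
    (0, 0, 1): (1.0, 2.0, 3.0),
    (0, 1, 0): (4.0, 4.0, 4.0),
}
pdfs, n_fits = pdfs_with_grouping(observations)
```

In this toy input, three points require only two fits. In a Spark setting, the groups would presumably be distributed across workers rather than processed in a local loop.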
Taxonomy
Topics: Data Management and Algorithms · Soil Geostatistics and Mapping · Distributed and Parallel Computing Systems
