Northlight: Declarative and Optimized Analysis of Atmospheric Datasets in SparkSQL
Justus Henneberg, Felix Schuhknecht, Philipp Reutter, Nils Brast, Peter Spichtinger

TL;DR
Northlight integrates NetCDF atmospheric data processing into SparkSQL, offering automatic optimizations that significantly improve performance and scalability for complex Earth science analytics.
Contribution
The paper introduces Northlight, a system that combines NetCDF processing with SparkSQL and applies automatic optimizations tailored to atmospheric datasets.
Findings
Outperforms state-of-the-art pipelines by up to 6x in speed.
Scales gracefully with analysis task selectivity.
Provides an easy-to-use, optimized platform for Earth science data analysis.
Abstract
Performing data-intensive analytics is an essential part of modern Earth science. As such, research in atmospheric physics and meteorology frequently requires the processing of very large observational and/or modeled datasets. Typically, these datasets (a) have high dimensionality, i.e., contain various measurements per spatiotemporal point, and (b) are extremely large, containing observations over a long time period. Additionally, (c) the analytical tasks being performed on these datasets are structurally complex. Over the years, the binary format NetCDF has been established as a de facto standard for distributing and exchanging such multi-dimensional datasets in the Earth science community, along with tools and APIs to visualize, process, and generate them. Unfortunately, these access methods typically lack either (1) an easy-to-use but rich query interface or (2) an automatic…
Taxonomy
Topics: Advanced Database Systems and Queries · Cloud Computing and Resource Management · Data Management and Algorithms
