Data Compression in the Petascale Astronomy Era: a GERLUMPH case study
Dany Vohl, Christopher J. Fluke, Georgios Vernardos

TL;DR
This paper evaluates JPEG2000 and other compression techniques for large-scale astronomical simulation data, demonstrating that lossy JPEG2000 can significantly reduce data size without compromising analysis goals.
Contribution
It provides the first assessment of JPEG2000's effectiveness on numerical simulation data in astronomy, highlighting its potential for large, volumetric datasets.
Findings
Lossless compression ratios ranged from 1.35:1 to 4.69:1.
JPEG2000 achieved high compression ratios suitable for volumetric data.
Lossy compression with JPEG2000 did not significantly affect data analysis outcomes.
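The ratios above are usually defined as original size divided by compressed size, so 4.69:1 means the compressed file is about 21% of the original. The sketch below shows how such a lossless ratio can be measured. It makes several assumptions. It uses Python's standard-library codecs (`zlib`, `bz2`, `lzma`) as stand-ins, because JPEG2000 is not in the standard library. The input is a hypothetical synthetic float32 array, not GERLUMPH data. Nothing here reproduces the paper's actual pipeline.

```python
import bz2
import lzma
import random
import struct
import zlib


def compression_ratio(raw: bytes, compressed: bytes) -> float:
    """Original size / compressed size, e.g. 4.69 for a 4.69:1 ratio."""
    return len(raw) / len(compressed)


def synthetic_map(n: int, seed: int = 0) -> bytes:
    """Hypothetical stand-in for an n x n float32 map (NOT real GERLUMPH data)."""
    rng = random.Random(seed)
    values = [
        # Smooth repeating background plus a little Gaussian noise.
        1.0 + 0.5 * ((i + j) % 16) / 16 + rng.gauss(0.0, 0.01)
        for i in range(n)
        for j in range(n)
    ]
    # Little-endian 32-bit floats, 4 bytes per value.
    return struct.pack(f"<{n * n}f", *values)


# General-purpose lossless codecs used as stand-ins for the formats in the paper.
codecs = {
    "zlib": (lambda b: zlib.compress(b, 9), zlib.decompress),
    "bz2": (lambda b: bz2.compress(b, 9), bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

raw = synthetic_map(256)
for name, (compress, decompress) in codecs.items():
    blob = compress(raw)
    # Lossless means the round trip must reproduce every byte exactly.
    assert decompress(blob) == raw
    ratio = compression_ratio(raw, blob)
    print(f"{name}: {ratio:.2f}:1 ({1 - 1 / ratio:.1%} less storage)")
```

The storage saving follows from the ratio as 1 - 1/R. The reported 1.35:1 therefore corresponds to roughly 25.9% less storage, and 4.69:1 to roughly 78.7%.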
Abstract
As the volume of data grows, astronomers are increasingly faced with choices on what data to keep -- and what to throw away. Recent work evaluating the JPEG2000 (ISO/IEC 15444) standards as a future data format in astronomy has shown promising results on observational data. However, its potential on other types of astronomical data, such as the output of numerical simulations, still needs to be evaluated. GERLUMPH (the GPU-Enabled High Resolution cosmological MicroLensing parameter survey) represents an example of a data-intensive project in theoretical astrophysics. In the next phase of processing, the ~27 terabyte GERLUMPH dataset is set to grow by a factor of 100 -- well beyond the current storage capabilities of the supercomputing facility on which it resides. In order to minimise bandwidth usage, file transfer time, and storage space, this work evaluates several data…
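The growth figure in the abstract can be turned into a rough storage estimate. This back-of-the-envelope calculation rests on an assumption that the summary does not state: that the lossless ratios listed under Findings, 1.35:1 to 4.69:1, would carry over unchanged to the enlarged dataset. Here R is that hypothetical lossless ratio.

```latex
\begin{aligned}
V_{\text{next}} &\approx 100 \times 27\ \text{TB} = 2700\ \text{TB} \approx 2.7\ \text{PB},\\
V_{\text{compressed}} &= \frac{V_{\text{next}}}{R}, \qquad R \in [1.35,\ 4.69],\\
\frac{2700\ \text{TB}}{1.35} &= 2000\ \text{TB}, \qquad \frac{2700\ \text{TB}}{4.69} \approx 576\ \text{TB}.
\end{aligned}
```

Even at the best lossless ratio, the projected volume would still sit in the hundreds of terabytes. That gap is what makes the lossy option evaluated in the paper relevant.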
