Parallel Implementation of Lossy Data Compression for Temporal Data Sets
Zheng Yuan, William Hendrix, Seung Woo Son, Christoph Federrath, Ankit Agrawal, Wei-keng Liao, Alok Choudhary

TL;DR
This paper introduces a parallel implementation of NUMARCK, a lossy compression algorithm for large temporal datasets, demonstrating speedups of up to 8788x on 12800 MPI processes and higher compression ratios than ISABELA and ZFP.
Contribution
The paper presents a scalable parallel implementation of NUMARCK, improving compression efficiency and speed for large temporal datasets from scientific simulations.
Findings
Achieved up to 8788x speedup with 12800 MPI processes.
Outperformed ISABELA and ZFP in compression ratio.
Validated on climate and astrophysics datasets.
Abstract
Many scientific data sets have a temporal dimension: they store information at the same spatial locations at different time stamps. Some of the largest temporal data sets are produced by parallel computing applications such as simulations of climate change and fluid dynamics. These data sets can be very large and take a long time to transfer between storage locations. Data compression lets files be transferred faster and saves storage space. NUMARCK is a lossy data compression algorithm for temporal data sets that learns the emerging distributions of element-wise change ratios along the temporal dimension and encodes them into an index table for a concise representation. This paper presents a parallel implementation of NUMARCK. Evaluated with six data sets obtained from climate and astrophysics simulations, parallel NUMARCK achieved scalable…
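The change-ratio idea in the abstract can be illustrated with a minimal serial sketch. This is not the authors' implementation: the function names, the equal-width binning, the `error_bound` and `index_bits` parameters, and the escape code for elements stored exactly are all assumptions made for illustration, and the parallel (MPI) part of the paper is not shown.

```python
def compress_step(prev, curr, error_bound=0.005, index_bits=8):
    """Encode `curr` relative to `prev` as bin indices of change ratios.

    Hypothetical sketch: equal-width bins are an assumed binning strategy.
    """
    n_bins = 2 ** index_bits - 1   # the last index value is reserved as an escape code
    escape = n_bins

    # Element-wise change ratio; undefined where the previous value is zero.
    ratios = [(c - p) / p if p != 0 else None for p, c in zip(prev, curr)]
    valid = [r for r in ratios if r is not None]
    lo = min(valid) if valid else 0.0
    hi = max(valid) if valid else 0.0

    # Index table: one representative ratio (bin center) per index.
    span = hi - lo
    width = span / n_bins if span > 0 else 1.0
    offset = 0.5 if span > 0 else 0.0
    centers = [lo + (k + offset) * width for k in range(n_bins)]

    indices, exact = [], {}
    for i, r in enumerate(ratios):
        if r is not None:
            k = min(int((r - lo) / width), n_bins - 1)
            if abs(centers[k] - r) <= error_bound:
                indices.append(k)
                continue
        # Ratio undefined or not representable within the bound: keep the value as-is.
        indices.append(escape)
        exact[i] = curr[i]
    return centers, indices, exact


def decompress_step(prev, centers, indices, exact):
    """Rebuild the current time step from the previous one and the index table."""
    escape = len(centers)
    return [
        exact[i] if k == escape else p * (1.0 + centers[k])
        for i, (p, k) in enumerate(zip(prev, indices))
    ]


# Toy data for two consecutive time steps (made up for this example).
prev = [1.0, 2.0, 4.0, 0.0, 10.0]
curr = [1.01, 2.02, 4.04, 5.0, 30.0]
centers, indices, exact = compress_step(prev, curr, error_bound=0.005)
restored = decompress_step(prev, centers, indices, exact)
```

In this sketch each element costs `index_bits` bits plus the shared index table, and every element that is not stored exactly is restored to within `error_bound` times the magnitude of its previous value.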
