Exploration of Pattern-Matching Techniques for Lossy Compression on Cosmology Simulation Data Sets
Dingwen Tao, Sheng Di, Zizhong Chen, Franck Cappello

TL;DR
This paper introduces a pattern-matching technique to enhance lossy compression of cosmology simulation data, significantly improving prediction accuracy and reducing data size compared to existing methods.
Contribution
The paper proposes a novel pattern-matching approach to optimize lossy compression for large-scale cosmology simulation data, addressing low spatial coherence challenges.
Findings
Improved prediction accuracy over SZ compressor
Reduced compressed size of quantization codes
Effective for data with low spatial coherence
Abstract
Because of the vast volume of data being produced by today's scientific simulations, lossy compression allowing user-controlled information loss can significantly reduce the data size and the I/O burden. However, for large-scale cosmology simulation, such as the Hardware/Hybrid Accelerated Cosmology Code (HACC), where memory overhead constraints restrict compression to only one snapshot at a time, the lossy compression ratio is extremely limited because of the fairly low spatial coherence and high irregularity of the data. In this work, we propose a pattern-matching (similarity searching) technique to optimize the prediction accuracy and compression ratio of SZ lossy compressor on the HACC data sets. We evaluate our proposed method with different configurations and compare it with state-of-the-art lossy compressors. Experiments show that our proposed optimization approach can improve…
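The abstract does not spell out the matching procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a 1D array, fixed-size blocks, a sum-of-absolute-differences similarity measure, and SZ-style error-bounded quantization of the prediction residual. The names `compress`, `decompress`, `block`, and `eb` are hypothetical and chosen for illustration.

```python
def quantize(value, prediction, eb):
    """Quantize the residual so that |value - reconstructed| <= eb."""
    code = round((value - prediction) / (2 * eb))
    return code, prediction + 2 * eb * code


def compress(data, block, eb):
    """Predict each block from the most similar earlier reconstructed block.

    Returns (matches, codes): one reference offset per block (-1 means
    "no match, predict zeros") and one integer quantization code per value.
    """
    recon, matches, codes = [], [], []
    for start in range(0, len(data), block):
        cur = data[start:start + block]
        # Baseline: predicting zeros costs the sum of absolute values.
        best_ref, best_cost = -1, sum(abs(v) for v in cur)
        # Similarity search over earlier blocks (assumed: block-aligned only).
        for ref in range(0, start, block):
            cand = recon[ref:ref + len(cur)]
            cost = sum(abs(a - b) for a, b in zip(cur, cand))
            if cost < best_cost:
                best_ref, best_cost = ref, cost
        matches.append(best_ref)
        for j, value in enumerate(cur):
            pred = recon[best_ref + j] if best_ref >= 0 else 0.0
            code, rec = quantize(value, pred, eb)
            codes.append(code)
            recon.append(rec)
    return matches, codes


def decompress(matches, codes, block, eb):
    """Rebuild the data from reference offsets and quantization codes."""
    recon = []
    for k, ref in enumerate(matches):
        start = k * block
        for j, code in enumerate(codes[start:start + block]):
            pred = recon[ref + j] if ref >= 0 else 0.0
            recon.append(pred + 2 * eb * code)
    return recon
```

In this sketch the search runs over reconstructed values rather than the originals, so the decompressor can reproduce the same predictions from `matches` alone. A closer match yields smaller residuals and therefore quantization codes clustered near zero, which an entropy coder can store more compactly.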
Taxonomy
Topics: Advanced Data Storage Technologies · Algorithms and Data Compression · Chaos-based Image/Signal Encryption
