Exploring Lossy Compressibility through Statistical Correlations of Scientific Datasets
David Krasowska, Julie Bessac, Robert Underwood, Jon C. Calhoun, Sheng Di, and Franck Cappello

TL;DR
This paper investigates how statistical correlation structures in scientific datasets influence the effectiveness of lossy compression, aiming to predict compression limits and improve compression strategies.
Contribution
It introduces statistical methods to characterize data correlations and relate them to compression ratios, advancing understanding of lossy compressibility bounds.
Findings
Correlation structures relate to compression ratios
Statistical models can predict compression performance
Insights towards theoretical limits of lossy compression
Abstract
Lossy compression plays a growing role in scientific simulations where the cost of storing their output data can span terabytes. Using error-bounded lossy compression reduces the amount of storage for each simulation; however, there is no known bound for the upper limit on lossy compressibility. Correlation structures in the data, choice of compressor and error bound are factors allowing larger compression ratios and improved quality metrics. Analyzing these three factors provides one direction towards quantifying lossy compressibility. As a first step, we explore statistical methods to characterize the correlation structures present in the data and their relationships, through functional models, to compression ratios. We observed a relationship between compression ratios and statistics summarizing correlation structure of the data, which is a first step towards evaluating the…
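To make the idea in the abstract concrete, the following is a minimal toy sketch of relating a correlation statistic to a compression ratio. It is not the authors' method. The excerpt does not name the statistics or compressors used, so everything here is an assumption: lag-1 autocorrelation stands in for the correlation statistic, uniform quantization followed by delta encoding and zlib stands in for an error-bounded lossy compressor, and the data are synthetic moving-average series. All function names are hypothetical.

```python
import math
import random
import struct
import zlib


def make_series(n, window, seed=0):
    """Synthetic data (assumption): white noise smoothed by a moving average,
    then rescaled to zero mean and unit variance.

    A larger window gives stronger correlation between neighbouring values.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n + window - 1)]
    smooth = [sum(noise[i:i + window]) / window for i in range(n)]
    mean = sum(smooth) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in smooth) / n)
    return [(v - mean) / std for v in smooth]


def lag1_autocorrelation(x):
    """Sample correlation between each value and its immediate neighbour."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    return cov / var


def compress(x, error_bound):
    """Stand-in error-bounded compressor: quantize, delta-encode, then zlib."""
    step = 2.0 * error_bound  # bin width; rounding error is at most step / 2
    codes = [round(v / step) for v in x]
    deltas = [codes[0]] + [codes[i] - codes[i - 1] for i in range(1, len(codes))]
    return zlib.compress(struct.pack(f"<{len(deltas)}i", *deltas), 9)


def decompress(blob, error_bound):
    """Invert compress(): undo zlib, undo the delta encoding, rescale the codes."""
    step = 2.0 * error_bound
    raw = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 4}i", raw)
    out, code = [], 0
    for d in deltas:
        code += d
        out.append(code * step)
    return out


def compression_ratio(x, error_bound):
    """Ratio of the uncompressed size (as 64-bit floats) to the compressed size."""
    original_bytes = 8 * len(x)
    return original_bytes / len(compress(x, error_bound))


if __name__ == "__main__":
    # Print the statistic next to the ratio for increasingly correlated series.
    for window in (1, 4, 16, 64):
        x = make_series(4096, window)
        print(window, round(lag1_autocorrelation(x), 3),
              round(compression_ratio(x, 0.01), 2))
```

Every series is rescaled to unit variance so that any difference in compression ratio in this toy setup comes from the correlation structure rather than from the value range. A real study would replace the stand-in compressor with an actual error-bounded compressor and the lag-1 statistic with whatever correlation summaries the paper uses.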
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Advanced Clustering Algorithms Research
