Black-Box Statistical Prediction of Lossy Compression Ratios for Scientific Data
Robert Underwood, Julie Bessac, David Krasowska, Jon C. Calhoun, Sheng Di, Franck Cappello

TL;DR
This paper introduces a statistical framework to predict lossy compression ratios for scientific data, enabling efficient selection and tuning of compressors without extensive trial-and-error.
Contribution
The authors develop a data-driven, compressor-agnostic prediction method that accurately estimates compression ratios using spatial correlations and entropy measures.
Findings
Median prediction error less than 12% across datasets.
Achieves at least 8.8x speedup in compression ratio search.
Outperforms existing methods in prediction accuracy.
Abstract
Lossy compressors are increasingly adopted in scientific research, tackling volumes of data from experiments or parallel numerical simulations and facilitating data storage and movement. In contrast with the notion of entropy in lossless compression, no theoretical or data-based quantification of lossy compressibility exists for scientific data. Users rely on trial and error to assess lossy compression performance. As a strong data-driven effort toward quantifying lossy compressibility of scientific datasets, we provide a statistical framework to predict compression ratios of lossy compressors. Our method is a two-step framework where (i) compressor-agnostic predictors are computed and (ii) statistical prediction models relying on these predictors are trained on observed compression ratios. Proposed predictors exploit spatial correlations and notions of entropy and lossyness via the…
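The two-step framework described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`quantized_entropy`, `lag1_correlation`, `standin_compression_ratio`, `make_field`) are hypothetical, the two predictors are only one plausible reading of "spatial correlations and notions of entropy", the "compressor" is a stand-in (uniform quantization followed by zlib) rather than a real scientific lossy compressor, the training fields are synthetic, and the statistical model is a plain least-squares fit on the log compression ratio.

```python
import zlib
import numpy as np


def quantized_entropy(data, error_bound):
    """Shannon entropy (bits per value) after uniform quantization
    with bin width 2 * error_bound. Assumed predictor, not the paper's exact one."""
    codes = np.floor(data / (2.0 * error_bound)).astype(np.int64)
    _, counts = np.unique(codes, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def lag1_correlation(data):
    """Correlation between horizontally adjacent values of a 2D field,
    used here as a crude proxy for spatial correlation."""
    left = data[:, :-1].ravel()
    right = data[:, 1:].ravel()
    return float(np.corrcoef(left, right)[0, 1])


def standin_compression_ratio(data, error_bound):
    """Hypothetical stand-in for a lossy compressor: quantize, then zlib."""
    codes = np.floor(data / (2.0 * error_bound)).astype(np.int32)
    compressed = zlib.compress(codes.tobytes(), 6)
    return data.nbytes / len(compressed)


def make_field(rng, smoothness, shape=(64, 64)):
    """Synthetic 2D field; more smoothing passes give stronger spatial correlation."""
    field = rng.standard_normal(shape)
    for _ in range(smoothness):
        field = (field + np.roll(field, 1, axis=0) + np.roll(field, 1, axis=1)) / 3.0
    return field


def features(data, error_bound):
    """Step (i): compressor-agnostic predictors, plus an intercept term."""
    return [1.0, quantized_entropy(data, error_bound), lag1_correlation(data)]


rng = np.random.default_rng(0)
error_bound = 1e-2

# Training set: synthetic fields with their observed (stand-in) compression ratios.
fields = [make_field(rng, s) for s in range(12)]
X = np.array([features(f, error_bound) for f in fields])
y = np.log([standin_compression_ratio(f, error_bound) for f in fields])

# Step (ii): fit a statistical model on the observed compression ratios.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predict the ratio for an unseen field without running the compressor,
# then run the stand-in compressor once to compare.
new_field = make_field(rng, 5)
predicted_cr = float(np.exp(np.array(features(new_field, error_bound)) @ coef))
observed_cr = standin_compression_ratio(new_field, error_bound)
print(f"predicted CR: {predicted_cr:.2f}, observed CR: {observed_cr:.2f}")
```

The point of the sketch is the separation of concerns: the predictors depend only on the data and the error bound, so the same features could be reused to train one model per compressor.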
Taxonomy
Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Algorithms and Data Compression
