Improving Prediction-Based Lossy Compression Dramatically via Ratio-Quality Modeling
Sian Jin, Sheng Di, Jiannan Tian, Suren Byna, Dingwen Tao, Franck Cappello

TL;DR
This paper introduces an analytical ratio-quality model for error-bounded lossy compression that accurately predicts data quality and compression ratio, significantly reducing the need for trial-and-error tuning in scientific data reduction.
Contribution
The paper presents an analytical model that predicts the compression ratio and reconstructed data quality of a prediction-based lossy compressor, so a configuration can be optimized without exhaustive trial runs.
Findings
Achieves 93.47% prediction accuracy on average
Reduces computational cost by up to 18.7X
Speeds up data storage by up to 3.4X
Abstract
Error-bounded lossy compression is one of the most effective techniques for scientific data reduction. However, the traditional trial-and-error approach used to configure lossy compressors for finding the optimal trade-off between reconstructed data quality and compression ratio is prohibitively expensive. To resolve this issue, we develop a general-purpose analytical ratio-quality model based on the prediction-based lossy compression framework, which can effectively foresee the reduced data quality and compression ratio, as well as the impact of the lossy compressed data on post-hoc analysis quality. Our analytical model significantly improves the prediction-based lossy compression in three use-cases: (1) optimization of predictor by selecting the best-fit predictor; (2) memory compression with a target ratio; and (3) in-situ compression optimization by fine-grained error-bound tuning…
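To make the trial-and-error cost concrete, the following is a minimal sketch of the brute-force loop that a ratio-quality model is meant to avoid. It is not the paper's analytical model: it assumes a toy 1D previous-value predictor, linear quantization with an absolute error bound, and a Shannon-entropy estimate standing in for a real entropy coder. All function names (`compress_1d`, `entropy_bits`, `psnr`, `estimate`) are hypothetical.

```python
import math
from collections import Counter

def compress_1d(data, eb):
    """Toy prediction-based compressor: predict each value from the previous
    reconstructed value, then quantize the residual in steps of 2*eb."""
    codes, recon = [], []
    prev = 0.0
    for x in data:
        code = round((x - prev) / (2 * eb))   # integer quantization code
        r = prev + code * 2 * eb              # reconstructed value, |x - r| <= eb
        codes.append(code)
        recon.append(r)
        prev = r                              # predict from reconstructed data
    return codes, recon

def entropy_bits(codes):
    """Shannon entropy of the quantization codes, in bits per value."""
    n = len(codes)
    return -sum(c / n * math.log2(c / n) for c in Counter(codes).values())

def psnr(data, recon):
    """Peak signal-to-noise ratio in dB, using the value range as the peak."""
    mse = sum((a - b) ** 2 for a, b in zip(data, recon)) / len(data)
    value_range = max(data) - min(data)
    if mse == 0:
        return float("inf")
    return 20 * math.log10(value_range) - 10 * math.log10(mse)

def estimate(data, eb, bits_per_value=32):
    """One trial: run the compressor at error bound eb and measure the outcome.
    The ratio is an entropy-based estimate (assumption), not a real file size."""
    codes, recon = compress_1d(data, eb)
    ratio = bits_per_value / max(entropy_bits(codes), 1e-9)
    return ratio, psnr(data, recon)

# Synthetic smooth signal standing in for a scientific field (assumption).
data = [math.sin(0.01 * i) for i in range(10000)]

# Trial-and-error sweep: each candidate error bound costs a full compression pass.
for eb in (1e-4, 1e-3, 1e-2):
    ratio, quality = estimate(data, eb)
    print(f"eb={eb:g}  estimated ratio={ratio:.1f}  PSNR={quality:.1f} dB")
```

Each candidate error bound in the sweep requires a full pass over the data, which is the cost the abstract describes as prohibitively expensive on large datasets; an analytical model would instead predict the ratio and quality for a given error bound without running the compressor.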
Taxonomy
Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Scientific Computing and Data Management
