DeepCQ: General-Purpose Deep-Surrogate Framework for Lossy Compression Quality Prediction
Khondoker Mirazul Mumenin, Robert Underwood, Dong Dai, Jinzhen Wang, Sheng Di, Zarija Lukić, Franck Cappello

TL;DR
DeepCQ is a versatile deep surrogate model that accurately predicts lossy compression quality across various compressors and datasets, reducing computational costs in scientific data management.
Contribution
The paper introduces a generalizable, efficient deep surrogate framework for predicting compression quality, with a novel two-stage design and mixture-of-experts optimization for time-evolving data.
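The two-stage idea can be illustrated with a toy sketch. This is not DeepCQ's model: the uniform quantizer stands in for an error-bounded lossy compressor, the single range feature stands in for the learned feature extractor, and the least-squares head stands in for the lightweight metric predictor. All function names here are hypothetical. The point is only that features are computed once per dataset and cached, while the cheap head is evaluated for each error bound.

```python
import numpy as np

def quantize(data, error_bound):
    # Assumption: uniform quantization as a stand-in for a real
    # error-bounded compressor; pointwise error stays within error_bound.
    step = 2.0 * error_bound
    return np.round(data / step) * step

def psnr(original, decompressed):
    # Quality metric that is costly at scale: needs a full compress/decompress pass.
    value_range = original.max() - original.min()
    mse = np.mean((original - decompressed) ** 2)
    return 20.0 * np.log10(value_range) - 10.0 * np.log10(mse)

def extract_features(data):
    # Stage 1 (the expensive stage in a real system): run once per dataset.
    return np.array([np.log10(data.max() - data.min())])

def head_inputs(features, error_bound):
    # Stage 2 input: cached features plus the requested error bound and a bias term.
    return np.concatenate([features, [np.log10(error_bound), 1.0]])

rng = np.random.default_rng(0)
datasets = [rng.normal(scale=s, size=4096) for s in (0.5, 1.0, 2.0, 4.0)]
bounds = [1e-3, 3e-3, 1e-2, 3e-2]

# Features are extracted once and reused for every error bound.
feature_cache = [extract_features(d) for d in datasets]

X, y = [], []
for data, feats in zip(datasets, feature_cache):
    for eb in bounds:
        X.append(head_inputs(feats, eb))
        y.append(psnr(data, quantize(data, eb)))
weights, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)

# Predict PSNR for an unseen dataset without compressing it.
test_data = rng.normal(scale=3.0, size=4096)
test_features = extract_features(test_data)
test_bounds = [2e-3, 2e-2]
predicted = [float(head_inputs(test_features, eb) @ weights) for eb in test_bounds]
actual = [float(psnr(test_data, quantize(test_data, eb))) for eb in test_bounds]
max_abs_error_db = max(abs(p - a) for p, a in zip(predicted, actual))
bound_respected = all(
    np.max(np.abs(test_data - quantize(test_data, eb))) <= eb * (1 + 1e-9)
    for eb in test_bounds
)
```

Because the head only sees cached features, trying a new error bound costs one small dot product rather than another compression run, which is the kind of saving the decoupled design aims for.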
Findings
Prediction errors generally under 10%
Outperforms existing methods significantly
Effective across multiple scientific applications
Abstract
Error-bounded lossy compression techniques have become vital for scientific data management and analytics, given the ever-increasing volume of data generated by modern scientific simulations and instruments. Nevertheless, assessing data quality post-compression remains computationally expensive due to the intensive nature of metric calculations. In this work, we present a general-purpose deep-surrogate framework for lossy compression quality prediction (DeepCQ), with the following key contributions: 1) We develop a surrogate model for compression quality prediction that is generalizable to different error-bounded lossy compressors, quality metrics, and input datasets; 2) We adopt a novel two-stage design that decouples the computationally expensive feature-extraction stage from the light-weight metrics prediction, enabling efficient training and modular inference; 3) We optimize the…
Taxonomy
Topics: Advanced Data Storage Technologies · Data Quality and Management · Advanced Data Compression Techniques
