A General Framework for Progressive Data Compression and Retrieval
Victor A. P. Magri, Peter Lindstrom

TL;DR
This paper introduces a general framework for progressive data compression that enables adaptive, incremental data retrieval at the accuracy a given task requires, independent of the underlying compression method, making scientific data handling more efficient.
Contribution
The authors propose a novel, general framework for progressive data compression that works across different compressors and data representations, supporting adaptive, incremental data retrieval.
Findings
The framework achieves accuracy competitive with that of the standalone compressors it builds on.
(De)compression time scales with the number of components requested.
The framework supports lossless compression using lossy compressors when enough components are used.
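
The findings above can be illustrated with a minimal sketch. It assumes a residual-based scheme in which each component is a lossy compression of the error left by the previous components, and a reconstruction is a partial sum of components. This summary does not spell out the mechanism, so treat the scheme as an assumption rather than the authors' implementation. The function `lossy_compress` is a toy stand-in that keeps only the leading mantissa bits of each value; it is not one of the compressors evaluated in the paper, and all names here are hypothetical.

```python
import math


def lossy_compress(x, bits):
    """Toy lossy 'compressor': keep the top `bits` mantissa bits of x.

    Truncates toward zero. A hypothetical stand-in for a real lossy compressor.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                # x == m * 2**e with 0.5 <= |m| < 1
    kept = math.trunc(m * 2 ** bits)    # integer holding the leading bits
    return math.ldexp(kept, e - bits)   # scale back to the magnitude of x


def progressive_compress(data, bits, num_components):
    """Compress `data` into components, each encoding the remaining residual."""
    residual = list(data)
    components = []
    for _ in range(num_components):
        comp = [lossy_compress(r, bits) for r in residual]
        components.append(comp)
        # Error still left after this component; the next pass compresses it.
        residual = [r - c for r, c in zip(residual, comp)]
    return components


def reconstruct(components, k):
    """Approximate the data as the partial sum of the first k components."""
    out = [0.0] * len(components[0])
    for comp in components[:k]:
        out = [o + c for o, c in zip(out, comp)]
    return out


def max_error(a, b):
    """Largest absolute pointwise difference between two sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


# Usage: 8 bits per component, 7 components (7 * 8 >= 53 mantissa bits).
data = [math.sin(0.1 * i) for i in range(50)]
components = progressive_compress(data, bits=8, num_components=7)
errors = [max_error(data, reconstruct(components, k)) for k in range(1, 8)]
```

In this sketch the error never grows as components are added, and once the components cover all 53 mantissa bits of a double the partial sum reproduces the input exactly, even though each individual pass is lossy. Each component costs one compression pass and each reconstruction adds one component per level, so the work grows with the number of components used.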
Abstract
In scientific simulations, observations, and experiments, the cost of transferring data to and from disk and across networks has become a significant bottleneck that particularly impacts subsequent data analysis and visualization. To address this challenge, compression techniques have been widely adopted. However, traditional lossy compression approaches often require setting error tolerances conservatively to respect the numerical sensitivities of a wide variety of post hoc data analyses, some of which may not even be known a priori. Progressive data compression and retrieval has emerged as a solution, allowing for the adaptive handling of compressed data according to the needs of a given post-processing task. However, few analysis algorithms natively support progressive data processing, and adapting compression techniques, file formats, client/server frameworks, and APIs to support…
Taxonomy
Topics: Advanced Data Storage Technologies · Distributed and Parallel Computing Systems · Scientific Computing and Data Management
