A Data-Informed Local Subspaces Method for Error-Bounded Lossy Compression of Large-Scale Scientific Datasets
Arshan Khan, Rohit Deshmukh, Ben O'Neill

TL;DR
This paper introduces Discontinuous Data-informed Local Subspaces, a data-driven error-bounded lossy compression method that significantly improves storage efficiency for large-scale scientific datasets by leveraging localized data structures.
Contribution
The paper presents a novel data-informed local subspaces approach that enhances compression ratios over data-agnostic methods for scientific data, applicable across various high-dimensional datasets.
Findings
Achieves higher compression ratios than data-agnostic compressors while keeping reconstruction error within the prescribed bound.
Effectively preserves key features in fluid dynamics and environmental datasets.
Demonstrates scalability in distributed computing environments.
Abstract
The growing volume of scientific simulation data presents a significant challenge for storage and transfer. Error-bounded lossy compression has emerged as a critical solution for mitigating these challenges, providing a means to reduce data size while ensuring that reconstructed data remains valid for scientific analysis. In this paper, we present a data-driven scientific data compressor, called Discontinuous Data-informed Local Subspaces (Discontinuous DLS), to improve compression-to-error ratios over data-agnostic compressors. This error-bounded compressor leverages localized spatial and temporal subspaces, informed by the underlying data structure, to enhance compression efficiency and preserve key features. The presented technique is flexible and applicable to a wide range of scientific data, including fluid dynamics, environmental simulations, and other high-dimensional,…
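The idea of an error-bounded compressor built on local subspaces can be illustrated with a minimal sketch. This is not the authors' Discontinuous DLS algorithm: it substitutes a plain truncated SVD per spatial block, and the names `compress_block`, `compress`, `decompress`, `block_rows`, and `tol` are hypothetical. Each block is compressed independently of its neighbors, and the rank of each local basis is raised until the maximum pointwise error falls within the requested bound.

```python
import numpy as np

def compress_block(block, tol):
    """Fit a local subspace to one block (rows: spatial points, cols: time steps).

    Assumption: truncated SVD stands in for the paper's subspace construction.
    Returns (basis, coeffs) at the smallest rank whose max absolute error <= tol.
    """
    U, s, Vt = np.linalg.svd(block, full_matrices=False)
    for r in range(len(s) + 1):
        approx = (U[:, :r] * s[:r]) @ Vt[:r]
        if np.max(np.abs(block - approx)) <= tol:
            return U[:, :r], s[:r, None] * Vt[:r]
    # tol is below floating-point round-off: fall back to the full-rank factors
    return U, s[:, None] * Vt

def compress(data, block_rows, tol):
    """Split the data into row blocks and compress each one independently."""
    return [compress_block(data[i:i + block_rows], tol)
            for i in range(0, data.shape[0], block_rows)]

def decompress(blocks):
    """Rebuild each block from its basis and coefficients, then stack them."""
    return np.vstack([basis @ coeffs for basis, coeffs in blocks])

# Synthetic space-time field (illustrative only): two travelling waves.
x = np.linspace(0.0, 2.0 * np.pi, 256)[:, None]
t = np.linspace(0.0, 1.0, 32)[None, :]
data = np.sin(x - t) + 0.1 * np.sin(3.0 * x + 2.0 * t)

tol = 1e-3
blocks = compress(data, block_rows=64, tol=tol)
recon = decompress(blocks)

max_err = np.max(np.abs(data - recon))
stored = sum(basis.size + coeffs.size for basis, coeffs in blocks)
print(f"max error {max_err:.2e}, stored {stored} of {data.size} values")
```

The sketch stores only the per-block basis and coefficient arrays, so the saving depends on how low the local rank can go for a given `tol`. A practical compressor would also quantize and entropy-code those arrays, which is omitted here.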
Taxonomy
Topics: Advanced Data Storage Technologies · Computer Graphics and Visualization Techniques · Parallel Computing and Optimization Techniques
