A General Framework for Error-controlled Unstructured Scientific Data Compression

Qian Gong; Zhe Wang; Viktor Reshniak; Xin Liang; Jieyang Chen; Qing Liu; Tushar M. Athawale; Yi Ju; Anand Rangarajan; Sanjay Ranka; Norbert Podhorszki; Rick Archibald; Scott Klasky

arXiv:2501.06910 · cs.IT · January 14, 2025

TL;DR

This paper introduces a versatile, error-bounded compression framework for unstructured scientific mesh data that significantly improves compression ratios by interpolating data onto rectilinear grids and compressing residuals.
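The page does not say how the two parts combine to keep a pointwise error bound. One way this can hold is sketched below. It assumes that residuals are taken with respect to the decompressed grid interpolation, which is an assumption and not stated here. All symbols are introduced only for illustration:

- $P$ projects mesh values onto the rectilinear grid.
- $C_g$ and $D_g$ compress and decompress the grid.
- $I$ interpolates grid values back to the mesh nodes.
- $C_\epsilon$ and $D_\epsilon$ form an error-bounded residual compressor with pointwise bound $\epsilon$.

```latex
\begin{aligned}
\tilde{x} &= I\big(D_g(C_g(P x))\big), \\
r &= x - \tilde{x}, \qquad \hat{r} = D_\epsilon\big(C_\epsilon(r)\big), \qquad \|r - \hat{r}\|_\infty \le \epsilon, \\
\hat{x} &= \tilde{x} + \hat{r}
\;\Longrightarrow\;
\|x - \hat{x}\|_\infty = \|r - \hat{r}\|_\infty \le \epsilon .
\end{aligned}
```

Under this assumption, the grid compressor's own error only changes $\tilde{x}$. That in turn changes how large, and how compressible, the residuals are. It does not change the final bound.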

Contribution

The paper presents a novel, general multi-component compression method that improves lossy compression of unstructured mesh data. The method is independent of mesh type and compatible with existing lossy compressors.

Findings

1. Achieves 2.3x to 3.5x higher compression ratios than state-of-the-art methods.
2. Works effectively across both synthetic and real-world datasets.
3. Provides insights into hyperparameter tuning for optimal compression.

Abstract

Data compression plays a key role in reducing storage and I/O costs. Traditional lossy methods primarily target data on rectilinear grids and cannot leverage the spatial coherence in unstructured mesh data, leading to suboptimal compression ratios. We present a multi-component, error-bounded compression framework designed to enhance the compression of floating-point unstructured mesh data, which is common in scientific applications. Our approach involves interpolating mesh data onto a rectilinear grid and then separately compressing the grid interpolation and the interpolation residuals. This method is general, independent of mesh types and topologies, and can be seamlessly integrated with existing lossy compressors for improved performance. We evaluated our framework across twelve variables from two synthetic datasets and two real-world simulation datasets. The results indicate that…
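The sketch below shows the interpolate-then-compress-residuals idea from the abstract. It is a minimal illustration, not the authors' implementation, and rests on several assumptions:

- The mesh is treated as a 2-D cloud of node coordinates.
- SciPy's linear interpolation is used in both directions (mesh to grid, and grid back to mesh).
- Uniform scalar quantization stands in for a real error-bounded compressor.
- The function names, the 32x32 grid, and the looser grid bound (`4 * eb`) are hypothetical choices.

Residuals are computed against the *decompressed* grid so that the final pointwise error depends only on the residual quantizer. This is also an assumption, not something the abstract states.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator, griddata


def quantize(values, eb):
    """Uniform quantization with bin width 2*eb, so |v - dequantize(q)| <= eb.

    Hypothetical stand-in for an error-bounded lossy compressor such as SZ or MGARD.
    """
    return np.round(np.asarray(values) / (2.0 * eb)).astype(np.int64)


def dequantize(codes, eb):
    return codes.astype(np.float64) * (2.0 * eb)


def grid_to_mesh(axes, grid, points):
    """Bilinearly interpolate rectilinear-grid values back onto mesh node coordinates."""
    interp = RegularGridInterpolator(tuple(axes), grid, method="linear",
                                     bounds_error=False, fill_value=None)
    return interp(points)


def compress_unstructured(points, values, eb, grid_shape=(32, 32), grid_eb=None):
    """Hypothetical two-part scheme: compressed grid + error-bounded residuals.

    points: (N, 2) mesh node coordinates (assumed stored losslessly elsewhere).
    values: (N,) field values on the nodes; eb: pointwise error bound.
    """
    lo, hi = points.min(axis=0), points.max(axis=0)
    axes = [np.linspace(lo[d], hi[d], grid_shape[d]) for d in range(2)]
    gx, gy = np.meshgrid(axes[0], axes[1], indexing="ij")

    # 1) Interpolate scattered mesh data onto the rectilinear grid.
    grid = griddata(points, values, (gx, gy), method="linear")
    # Grid nodes outside the mesh's convex hull come back as NaN; fill with nearest.
    nearest = griddata(points, values, (gx, gy), method="nearest")
    grid = np.where(np.isnan(grid), nearest, grid)

    # 2) Compress the grid; its bound (assumed looser here) need not equal eb.
    grid_eb = 4.0 * eb if grid_eb is None else grid_eb
    grid_codes = quantize(grid, grid_eb)

    # 3) Residuals are taken against the decompressed grid mapped back to the nodes,
    #    so the final pointwise error depends only on the residual quantizer.
    approx = grid_to_mesh(axes, dequantize(grid_codes, grid_eb), points)
    residual_codes = quantize(values - approx, eb)
    return axes, grid_codes, grid_eb, residual_codes


def decompress_unstructured(axes, grid_codes, grid_eb, residual_codes, eb, points):
    # Rebuild the same grid approximation the compressor used, then add residuals.
    approx = grid_to_mesh(axes, dequantize(grid_codes, grid_eb), points)
    return approx + dequantize(residual_codes, eb)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.random((2000, 2))  # synthetic "mesh nodes"
    vals = np.sin(4 * pts[:, 0]) * np.cos(3 * pts[:, 1])
    eb = 1e-3
    axes, gc, geb, rc = compress_unstructured(pts, vals, eb)
    rec = decompress_unstructured(axes, gc, geb, rc, eb, pts)
    print("max error:", np.max(np.abs(rec - vals)), "bound:", eb)
```

For fields that are smooth relative to the grid spacing, the residual codes would be expected to cluster near zero, and that clustering is what a downstream entropy coder could exploit. The grid resolution and the split between the grid bound and `eb` are presumably the kind of hyperparameters the tuning finding refers to.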
