Preserving Clusters in Error-Bounded Lossy Compression of Particle Data
Congrong Ren, Sheng Di, Katrin Heitmann, Franck Cappello, Hanqi Guo

TL;DR
This paper introduces a correction technique that operates on the decompressed output of error-bounded lossy compressors for particle data and preserves the single-linkage clustering structures that downstream scientific analysis depends on, while keeping compression ratios competitive.
Contribution
A novel clustering-aware correction method that guarantees preservation of single-linkage clustering in lossy compressed particle datasets, scalable to large data.
Findings
Effectively preserves clustering results in cosmology and molecular dynamics datasets.
Maintains competitive compression ratios compared to existing schemes.
Scalable GPU-accelerated implementation for large-scale data.
Abstract
Lossy compression is widely used to reduce storage and I/O costs for large-scale particle datasets in scientific applications such as cosmology, molecular dynamics, and fluid dynamics, where clustering structures (e.g., single-linkage or Friends-of-Friends) are critical for downstream analysis. However, existing compressors typically provide only pointwise error bounds on particle positions and offer no guarantees on preserving clustering outcomes; even small perturbations can alter cluster connectivity and compromise scientific validity. We propose a correction-based technique to preserve single-linkage clustering under lossy compression, operating on decompressed data from off-the-shelf compressors such as SZ3 and Draco. Our key contributions are threefold: (1) a clustering-aware correction algorithm that identifies vulnerable particle pairs via spatial partitioning and local…
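To illustrate why a pointwise error bound does not by itself protect clustering, here is a minimal Python sketch of single-linkage (Friends-of-Friends) clustering on a toy dataset. It is not the paper's algorithm: the function name `fof_clusters`, the brute-force pair search, the linking length `b`, and the hand-picked "decompressed" positions are all illustrative assumptions. Every coordinate moves by about 0.01, yet one pair crosses the linking length and the cluster splits.

```python
import itertools
import math


def fof_clusters(points, linking_length):
    """Single-linkage (Friends-of-Friends) clustering via union-find.

    Two particles are linked when their Euclidean distance is at most
    linking_length; clusters are the connected components of those links.
    Brute-force O(n^2) pair search, for illustration only.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in itertools.combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= linking_length:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), set()).add(i)
    return {frozenset(g) for g in groups.values()}


# Hypothetical toy data: three particles on a line, linking length b = 1.0.
b = 1.0
original = [(0.0, 0.0, 0.0), (0.9, 0.0, 0.0), (1.89, 0.0, 0.0)]
# Hand-made stand-in for decompressed output: each x moves by about 0.01.
decompressed = [(0.0, 0.0, 0.0), (0.89, 0.0, 0.0), (1.90, 0.0, 0.0)]

clusters_before = fof_clusters(original, b)
clusters_after = fof_clusters(decompressed, b)
print(clusters_before == clusters_after)  # the partitions differ
```

In the original data the pair (1, 2) is 0.99 apart and therefore linked; after the small shift it is 1.01 apart and the link is lost. Pairs whose distance lies this close to the linking length are the kind a correction step would need to examine.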
