ROIBIN-SZ: Fast and Science-Preserving Compression for Serial Crystallography
Robert Underwood, Chun Yoon, Ali Gok, Sheng Di, Franck Cappello

TL;DR
ROIBIN-SZ is a novel, parallel lossy compression method that selectively preserves critical regions in crystallography data, achieving high compression ratios while maintaining scientific integrity for large-scale protein structure analysis.
Contribution
The paper introduces ROIBIN-SZ, a new parallel compression scheme that combines region-of-interest preservation with lossy compression, significantly improving speed and quality over prior methods.
Findings
Achieves up to 196x compression ratio on lysozyme data
Preserves data quality for accurate structure reconstruction
Operates efficiently at scales suitable for next-generation light sources
Abstract
Crystallography is the leading technique to study atomic structures of proteins and produces enormous volumes of information that can place strains on the storage and data transfer capabilities of synchrotron and free-electron laser light sources. Lossy compression has been identified as a possible means to cope with the growing data volumes; however, prior approaches have not produced sufficient quality at a sufficient rate to meet scientific needs. This paper presents Region Of Interest BINning with SZ lossy compression (ROIBIN-SZ), a novel, parallel, and accelerated compression scheme that separates the dynamically selected preservation of key regions from lossy compression of background information. We perform and present an extensive evaluation of the performance and quality results made by the co-design of this compression scheme. We can achieve up to a 196x and 46.44x compression…
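The split the abstract describes, keeping selected regions intact while compressing the background lossily, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the function names are hypothetical, peak positions are taken as given rather than dynamically selected, the whole frame is binned for simplicity, and a plain uniform quantizer with an absolute error bound stands in for the SZ compressor.

```python
def bin_frame(frame, bin_size):
    """Average non-overlapping bin_size x bin_size blocks; ragged edges are cropped."""
    rows = len(frame) - len(frame) % bin_size
    cols = len(frame[0]) - len(frame[0]) % bin_size
    binned = []
    for r in range(0, rows, bin_size):
        out_row = []
        for c in range(0, cols, bin_size):
            block = [frame[r + i][c + j]
                     for i in range(bin_size) for j in range(bin_size)]
            out_row.append(sum(block) / len(block))
        binned.append(out_row)
    return binned


def quantize(values, abs_err):
    """Uniform quantizer (stand-in for SZ, an assumption): error is at most abs_err."""
    step = 2.0 * abs_err
    return [[round(v / step) for v in row] for row in values]


def dequantize(codes, abs_err):
    """Map integer codes back to approximate values."""
    step = 2.0 * abs_err
    return [[q * step for q in row] for row in codes]


def extract_rois(frame, peaks, half_width):
    """Copy a small window around each (row, col) peak without any loss."""
    rois = []
    for r, c in peaks:
        r0, r1 = max(r - half_width, 0), min(r + half_width + 1, len(frame))
        c0, c1 = max(c - half_width, 0), min(c + half_width + 1, len(frame[0]))
        rois.append(((r0, c0), [row[c0:c1] for row in frame[r0:r1]]))
    return rois


def roibin_sketch(frame, peaks, half_width=1, bin_size=2, abs_err=0.5):
    """Return (lossless ROI windows, quantized codes of the binned frame)."""
    rois = extract_rois(frame, peaks, half_width)
    codes = quantize(bin_frame(frame, bin_size), abs_err)
    return rois, codes


# Usage: an 8x8 synthetic frame with one assumed peak at row 3, column 4.
frame = [[float(8 * r + c) for c in range(8)] for r in range(8)]
rois, codes = roibin_sketch(frame, [(3, 4)])
```

In this sketch the binned background shrinks by a factor of `bin_size` squared before quantization, while the ROI windows are stored exactly alongside their top-left offsets so they can be pasted back over the reconstructed background.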
Taxonomy
Topics: Scientific Computing and Data Management · Genomics and Phylogenetic Studies · Advanced Data Storage Technologies
