Optimizing Scientific Data Transfer on Globus with Error-bounded Lossy Compression
Yuanjian Liu, Sheng Di, Kyle Chard, Ian Foster, Franck Cappello

TL;DR
Ocelot is a novel framework that integrates error-bounded lossy compression into Globus, significantly enhancing scientific data transfer efficiency while maintaining user-defined data quality bounds.
Contribution
Ocelot is the first integration of lossy compression into Globus. It also contributes a machine-learning model that predicts compression quality and strategies that reduce compression overhead to speed up transfers.
Findings
Substantial transfer performance improvements across scientific applications.
Accurate prediction of lossy compression quality metrics.
Effective strategies reducing compression overhead.
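The trade-off behind "compression overhead" can be made concrete with a back-of-envelope model. This model is our simplifying assumption, not the paper's. It assumes that compression, transfer, and decompression run one after another with no overlap. Let $S$ be the original size, $B$ the wide-area bandwidth, $R$ the compression ratio, and $C$ and $D$ the compression and decompression throughputs (measured on original bytes). Under those assumptions, compressing first pays off only when:

```latex
\begin{aligned}
T_{\text{raw}} &= \frac{S}{B}, \qquad
T_{\text{lossy}} = \frac{S}{C} + \frac{S}{R\,B} + \frac{S}{D} \\[4pt]
T_{\text{lossy}} < T_{\text{raw}}
&\iff \frac{1}{C} + \frac{1}{D} < \frac{1}{B}\left(1 - \frac{1}{R}\right)
\end{aligned}
```

Take some illustrative numbers (these are not measurements from the paper): $R = 10$ and $B = 1$ GB/s. That leaves a budget of 0.9 s per GB of original data. Compressing and decompressing at 2.5 GB/s each costs 0.8 s per GB in total, so that setup would still come out ahead. Overlapping the stages, as pipelined transfers can, loosens this condition.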
Abstract
The increasing volume and velocity of scientific data require researchers to move enormous data volumes frequently as part of routine research activities. As a result, limited wide-area bandwidth often becomes a bottleneck for research progress. However, in many cases, consuming applications (e.g., for analysis, visualization, and machine learning) can achieve acceptable performance on reduced-precision data, and thus researchers may wish to compromise on data precision to reduce transfer and storage costs. Error-bounded lossy compression is a promising approach because it can significantly reduce data volumes while keeping reconstruction errors within user-specified error bounds. In this paper, we propose a novel data transfer framework called Ocelot that integrates error-bounded lossy compression into the Globus data transfer infrastructure. We note four key contributions: (1) Ocelot is the…
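The core idea of error-bounded lossy compression can be illustrated with a minimal sketch. It assumes the simplest possible scheme. Each value is snapped to the nearest multiple of 2·eb, so the reconstruction error never exceeds the user-specified bound eb. The integer codes are then delta-encoded, and zlib compresses the result losslessly. The functions `compress`, `decompress`, and `quality` are hypothetical illustrations, not Ocelot's codec. Production compressors such as SZ use more elaborate prediction and entropy coding.

```python
import math
import struct
import zlib


def compress(data, eb):
    """Hypothetical error-bounded compressor (illustration only, not Ocelot's).

    Each value is snapped to the center of a bin of width 2*eb, so the
    reconstruction error is at most eb. The integer bin codes are then
    delta-encoded and passed to zlib, which is lossless.
    """
    codes = [round(x / (2 * eb)) for x in data]
    deltas = [codes[0]] + [b - a for a, b in zip(codes, codes[1:])]
    payload = struct.pack(f"<{len(deltas)}q", *deltas)  # 8-byte signed ints
    return zlib.compress(payload, 9)


def decompress(blob, n, eb):
    """Invert compress(): undo zlib, undo the deltas, map codes back to values."""
    deltas = struct.unpack(f"<{n}q", zlib.decompress(blob))
    codes, acc = [], 0
    for d in deltas:
        acc += d  # running sum restores the original integer codes
        codes.append(acc)
    return [q * 2 * eb for q in codes]


def quality(orig, recon):
    """Return max absolute error and PSNR (dB) relative to the data's value range."""
    errs = [a - b for a, b in zip(orig, recon)]
    max_err = max(abs(e) for e in errs)
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    value_range = max(orig) - min(orig)
    psnr = 20 * math.log10(value_range / rmse) if rmse > 0 else math.inf
    return max_err, psnr


if __name__ == "__main__":
    n, eb = 100_000, 1e-3
    # Synthetic smooth field standing in for simulation output (assumption).
    field = [math.sin(i / 500) + 0.5 * math.cos(i / 37) for i in range(n)]
    blob = compress(field, eb)
    recon = decompress(blob, n, eb)
    max_err, psnr = quality(field, recon)
    ratio = (8 * n) / len(blob)  # raw float64 bytes vs compressed bytes
    print(f"ratio={ratio:.1f} max_err={max_err:.2e} psnr={psnr:.1f} dB")
```

Tightening `eb` raises PSNR but lowers the compression ratio. This accuracy-versus-size trade-off is what researchers control when they compromise on data precision to cut transfer cost.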
Taxonomy
Topics: Advanced Data Storage Technologies · Distributed and Parallel Computing Systems · Scientific Computing and Data Management
