Simulation and evaluation of cloud storage caching for data intensive science
Tobias Wegner, Mario Lassnig, Peer Ueberholz, Christian Zeitnitz

TL;DR
This paper uses a simulation-based analysis to evaluate cloud storage as a flexible cache in scientific workflows. It shows that such a cache can reduce on-premises storage requirements without sacrificing throughput.
Contribution
The paper introduces a simulation model for evaluating cloud storage caching in data-intensive science workflows. The model is intended to support decisions about how storage and network resources are allocated.
Findings
Cloud storage can reduce on-premises disk requirements.
Throughput remains comparable to traditional storage setups.
Simulation aids in evaluating storage and network resource strategies.
Abstract
A common task in scientific computing is the derivation of data. This workflow extracts the most important information from large input data and stores it in smaller derived data objects. The derived data objects can then be used for further analysis tasks. Typically, those workflows use distributed storage and computing resources. A straightforward configuration of storage media would be low-cost tape storage and higher-cost disk storage. The large, infrequently accessed input data is stored on tape storage. The smaller, frequently accessed derived data is stored on disk storage. In a best-case scenario, the large input data is only accessed very infrequently and in a well-planned pattern. However, practice shows that the data often has to be processed continuously and unpredictably. This can significantly reduce tape storage performance. A common approach to counter this is storing…
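The tape-plus-cache setup described in the abstract can be illustrated with a toy model. This is a minimal sketch and not the authors' simulator: it assumes a single cache tier (which could stand for on-premises disk or cloud storage) in front of tape, least-recently-used (LRU) eviction measured in bytes, and the hypothetical names `LRUByteCache` and `simulate`. Every cache miss is counted as a tape recall, the kind of access that becomes costly when input data is processed continuously and unpredictably.

```python
from collections import OrderedDict


class LRUByteCache:
    """Hypothetical cache tier (e.g. disk or cloud) with a byte capacity and LRU eviction."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        # Maps file name -> size; the first entry is the least recently used.
        self._entries: "OrderedDict[str, int]" = OrderedDict()

    def lookup(self, name: str) -> bool:
        """Return True on a hit and mark the file as most recently used."""
        if name in self._entries:
            self._entries.move_to_end(name)
            return True
        return False

    def insert(self, name: str, size: int) -> None:
        """Cache a file, evicting least recently used files until it fits."""
        if size > self.capacity_bytes:
            return  # the file can never fit, so it is always read from tape
        while self.used_bytes + size > self.capacity_bytes:
            _, evicted_size = self._entries.popitem(last=False)
            self.used_bytes -= evicted_size
        self._entries[name] = size
        self.used_bytes += size


def simulate(accesses, file_sizes, cache_capacity):
    """Return (cache hits, tape recalls) for a sequence of file reads."""
    cache = LRUByteCache(cache_capacity)
    hits = recalls = 0
    for name in accesses:
        if cache.lookup(name):
            hits += 1
        else:
            recalls += 1  # miss: the file must be staged from tape
            cache.insert(name, file_sizes[name])
    return hits, recalls
```

For example, with three files of 4 units each read in the order a, b, a, c, a, b, an 8-unit cache yields 2 hits and 4 tape recalls, a 12-unit cache yields 3 hits and 3 recalls, and a zero-capacity cache sends all 6 reads to tape. A model of this kind only captures hit rates; the paper's evaluation also concerns throughput and network resources, which this sketch does not attempt to represent.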
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · Advanced Data Storage Technologies
