DisTRaC: Accelerating High Performance Compute Processing for Temporary Data Storage
Gabryel Mason-Williams, Dave Bond, Mark Basham

TL;DR
This paper introduces DisTRaC, a system that leverages RAM disks and Ceph object storage to significantly accelerate temporary data processing in HPC environments, reducing I/O overhead and processing time.
Contribution
The paper presents a RAM block that interacts with the Ceph object storage system, together with a deployment tool for running Ceph on HPC infrastructure, to improve temporary data storage performance on HPC clusters.
Findings
Reduced I/O overhead by 81.04% in tomography data processing.
Decreased processing time by 8.32% using the new system.
Demonstrated effective deployment of Ceph on HPC infrastructure.
Abstract
High Performance Compute (HPC) clusters often produce intermediate files as part of code execution, and message passing is not always a viable way to supply data to these cluster jobs. In these cases, I/O goes back to central distributed storage to allow cross-node data sharing. These storage systems are often high performance, and are characterised by a high cost per TB and by sensitivity to workload type, such as being tuned for either small-file or large-file I/O. However, compute nodes often have large amounts of RAM, so for intermediate files, where the longevity or reliability of the storage is less important, local RAM disks can be used to obtain performance benefits. In this paper we show how this problem was tackled by creating a RAM block that could interact with the object storage system Ceph, as well as a deployment tool to deploy Ceph on HPC infrastructure effectively. This work…
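The general idea of keeping short-lived intermediate files in RAM can be illustrated with a minimal sketch. This is not DisTRaC itself and involves no Ceph: it assumes a Linux node where `/dev/shm` is a RAM-backed tmpfs mount, and the helper names `pick_scratch_root` and `round_trip_intermediate` and the file name `stage1.bin` are hypothetical. It covers only node-local scratch space, not the cross-node sharing that the paper addresses with Ceph.

```python
import os
import tempfile
from pathlib import Path


def pick_scratch_root(candidates=("/dev/shm",)):
    """Return the first writable candidate directory, else the system temp dir.

    Assumption: on Linux, /dev/shm is usually a RAM-backed tmpfs mount.
    """
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir() and os.access(path, os.W_OK):
            return path
    # Fall back to ordinary (possibly disk-backed) temporary storage.
    return Path(tempfile.gettempdir())


def round_trip_intermediate(payload: bytes) -> bytes:
    """Write an intermediate result to scratch space, read it back, clean up."""
    with tempfile.TemporaryDirectory(dir=pick_scratch_root()) as workdir:
        intermediate = Path(workdir) / "stage1.bin"  # hypothetical file name
        intermediate.write_bytes(payload)
        return intermediate.read_bytes()
```

Because the scratch directory is deleted when the `with` block exits, nothing persists after the job step. That matches the abstract's point that longevity and reliability matter less for intermediate data.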
Taxonomy
Topics: Advanced Data Storage Technologies · Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques
