Big Data Analytics on Traditional HPC Infrastructure Using Two-Level Storage
Pengfei Xuan, Jeffrey Denton, Rong Ge, Pradip K. Srimani, Feng Luo

TL;DR
This paper proposes a two-level storage system combining Tachyon and OrangeFS to improve I/O throughput on HPC clusters for data-intensive workloads, demonstrated through modeling and TeraSort benchmarking.
Contribution
It introduces a novel two-level storage architecture integrating in-memory and parallel file systems to enhance I/O performance on HPC systems.
Findings
Two-level storage increases aggregate I/O throughput compared with HDFS and OrangeFS.
Both the theoretical models and the TeraSort experiments show the improvement.
The approach supports I/O-intensive workloads on existing HPC resources.
Abstract
Data-intensive computing has become one of the major workloads on traditional high-performance computing (HPC) clusters. Currently, deploying data-intensive computing software frameworks on HPC clusters still faces performance and scalability issues. In this paper, we develop a new two-level storage system by integrating Tachyon, an in-memory file system, with OrangeFS, a parallel file system. We model the I/O throughput of four storage structures: HDFS, OrangeFS, Tachyon, and two-level storage. We conduct computational experiments to characterize the I/O throughput behavior of two-level storage and compare its performance with that of HDFS and OrangeFS, using the TeraSort benchmark. Theoretical models and experimental tests both show that the two-level storage system can increase aggregate I/O throughput. This work lays a solid foundation for future work in designing and building HPC systems…
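The abstract does not reproduce the paper's throughput models, so the following is only a rough illustration of why a memory tier in front of a parallel file system can raise aggregate read throughput. It is a generic weighted-harmonic-mean sketch, not the authors' model: the function names, the `hit_ratio` parameter, and the assumption that reads are served either from memory or from the parallel file system (never both at once) are all hypothetical.

```python
def pfs_aggregate_throughput(n_compute, nic_mbps, n_data, disk_mbps):
    """Aggregate throughput (MB/s) of a remote parallel file system.

    Assumption: throughput is capped by whichever is slower, the combined
    network links of the compute nodes or the combined disks of the data nodes.
    """
    return min(n_compute * nic_mbps, n_data * disk_mbps)


def two_level_throughput(hit_ratio, mem_aggregate, pfs_aggregate):
    """Effective aggregate read throughput (MB/s) of a two-level store.

    Assumption: a fraction `hit_ratio` of the bytes is read from the
    in-memory tier and the rest from the parallel file system, so the
    time to read one MB is the weighted sum of the per-tier times.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    time_per_mb = hit_ratio / mem_aggregate + (1.0 - hit_ratio) / pfs_aggregate
    return 1.0 / time_per_mb


if __name__ == "__main__":
    # Hypothetical cluster: 16 compute nodes, 4 data nodes (made-up numbers).
    pfs = pfs_aggregate_throughput(n_compute=16, nic_mbps=1000,
                                   n_data=4, disk_mbps=500)
    mem = 16 * 5000  # assumed per-node memory bandwidth of 5000 MB/s
    for h in (0.0, 0.5, 0.9, 1.0):
        print(f"hit_ratio={h:.1f}: {two_level_throughput(h, mem, pfs):.0f} MB/s")
```

Under these assumptions the effective throughput stays close to the parallel file system's rate until most reads hit memory, which is why the hit ratio matters more than the raw speed of the memory tier.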
