Efficient HTTP based I/O on very large datasets for high performance computing with the libdavix library
Adrien Devresse, Fabrizio Furano

TL;DR
This paper presents how the HTTP protocol was adapted and optimized for high-performance I/O in large-scale data analysis, demonstrating competitive performance with specialized HPC protocols using the libdavix library.
Contribution
The paper describes how HTTP was adapted for high-performance computing, presents the davix toolkit that implements these adaptations, and benchmarks it against traditional HPC protocols.
Findings
Davix achieves throughput comparable to that of specialized HPC protocols.
Optimizations significantly reduce HTTP's performance weaknesses.
The approach enables efficient data access in global computing grids.
Abstract
Remote data access for data analysis in high performance computing is commonly done with specialized data access protocols and storage systems. These protocols are highly optimized for high throughput on very large datasets, multi-streams, high availability, low latency and efficient parallel I/O. The purpose of this paper is to describe how we have adapted a generic protocol, the Hypertext Transfer Protocol (HTTP), to make it a competitive alternative for high performance I/O and data analysis applications in a global computing grid: the Worldwide LHC Computing Grid. In this work, we first analyze the design differences between the HTTP protocol and the most common high performance I/O protocols, pointing out the main performance weaknesses of HTTP. Then, we describe in detail how we solved these issues. Our solutions have been implemented in a toolkit called davix, available through…
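The abstract as excerpted here does not say which optimizations were applied. As an illustration only of the kind of weakness it alludes to (one round trip per small read), the sketch below uses a standard HTTP feature, the multi-range `Range` header, to request several byte chunks in a single request. The function names and the `gap` parameter are hypothetical. This is not a description of how davix is implemented.

```python
from typing import Iterable, List, Tuple


def coalesce(chunks: Iterable[Tuple[int, int]], gap: int = 0) -> List[Tuple[int, int]]:
    """Merge (offset, length) chunks into inclusive (first, last) byte spans.

    Chunks separated by at most `gap` bytes are merged, trading a few
    unneeded bytes for fewer ranges. `gap` is an illustrative parameter.
    """
    merged: List[Tuple[int, int]] = []
    for start, length in sorted(chunks):
        if length <= 0:
            continue  # skip empty reads
        end = start + length - 1  # HTTP byte ranges are inclusive
        if merged and start <= merged[-1][1] + 1 + gap:
            # Overlapping, adjacent, or close enough: extend the last span.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def range_header(chunks: Iterable[Tuple[int, int]], gap: int = 0) -> str:
    """Build the value of an HTTP Range header covering all chunks."""
    spans = coalesce(chunks, gap)
    if not spans:
        raise ValueError("no non-empty chunks to request")
    return "bytes=" + ",".join(f"{first}-{last}" for first, last in spans)


if __name__ == "__main__":
    # Three scattered reads become one request with two ranges.
    print(range_header([(0, 100), (100, 50), (1000, 10)]))
```

The resulting string can be sent as the `Range` header with any HTTP client. A server that supports multi-range requests answers with status 206 and a `multipart/byteranges` body; a server that does not may ignore the header and return the whole resource, so a real client has to handle both cases.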
Taxonomy
Topics: Distributed and Parallel Computing Systems · Advanced Data Storage Technologies · Peer-to-Peer Network Technologies
