RapidRAID: Pipelined Erasure Codes for Fast Data Archival in Distributed Storage Systems
Lluis Pamies-Juarez, Anwitaman Datta, Frederique Oggier

TL;DR
RapidRAID introduces pipelined erasure codes that accelerate data archival in distributed storage systems by distributing the encoding work across multiple nodes, reducing single-object encoding time by up to 90%.
Contribution
This paper presents RapidRAID, a novel family of pipelined erasure codes that enable fast, distributed encoding for data archival without sacrificing reliability or storage efficiency.
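To make the contrast between atomic and pipelined encoding concrete, here is a minimal toy sketch. It is not the paper's actual construction: the prime field `P = 257`, the one-symbol "blocks", the placement rule `blocks[i % k]`, the coefficient values, and the function names `pipelined_encode` and `atomic_encode` are all assumptions made for illustration. It only shows that a chain of nodes, each adding its local contribution to a forwarded partial sum, can produce the same pieces that a single node would compute from the whole object. It makes no claim about the fault tolerance of the resulting code.

```python
# Toy illustration only: field, placement, and coefficients are assumed,
# not taken from the paper.
P = 257  # small prime field, chosen for readability


def pipelined_encode(blocks, coeffs):
    """Chain of len(coeffs) nodes; node i is assumed to hold blocks[i % k].

    Each node adds its weighted local block to the partial sum received
    from its predecessor, stores the result as its coded piece, and
    forwards the partial sum to the next node.
    """
    k = len(blocks)
    partial = 0
    stored = []
    for i, c in enumerate(coeffs):  # one loop iteration = one node
        local = blocks[i % k]
        partial = (partial + c * local) % P
        stored.append(partial)
    return stored


def atomic_encode(blocks, coeffs):
    """Single node holding the whole object computes every piece itself."""
    k = len(blocks)
    pieces = []
    for i in range(len(coeffs)):
        total = sum(coeffs[j] * blocks[j % k] for j in range(i + 1))
        pieces.append(total % P)
    return pieces


blocks = [10, 20, 30, 40]                # k = 4 source symbols
coeffs = [3, 5, 7, 11, 13, 17, 19, 23]   # n = 8 nodes in the chain
pieces = pipelined_encode(blocks, coeffs)
```

In the pipelined version each node performs one multiply-add and one forward, whereas the atomic version concentrates all computation and upload traffic on one node. That is the kind of load spreading the summary attributes to RapidRAID.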
Findings
Up to 90% reduction in single-object encoding time.
Up to 20% reduction in encoding time when multiple objects are encoded concurrently.
Implementation demonstrated on a real cluster and in cloud environments.
Abstract
To achieve reliability in distributed storage systems, data has usually been replicated across different nodes. However, the increasing volume of data to be stored has motivated the introduction of erasure codes, a storage-efficient alternative to replication, particularly suited for archival in data centers, where old datasets (rarely accessed) can be erasure encoded, while replicas are maintained only for the latest data. Many recent works consider the design of new storage-centric erasure codes for improved repairability. In contrast, this paper addresses the migration from replication to encoding: traditionally, erasure coding is an atomic operation in that a single node with the whole object encodes and uploads all the encoded pieces. Although large datasets can be concurrently archived by distributing individual object encodings among different nodes, the network and computing…
Taxonomy
Topics: Advanced Data Storage Technologies · Caching and Content Delivery · Privacy-Preserving Technologies in Data
