# Exploring Fault-Tolerant Erasure Codes for Scalable All-Flash Array Clusters

**Authors:** Sungjoon Koh, Jie Zhang, Miryeong Kwon, Jungyeon Yoon, David Donofrio, Nam Sung Kim, Myoungsoo Jung

arXiv: 1906.08602 · 2019-06-21

## TL;DR

This paper evaluates the performance and system impact of Reed-Solomon (RS) erasure coding in a large-scale all-flash storage cluster running Ceph, comparing it with triple replication.

## Contribution

It analyzes how erasure coding affects I/O performance, computing and software overheads, I/O amplification, network traffic, and sensitivity to physical data layout in a real cluster, and it releases 96 block-level traces for further research.

## Key findings

- Erasure coding reduces storage overhead compared to replication.
- Performance impacts vary depending on RS configuration and data layout.
- Network traffic and CPU utilization are significantly affected by erasure coding.

## Abstract

Large-scale systems with all-flash arrays have become increasingly common in many computing segments. To make such systems resilient, we can adopt erasure coding such as Reed-Solomon (RS) code as an alternative to replication because erasure coding incurs a significantly lower storage overhead than replication. To understand the impact of using erasure coding on the system performance and other system aspects such as CPU utilization and network traffic, we build a storage cluster that consists of approximately 100 processor cores with more than 50 high-performance solid-state drives (SSDs), and evaluate the cluster with a popular open-source distributed parallel file system, called Ceph. Specifically, we analyze the behaviors of a system adopting erasure coding from the following five viewpoints, and compare with those of another system using replication: (1) storage system I/O performance; (2) computing and software overheads; (3) I/O amplification; (4) network traffic among storage nodes, and (5) impact of physical data layout on performance of RS-coded SSD arrays. For all these analyses, we examine two representative RS configurations, used by Google file systems, and compare them with triple replication employed by a typical parallel file system as a default fault tolerance mechanism. Lastly, we collect 96 block-level traces from the cluster and release them to the public domain for the use of other researchers.
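The storage-overhead argument in the abstract can be made concrete with a little arithmetic. The sketch below assumes the usual RS(k, m) model, in which k data chunks are stored alongside m coding chunks and any m lost chunks can be rebuilt. The parameter pairs (6, 3) and (10, 4) are illustrative assumptions, since this summary does not state which two RS configurations the paper examines, and the function names are hypothetical.

```python
from fractions import Fraction


def rs_storage_overhead(k: int, m: int) -> Fraction:
    """Raw capacity consumed per unit of user data for an RS(k, m) stripe.

    A stripe holds k data chunks plus m coding chunks, so storing k chunks
    of user data costs k + m chunks of raw capacity.
    """
    if k <= 0 or m < 0:
        raise ValueError("k must be positive and m must be non-negative")
    return Fraction(k + m, k)


def replication_storage_overhead(copies: int) -> Fraction:
    """Raw capacity consumed per unit of user data with full replication."""
    if copies <= 0:
        raise ValueError("copies must be positive")
    return Fraction(copies, 1)


def tolerated_failures_rs(m: int) -> int:
    """An RS(k, m) stripe survives the loss of any m chunks."""
    return m


def tolerated_failures_replication(copies: int) -> int:
    """Replication survives as long as one copy remains."""
    return copies - 1


if __name__ == "__main__":
    # Illustrative parameters only; not taken from this summary.
    schemes = {
        "RS(6, 3)": (rs_storage_overhead(6, 3), tolerated_failures_rs(3)),
        "RS(10, 4)": (rs_storage_overhead(10, 4), tolerated_failures_rs(4)),
        "3x replication": (
            replication_storage_overhead(3),
            tolerated_failures_replication(3),
        ),
    }
    for name, (overhead, failures) in schemes.items():
        print(f"{name}: {float(overhead):.2f}x raw capacity, "
              f"tolerates {failures} lost chunks/copies")
```

Under these assumed parameters, the RS layouts need 1.5x and 1.4x raw capacity, against 3x for triple replication, while tolerating at least as many failures. The capacity model says nothing about the CPU, I/O amplification, and network costs that the paper measures.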

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.08602/full.md

## Figures

124 figures with captions in the complete paper: https://tomesphere.com/paper/1906.08602/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/1906.08602/full.md

---
Source: https://tomesphere.com/paper/1906.08602