Hierarchical Coding to Enable Scalability and Flexibility in Heterogeneous Cloud Storage

Siyi Yang; Ahmed Hareedy; Robert Calderbank; Lara Dolecek

arXiv:1905.02279·cs.IT·May 8, 2019

TL;DR

This paper introduces hierarchical coding schemes for heterogeneous cloud storage that enhance scalability and flexibility while maintaining small field sizes, using novel multi-level constructions based on Cauchy Reed-Solomon codes.

Contribution

It presents the first hierarchical locality codes that enable scalable and flexible cloud storage with small field sizes through innovative multi-level code constructions.

Findings

01. First hierarchical locality codes for scalable cloud storage.

02. Double- and triple-level constructions based on Cauchy Reed-Solomon codes.

03. Scalable, flexible coding schemes adaptable to multiple layers.

Tables

Table 1. Polynomial and normal forms of $\textup{GF}(2^{4})$

$0$	$0000$	$β^{4}$	$1100$	$β^{8}$	$1010$	$β^{12}$	$1111$
$β$	$0100$	$β^{5}$	$0110$	$β^{9}$	$0101$	$β^{13}$	$1011$
$β^{2}$	$0010$	$β^{6}$	$0011$	$β^{10}$	$1110$	$β^{14}$	$1001$
$β^{3}$	$0001$	$β^{7}$	$1101$	$β^{11}$	$0111$	$β^{15} = 1$	$1000$
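The entries of Table 1 can be regenerated mechanically. The sketch below is an illustration, not part of the paper: it assumes the field is built from the primitive polynomial $x^{4}+x+1$ (inferred from the row $β^{4}=1100$, i.e., $β^{4}=1+β$) and that each 4-bit string lists the coefficients of $1,β,β^{2},β^{3}$ from left to right. The names `PRIM_POLY`, `gf16_powers`, and `normal_form` are hypothetical.

```python
# GF(2^4) from the primitive polynomial x^4 + x + 1 (an assumption inferred from Table 1).
PRIM_POLY = 0b10011


def gf16_powers():
    """Return [beta^0, ..., beta^14] as integers; bit i holds the coefficient of beta^i."""
    powers, v = [], 1
    for _ in range(15):
        powers.append(v)
        v <<= 1              # multiply by beta
        if v & 0b10000:      # degree 4 reached: reduce modulo the primitive polynomial
            v ^= PRIM_POLY
    return powers


def normal_form(v):
    """Bit string in the table's order: coefficients of 1, beta, beta^2, beta^3."""
    return "".join(str((v >> i) & 1) for i in range(4))


POWERS = gf16_powers()
# beta^15 = beta^0, hence the exponent is reduced modulo 15.
table = {e: normal_form(POWERS[e % 15]) for e in range(1, 16)}

for e in range(1, 16):
    print(f"beta^{e:<2} {table[e]}")
```

Under these assumptions the printed strings agree with the table row by row, which is a quick way to confirm the representation before using it in later computations.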

Equations

\mathbb{D}=\left[\begin{array}[]{c|c}4&3\\ \hline\cr 7&6\end{array}\right].

\mathbb{D}=\left[\begin{array}[]{cc|cc|cccc}2&3&2&2&2&2&2&2\\ \hline\cr 6&7&5&5&8&8&8&8\\ \hline\cr 9&10&9&9&11&11&11&11\end{array}\right].

\left[\begin{array}[]{cccc}\frac{1}{a_{1}-b_{1}}&\frac{1}{a_{1}-b_{2}}&\dots&\frac{1}{a_{1}-b_{t}}\\ \frac{1}{a_{2}-b_{1}}&\frac{1}{a_{2}-b_{2}}&\dots&\frac{1}{a_{2}-b_{t}}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{1}{a_{s}-b_{1}}&\frac{1}{a_{s}-b_{2}}&\dots&\frac{1}{a_{s}-b_{t}}\\ \end{array}\right].

\mathbb{M}=\left[\begin{array}[]{c}\mathbb{A}\\ -\mathbb{I}_{r}\ \mathbb{0}_{r\times(t-r)}\\ \end{array}\right]^{\mathrm{T}}.

\mathbb{G}=\left[\begin{array}[]{c|c|c|c|c|c|c}\mathbb{I}_{k_{1}}&\mathbb{A}_{1,1}&\mathbb{0}&\mathbb{A}_{1,2}&\dots&\mathbb{0}&\mathbb{A}_{1,p}\\ \hline\cr\mathbb{0}&\mathbb{A}_{2,1}&\mathbb{I}_{k_{2}}&\mathbb{A}_{2,2}&\dots&\mathbb{0}&\mathbb{A}_{2,p}\\ \hline\cr\vdots&\vdots&\vdots&\vdots&\ddots&\vdots&\vdots\\ \hline\cr\mathbb{0}&\mathbb{A}_{p,1}&\mathbb{0}&\mathbb{A}_{p,2}&\dots&\mathbb{I}_{k_{p}}&\mathbb{A}_{p,p}\\ \end{array}\right].

\mathbb{T}_{x}=\left[\begin{array}[]{c|c}\mathbb{A}_{x,x}&\begin{array}[]{c|c|c}\mathbb{B}_{x,1}&\dots&\mathbb{B}_{x,p}\end{array}\\ \hline\cr\mathbb{U}_{x}&\mathbb{Z}_{x}\end{array}\right],

\mathbb{H}_{x}^{\mathrm{G}}=\left[\begin{array}[]{c|c}\mathbb{A}_{x,x}&\begin{array}[]{c|c|c}\mathbb{B}_{x,1}&\dots&\mathbb{B}_{x,p}\end{array}\\ \hline\cr-\mathbb{I}_{r_{x}}&\mathbb{0}_{r_{x}\times\delta-\delta_{x}}\end{array}\right]^{\mathrm{T}},\mathbb{H}^{\mathrm{L}}_{x}=\left[\begin{array}[]{ccc}\mathbb{A}_{x,x}\\ -\mathbb{I}_{r_{x}}\\ \mathbb{U}_{x}\\ \end{array}\right]^{\mathrm{T}}.

\small\mathbb{T}_{1}=\mathbb{T}_{2}=\left[\begin{array}[]{c|c}\mathbb{A}_{1,1}&\mathbb{B}_{1,2}\\ \hline\cr\mathbb{U}_{1}&\mathbb{Z}_{1}\end{array}\right]=\left[\begin{array}[]{c|c}\mathbb{A}_{2,2}&\mathbb{B}_{2,1}\\ \hline\cr\mathbb{U}_{2}&\mathbb{Z}_{2}\end{array}\right]=\left[\begin{array}[]{ccc|c}\frac{1}{\beta-\beta^{8}}&\frac{1}{\beta-\beta^{9}}&\frac{1}{\beta-\beta^{10}}&\frac{1}{\beta-\beta^{11}}\\ \frac{1}{\beta^{2}-\beta^{8}}&\frac{1}{\beta^{2}-\beta^{9}}&\frac{1}{\beta^{2}-\beta^{10}}&\frac{1}{\beta^{2}-\beta^{11}}\\ \frac{1}{\beta^{3}-\beta^{8}}&\frac{1}{\beta^{3}-\beta^{9}}&\frac{1}{\beta^{3}-\beta^{10}}&\frac{1}{\beta^{3}-\beta^{11}}\\ \hline\cr\frac{1}{\beta^{7}-\beta^{8}}&\frac{1}{\beta^{7}-\beta^{9}}&\frac{1}{\beta^{7}-\beta^{10}}&\frac{1}{\beta^{7}-\beta^{11}}\end{array}\right]=\left[\begin{array}[]{ccc|c}\beta^{5}&\beta^{12}&\beta^{7}&\beta^{9}\\ 1&\beta^{4}&\beta^{11}&\beta^{6}\\ \beta^{2}&\beta^{14}&\beta^{3}&\beta^{10}\\ \hline\cr\beta^{4}&1&\beta^{9}&\beta^{7}\end{array}\right].

\mathbb{A}_{1,2}=\mathbb{A}_{2,1}=\mathbb{B}_{2,1}\mathbb{U}_{1}=\left[\begin{array}[]{ccc}\beta^{13}&\beta^{9}&\beta^{3}\\ \beta^{10}&\beta^{6}&1\\ \beta^{14}&\beta^{10}&\beta^{4}\end{array}\right].
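The product in the preceding equation can be checked by hand, since multiplying nonzero elements of $\textup{GF}(2^{4})$ amounts to adding exponents of $\beta$ modulo $15$. The sketch below is an illustration under that observation: it takes $\mathbb{B}_{2,1}$ to be the last column of the upper block of $\mathbb{T}_{1}=\mathbb{T}_{2}$ and $\mathbb{U}_{1}$ to be the first three entries of its last row, as read from the numerical matrix above. The function name `outer_product_exponents` is hypothetical.

```python
# Nonzero field elements are stored as exponents e, standing for beta^e.
B_21 = [9, 6, 10]   # column B_{2,1}: beta^9, beta^6, beta^10
U_1 = [4, 0, 9]     # row U_1: beta^4, 1 = beta^0, beta^9


def outer_product_exponents(col, row):
    """Exponents of the rank-one product col * row; beta^15 = 1, so reduce modulo 15."""
    return [[(c + r) % 15 for r in row] for c in col]


A_12 = outer_product_exponents(B_21, U_1)
for line in A_12:
    print(line)
```

The three printed rows are the exponent patterns $(13,9,3)$, $(10,6,0)$, and $(14,10,4)$, matching the matrix given for $\mathbb{A}_{1,2}=\mathbb{A}_{2,1}$.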

\small\left[\begin{array}[]{ccc|ccc|ccc|ccc}1&0&0&\beta^{5}&\beta^{12}&\beta^{7}&0&0&0&\beta^{13}&\beta^{9}&\beta^{3}\\ 0&1&0&1&\beta^{4}&\beta^{11}&0&0&0&\beta^{10}&\beta^{6}&1\\ 0&0&1&\beta^{2}&\beta^{14}&\beta^{3}&0&0&0&\beta^{14}&\beta^{10}&\beta^{4}\\ \hline\cr 0&0&0&\beta^{13}&\beta^{9}&\beta^{3}&1&0&0&\beta^{5}&\beta^{12}&\beta^{7}\\ 0&0&0&\beta^{10}&\beta^{6}&1&0&1&0&1&\beta^{4}&\beta^{11}\\ 0&0&0&\beta^{14}&\beta^{10}&\beta^{4}&0&0&1&\beta^{2}&\beta^{14}&\beta^{3}\\ \end{array}\right].

\small\mathbb{H}_{1}^{\mathrm{G}}=\left[\begin{array}[]{cccc}\beta^{5}&\beta^{12}&\beta^{7}&\beta^{9}\\ 1&\beta^{4}&\beta^{11}&\beta^{6}\\ \beta^{2}&\beta^{14}&\beta^{3}&\beta^{10}\\ 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\end{array}\right]^{\mathrm{T}},\mathbb{H}_{1}^{\mathrm{L}}=\left[\begin{array}[]{cccc}\beta^{5}&\beta^{12}&\beta^{7}\\ 1&\beta^{4}&\beta^{11}\\ \beta^{2}&\beta^{14}&\beta^{3}\\ 1&0&0\\ 0&1&0\\ 0&0&1\\ \beta^{4}&1&\beta^{9}\end{array}\right]^{\mathrm{T}}.

\mathbb{G}=\left[\begin{array}[]{c|c|c|c}\mathbb{F}_{1,1}&\mathbb{F}_{1,2}&\dots&\mathbb{F}_{1,p_{0}}\\ \hline\cr\mathbb{F}_{2,1}&\mathbb{F}_{2,2}&\dots&\mathbb{F}_{2,p_{0}}\\ \hline\cr\vdots&\vdots&\ddots&\vdots\\ \hline\cr\mathbb{F}_{p_{0},1}&\mathbb{F}_{p_{0},2}&\dots&\mathbb{F}_{p_{0},p_{0}}\\ \end{array}\right],

\mathbb{F}_{x,x}=\left[\begin{array}[]{c|c|c|c|c}\mathbb{I}_{k_{x,1}}&\mathbb{A}_{x,x;1,1}&\dots&\mathbb{0}&\mathbb{A}_{x,x;1,p_{x}}\\ \hline\cr\vdots&\ddots&\ddots&\vdots&\vdots\\ \hline\cr\mathbb{0}&\mathbb{A}_{x,x;p_{x},1}&\dots&\mathbb{I}_{k_{x,p_{x}}}&\mathbb{A}_{x,x;p_{x},p_{x}}\\ \end{array}\right],

\mathbb{F}_{x,y}=\left[\begin{array}[]{c|c|c|c|c}\mathbb{0}&\mathbb{A}_{x,y;1,1}&\dots&\mathbb{0}&\mathbb{A}_{x,y;1,p_{y}}\\ \hline\cr\vdots&\ddots&\ddots&\vdots&\vdots\\ \hline\cr\mathbb{0}&\mathbb{A}_{x,y;p_{x},1}&\dots&\mathbb{0}&\mathbb{A}_{x,y;p_{x},p_{y}}\\ \end{array}\right].

\mathbb{T}_{x,i}=\left[\begin{array}[]{c|c}\mathbb{A}_{x,x;i,i}&\begin{array}[]{c|c|c|c}\mathbb{B}_{x,x;i}&\mathbb{E}_{x,1;i}&\dots&\mathbb{E}_{x,p_{0};i}\end{array}\\ \hline\cr\begin{array}[]{c}\mathbb{U}_{x,i}\\ \hline\cr\mathbb{V}_{x,i}\end{array}&\mathbb{Z}_{x,i}\end{array}\right],

\textit{where }\text{ }\mathbb{B}_{x,x;i}=\left[\begin{array}[]{c|c|c}\mathbb{B}_{x,x;i,1}&\dots&\mathbb{B}_{x,x;i,p_{x}}\end{array}\right]

\textit{and }\text{ }\mathbb{E}_{x,y;i}=\left[\begin{array}[]{c|c|c}\mathbb{E}_{x,y;i;1}&\dots&\mathbb{E}_{x,y;i;p_{y}}\end{array}\right],

\mathbb{V}_{x,i}=\left[\begin{array}[]{ccc}\frac{1}{\beta^{6}-\beta^{8}}&\frac{1}{\beta^{6}-\beta^{9}}&\frac{1}{\beta^{6}-\beta^{10}}\end{array}\right]=\left[\begin{array}[]{ccc}\beta&\beta^{10}&\beta^{8}\end{array}\right]

\textit{and }\text{ }\mathbb{E}_{x,y;i;1}=\left[\begin{array}[]{ccc}\frac{1}{\beta-\beta^{12}}\\ \frac{1}{\beta^{2}-\beta^{12}}\\ \frac{1}{\beta^{3}-\beta^{12}}\\ \end{array}\right]=\left[\begin{array}[]{ccc}\beta^{2}\\ \beta^{8}\\ \beta^{5}\\ \end{array}\right].

\mathbb{A}_{x,y;i,j}=\mathbb{E}\mathbb{V}_{y,j}=\left[\begin{array}[]{ccc}\beta^{3}&\beta^{12}&\beta^{10}\\ \beta^{9}&\beta^{3}&\beta\\ \beta^{6}&1&\beta^{13}\end{array}\right].

\small\left[\begin{array}[]{ccc|ccc|ccc|ccc|ccc|ccc|ccc|ccc}1&0&0&\beta^{5}&\beta^{12}&\beta^{7}&0&0&0&\beta^{13}&\beta^{9}&\beta^{3}&0&0&0&\beta^{3}&\beta^{12}&\beta^{10}&0&0&0&\beta^{3}&\beta^{12}&\beta^{10}\\ 0&1&0&1&\beta^{4}&\beta^{11}&0&0&0&\beta^{10}&\beta^{6}&1&0&0&0&\beta^{9}&\beta^{3}&\beta&0&0&0&\beta^{9}&\beta^{3}&\beta\\ 0&0&1&\beta^{2}&\beta^{14}&\beta^{3}&0&0&0&\beta^{14}&\beta^{10}&\beta^{4}&0&0&0&\beta^{6}&1&\beta^{13}&0&0&0&\beta^{6}&1&\beta^{13}\\ \hline\cr 0&0&0&\beta^{13}&\beta^{9}&\beta^{3}&1&0&0&\beta^{5}&\beta^{12}&\beta^{7}&0&0&0&\beta^{3}&\beta^{12}&\beta^{10}&0&0&0&\beta^{3}&\beta^{12}&\beta^{10}\\ 0&0&0&\beta^{10}&\beta^{6}&1&0&1&0&1&\beta^{4}&\beta^{11}&0&0&0&\beta^{9}&\beta^{3}&\beta&0&0&0&\beta^{9}&\beta^{3}&\beta\\ 0&0&0&\beta^{14}&\beta^{10}&\beta^{4}&0&0&1&\beta^{2}&\beta^{14}&\beta^{3}&0&0&0&\beta^{6}&1&\beta^{13}&0&0&0&\beta^{6}&1&\beta^{13}\\ \hline\cr 0&0&0&\beta^{3}&\beta^{12}&\beta^{10}&0&0&0&\beta^{3}&\beta^{12}&\beta^{10}&1&0&0&\beta^{5}&\beta^{12}&\beta^{7}&0&0&0&\beta^{13}&\beta^{9}&\beta^{3}\\ 0&0&0&\beta^{9}&\beta^{3}&\beta&0&0&0&\beta^{9}&\beta^{3}&\beta&0&1&0&1&\beta^{4}&\beta^{11}&0&0&0&\beta^{10}&\beta^{6}&1\\ 0&0&0&\beta^{6}&1&\beta^{13}&0&0&0&\beta^{6}&1&\beta^{13}&0&0&1&\beta^{2}&\beta^{14}&\beta^{3}&0&0&0&\beta^{14}&\beta^{10}&\beta^{4}\\ \hline\cr 0&0&0&\beta^{3}&\beta^{12}&\beta^{10}&0&0&0&\beta^{3}&\beta^{12}&\beta^{10}&0&0&0&\beta^{13}&\beta^{9}&\beta^{3}&1&0&0&\beta^{5}&\beta^{12}&\beta^{7}\\ 0&0&0&\beta^{9}&\beta^{3}&\beta&0&0&0&\beta^{9}&\beta^{3}&\beta&0&0&0&\beta^{10}&\beta^{6}&1&0&1&0&1&\beta^{4}&\beta^{11}\\ 0&0&0&\beta^{6}&1&\beta^{13}&0&0&0&\beta^{6}&1&\beta^{13}&0&0&0&\beta^{14}&\beta^{10}&\beta^{4}&0&0&1&\beta^{2}&\beta^{14}&\beta^{3}\\ \end{array}\right].

\mathbb{D}=\left[\begin{array}[]{cc|cc|cccc}2&3&2&2&2&2&2&2\\ \hline\cr 6&7&5&5&8&8&8&8\\ \hline\cr 9&10&9&9&11&11&11&11\end{array}\right].

A_{1^{a}, 1^{a}}

B_{1^{b}, 1^{a}}

A_{1^{b}, 1^{b}}

B_{1^{a}, 1^{b}}

B_{1^{a}, i}

B_{1^{b}, i}

B_{i, 1^{a}}

B_{i, 1^{b}}

U_{1}^{a}

U_{1}^{b}

Full text

Hierarchical Coding to Enable Scalability and Flexibility in Heterogeneous Cloud Storage

Siyi Yang$^{1}$, Ahmed Hareedy$^{2}$, Robert Calderbank$^{2}$, and Lara Dolecek$^{1}$

$^{1}$ Electrical and Computer Engineering Department, University of California, Los Angeles, Los Angeles, CA 90095 USA

$^{2}$ Electrical and Computer Engineering Department, Duke University, Durham, NC 27705 USA

Abstract

In order to accommodate the ever-growing data from various, possibly independent, sources and the dynamic nature of data usage rates in practical applications, modern cloud data storage systems are required to be scalable, flexible, and heterogeneous. Codes with hierarchical locality have been intensively studied due to their effectiveness in reducing the average reading time in cloud storage. In this paper, we present the first codes with hierarchical locality that achieve scalability and flexibility in heterogeneous cloud storage using small field size. We propose a double-level construction utilizing so-called Cauchy Reed-Solomon codes. We then develop a triple-level construction based on this double-level code; this construction can be easily generalized into any hierarchical structure with a greater number of layers since it naturally achieves scalability in the cloud storage systems.

I Introduction

Codes offering hierarchical locality have been intensely studied because of their ability to reduce the average reading time in various erasure-resilient data storage applications, including Flash storage, redundant array of independent disks (RAID) storage, and cloud storage [1, 2, 3]. Codes with shorter block lengths offer lower latency, but they provide limited erasure-correction capability in a cloud storage system. To deal with more erasures, longer codes can be employed. However, since a simultaneous occurrence of a large number of erasures is a rare event, longer codes result in unnecessary extra reading cost, and are on average inefficient. Therefore, maintaining low latency while simultaneously recovering from a potentially large number of erasures is one of the major challenges in cloud storage. Codes with hierarchical locality have been shown to address this issue by providing multi-level access in cloud storage, which enables the data to be read through a chain of network components with increasing data lengths from top to bottom; this architecture is exploited to increase the overall erasure-correction capability [4].

In the literature, codes offering double-level access have been intensely studied [3, 4, 5, 6, 7, 8]; these codes are applicable in double-level cloud storage. In this configuration, $p$ consecutive local messages are jointly encoded into $p$ correlated local codewords. Each local codeword is stored at the neighboring servers of the corresponding local cloud. The codes are designed such that each local message can be successfully decoded from the corresponding local codeword when there are fewer than $d_{1}$ local erasures, and the global codeword provides extra protection against $(d_{2}-d_{1})$ additional erasures in a local codeword, for some $d_{2}>d_{1}$. An example with $p=4$ is shown in Fig. 1. Suppose $d_{1}=2$ and $d_{2}=3$. When there is at most $1$ server failure in cloud $1$, accessing the servers connected to cloud $1$ is sufficient to successfully decode the data stored in cloud $1$. If the number of server failures in cloud $1$ is $2$, the data can still be obtained through accessing all the servers. Codes with hierarchical locality generalize double-level accessible codes: they allow more than two levels of access and are naturally suitable for cloud storage with multiple layers.

Along with the hierarchical locality discussed previously, it is also important for the coding schemes to support scalable, heterogeneous, and flexible cloud storage [9]. Scalability enables expanding the backbone network to accommodate additional workload, i.e., additional clouds, without rebuilding the entire infrastructure. Heterogeneity refers to the property of allowing nonidentical local data lengths and providing unequal local protection, which is important for cloud storage with heterogeneous structures. A heterogeneous structure arises in networks consisting of geographically separated components, which often store data from different sources. Flexibility was first investigated for dynamic data storage systems in [8], and it refers to the property that a local cloud can be split into two smaller local clouds without worsening the global erasure-correction capability or changing the remaining components. This splitting is applied, for example, when cold data stored at a local cloud unexpectedly become hot.

Various codes offering hierarchical locality have been studied. Cassuto et al. [3] presented so-called multi-block interleaved codes that provide double-level access; this work introduced the concept of multi-level access. The family of integrated-interleaved (I-I) codes, including generalized integrated interleaved (GII) codes and extended integrated interleaved (EII) codes, has been a major prototype for codes with multi-level access [4, 7, 6, 5]. GII codes have the advantage of correcting a large set of error patterns, but the distribution of the data symbols is highly restricted, and all the local codewords are equally protected. EII codes are extensions of GII codes with double-level access, where specific arrangements of data symbols have been investigated, mitigating the aforementioned restriction. However, no similar study has been proposed for GII codes with hierarchical locality. Therefore, I-I codes are more suitable for applications where heterogeneity and flexibility are less important. Sum-rank codes are another family of codes proposed for dynamic distributed storage offering double-level access [8]. These codes are maximally recoverable, flexible, and allow unequal protection for local data. However, sum-rank codes require a finite field size that grows exponentially with the maximum local block length, which is a major obstacle to implementing them in real-world applications.

In this paper, we introduce code constructions with hierarchical locality and a small field size that achieve scalability, heterogeneity, and flexibility. The paper is organized as follows. In Section II, we introduce the notation and preliminaries. In Section III, we present a new construction of codes offering hierarchical locality that is based on Cauchy Reed-Solomon (CRS) codes. This construction requires a field size that grows linearly with the maximum local codelength. In Section IV, we show that our coding scheme is scalable, heterogeneous, and flexible. Finally, we summarize our results in Section V.

II Notation and Preliminaries

Throughout the rest of this paper, $\left[N\right]$ refers to $\{1,2,\dots,N\}$, and $\left[a:b\right]$ refers to $\{a,a+1,\dots,b\}$. Denote the all-zero vector of length $s$ by $\mathbb{0}_{s}$. Similarly, the all-zero matrix of size $s\times t$ is denoted by $\mathbb{0}_{s\times t}$. The alphabet field, denoted by $\textup{GF}(q)$, is a Galois field of size $q$, where $q$ is a power of a prime. For a vector $\mathbb{v}$ of length $n$, $v_{i}$, $1\leq i\leq n$, represents the $i$-th component of $\mathbb{v}$, and $\mathbb{v}\left[a:b\right]=(v_{a},\dots,v_{b})$. For a matrix $\mathbb{M}$ of size $a\times b$, $\mathbb{M}\left[i_{1}:i_{2},j_{1}:j_{2}\right]$ represents the sub-matrix $\mathbb{M}^{\prime}$ of $\mathbb{M}$ such that $(\mathbb{M}^{\prime})_{i-i_{1}+1,j-j_{1}+1}=(\mathbb{M})_{i,j}$, $i\in\left[i_{1}:i_{2}\right]$, $j\in\left[j_{1}:j_{2}\right]$. All indices start from $1$.

II-A Notation and Definitions

Let $\mathbb{m}$ and $\mathbb{c}$ represent messages and codewords, respectively. A set $\mathcal{C}$ is called an $(n,k,d)_{q}$-code if $\mathcal{C}\subset\textup{GF}(q)^{n}$, $\mathrm{dim}(\mathcal{C})=k$, and $\min\limits_{\mathbb{c}_{1},\mathbb{c}_{2}\in\mathcal{C},\mathbb{c}_{1}\neq\mathbb{c}_{2}}d_{\textup{H}}(\mathbb{c}_{1},\mathbb{c}_{2})=d$, where $d_{\textup{H}}$ refers to the Hamming distance. We next define a family of codes with double-level access. Note that our discussion is restricted to linear block codes.

Definition 1.

Let $p,q\in\mathbb{N}$. Let $\mathbb{n}=(n_{1},n_{2},\dots,n_{p})\in\mathbb{N}^{p}$, $\mathbb{k}=(k_{1},k_{2},\dots,k_{p})\in\mathbb{N}^{p}$, $\mathbb{D}\in\mathbb{N}^{2\times p}$, $(\mathbb{D})_{x,y}=d_{x,y}$, where $d_{1,x}<d_{2,x}$, $k_{x}<n_{x}$, for all $x,y\in\left[p\right]$.

Let $n=n_{1}+n_{2}+\cdots+n_{p}$. Let $s_{0}=0$ and $s_{x}=n_{1}+n_{2}+\cdots+n_{x}$, $x\in\left[p\right]$. Let $\mathbb{c}_{x}$ denote $\mathbb{c}\left[s_{{x}-1}+1:s_{x}\right]$ and let $\mathbb{m}_{x}$ denote the message corresponding to $\mathbb{c}_{x}$, for $x\in\left[p\right]$. A set $\mathcal{C}\subset\textup{GF}(q)^{n}$ is called an $(\mathbb{n},\mathbb{k},\mathbb{D},p)_{q}$-code if the following conditions are satisfied:

1. Let $\mathcal{C}_{x}=\{\mathbb{c}\left[s_{x-1}+1:s_{x}\right]:\mathbb{c}\in\mathcal{C}\}$, $x\in\left[p\right]$. Each $\mathcal{C}_{x}$ is an $(n_{x},k_{x},d_{1,{x}})_{q}$-code.

2. Let $\mathcal{A}_{x}=\{\mathbb{c}\left[s_{{x}-1}+1:s_{x}\right]:\mathbb{c}\in\mathcal{C},\mathbb{c}\left[s_{y-1}+1:s_{y}\right]=\mathbb{0}_{n_{y}},\forall y\in\left[p\right]\setminus\{x\}\}$, $x\in\left[p\right]$. Each $\mathcal{A}_{x}$ is an $(n_{x},k_{x},d_{2,x})_{q}$-code.

Example 1.

Let $q=16$ and $p=2$. Let $\mathbb{n}=(10,11)$ and $\mathbb{k}=(6,7)$. Then, $\mathbb{r}=\mathbb{n}-\mathbb{k}=(4,4)$. Suppose $\mathbb{D}$ is specified as follows:

$$\mathbb{D}=\left[\begin{array}{c|c}4&3\\ \hline 7&6\end{array}\right].$$

Then, one can construct an $(\mathbb{n},\mathbb{k},\mathbb{D},p)_{q}$-code with the parameters specified previously.

Any $(\mathbb{n},\mathbb{k},\mathbb{D},p)_{q}$-code specified according to Definition 1 corrects $(d_{1,x}-1)$ erasures in the $x$-th local codeword via local access, and corrects an additional $(d_{2,x}-d_{1,x})$ erasures through global access when all other local codewords are correctable via local access. Following this notation, Definition 2 extends Definition 1 to the triple-level case.
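The guarantee just stated can be written as a small bookkeeping routine. The sketch below is an illustration only: it encodes the erasure counts that the definition guarantees to be correctable, not an actual decoder, and the function name `access_levels` and its return labels are hypothetical. It uses the matrix $\mathbb{D}$ with rows $(4,3)$ and $(7,6)$ from Example 1.

```python
def access_levels(D, erasures):
    """Classify each local codeword by the access level that suffices.

    D        : two rows (d_1, d_2), one column per local codeword.
    erasures : number of erased symbols in each local codeword.
    Returns 'local', 'global', or 'not guaranteed' per local codeword.
    """
    d1, d2 = D
    # Local access corrects up to d_{1,x} - 1 erasures.
    local_ok = [e <= d1[x] - 1 for x, e in enumerate(erasures)]
    levels = []
    for x, e in enumerate(erasures):
        others_local = all(ok for y, ok in enumerate(local_ok) if y != x)
        if local_ok[x]:
            levels.append("local")
        elif e <= d2[x] - 1 and others_local:
            # Global access adds d_{2,x} - d_{1,x} erasures, provided every
            # other local codeword is locally correctable.
            levels.append("global")
        else:
            levels.append("not guaranteed")
    return levels


D_example1 = [[4, 3], [7, 6]]
print(access_levels(D_example1, [3, 2]))
print(access_levels(D_example1, [6, 2]))
print(access_levels(D_example1, [6, 3]))
```

With these inputs, three and two erasures are handled locally, six erasures in the first local codeword require global access, and six together with three erasures fall outside the stated guarantee because the second local codeword is no longer locally correctable.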

Definition 2.

Let $q,p_{0}\in\mathbb{N}$, $\mathbb{p}=(p_{1},p_{2},\dots,p_{p_{0}})\in\mathbb{N}^{p_{0}}$, $p=p_{1}+p_{2}+\cdots+p_{p_{0}}$. Let $\mathbb{n}=(\mathbb{n}_{1},\mathbb{n}_{2},\dots,\mathbb{n}_{p_{0}})\in\mathbb{N}^{p}$, $\mathbb{k}=(\mathbb{k}_{1},\mathbb{k}_{2},\dots,\mathbb{k}_{p_{0}})\in\mathbb{N}^{p}$, where $\mathbb{n}_{x}=(n_{x,1},n_{x,2},\dots,n_{x,p_{x}})\in\mathbb{N}^{p_{x}}$, $\mathbb{k}_{x}=(k_{x,1},k_{x,2},\dots,k_{x,p_{x}})\in\mathbb{N}^{p_{x}}$, for all $x\in\left[p_{0}\right]$.

Let $t_{0}=0$, $t_{x}=p_{1}+p_{2}+\cdots+p_{x}$, $x\in\left[p_{0}\right]$. Suppose $\mathbb{D}\in\mathbb{N}^{3\times p}$. Let $d_{l,x,i}=(\mathbb{D})_{l,t_{x-1}+i}$, $l\in\left[3\right]$, such that $d_{1,x,i}<d_{2,x,i}<d_{3,x,i}$, for $x\in\left[p_{0}\right]$ and $i\in\left[p_{x}\right]$. Let $\mathbb{D}_{x}=\mathbb{D}\left[1:2,t_{x-1}+1:t_{x}\right]$, $x\in\left[p_{0}\right]$. Let $n_{x}=n_{x,1}+n_{x,2}+\cdots+n_{x,p_{x}}$ for all $x\in\left[p_{0}\right]$. Let $n=n_{1}+n_{2}+\cdots+n_{p_{0}}$. Let $s_{0}=0$, $s_{x}=n_{1}+n_{2}+\cdots+n_{x}$, $x\in\left[p_{0}\right]$. Let $s_{x,0}=s_{x-1}$, $s_{x,i}=s_{x-1}+n_{x,1}+n_{x,2}+\cdots+n_{x,i}$, for all $x\in\left[p_{0}\right]$ and $i\in\left[p_{x}\right]$, so that the local codewords of the $x$-th group occupy positions $s_{x-1}+1$ through $s_{x}$. Let $\mathbb{c}_{x,i}$ denote $\mathbb{c}\left[s_{x,i-1}+1:s_{x,i}\right]$ and let $\mathbb{m}_{x,i}$ denote the message corresponding to $\mathbb{c}_{x,i}$, for $x\in\left[p_{0}\right]$, $i\in\left[p_{x}\right]$. A set $\mathcal{C}\subset\textup{GF}(q)^{n}$ is called an $(\mathbb{n},\mathbb{k},\mathbb{D},p_{0},\mathbb{p})_{q}$-code if the following conditions are satisfied:

1. Let $\mathcal{C}_{x}=\{\mathbb{c}\left[s_{x-1}+1:s_{x}\right]:\mathbb{c}\in\mathcal{C}\}$, $x\in\left[p_{0}\right]$. Each $\mathcal{C}_{x}$ is an $(\mathbb{n}_{x},\mathbb{k}_{x},\mathbb{D}_{x},p_{x})_{q}$-code.

2. Let $\mathcal{A}_{x,i}=\{\mathbb{c}\left[s_{x,i-1}+1:s_{x,i}\right]:\mathbb{c}\in\mathcal{C},\mathbb{c}\left[s_{y,j-1}+1:s_{y,j}\right]=\mathbb{0}_{n_{y,j}},\forall y\in\left[p_{0}\right],j\in\left[p_{y}\right],(x,i)\neq(y,j)\}$. Each $\mathcal{A}_{x,i}$ is an $(n_{x,i},k_{x,i},d_{3,x,i})_{q}$-code.

Example 2.

Let $q=16$, $p_{0}=3$, and $\mathbb{p}=(2,2,4)$. Let $\mathbb{n}=(\mathbb{n}_{1},\mathbb{n}_{2},\mathbb{n}_{3})$, where $\mathbb{n}_{1}=(10,11)$, $\mathbb{n}_{2}=(10,10)$, and $\mathbb{n}_{3}=(12,12,12,12)$. Let $\mathbb{k}=(\mathbb{k}_{1},\mathbb{k}_{2},\mathbb{k}_{3})$, where $\mathbb{k}_{1}=(6,6)$, $\mathbb{k}_{2}=(7,7)$, and $\mathbb{k}_{3}=(9,8,9,9)$. Then, $\mathbb{r}=\mathbb{n}-\mathbb{k}=(\mathbb{r}_{1},\mathbb{r}_{2},\mathbb{r}_{3})$, where $\mathbb{r}_{1}=(4,5)$, $\mathbb{r}_{2}=(3,3)$, $\mathbb{r}_{3}=(3,4,3,3)$. Suppose $\mathbb{D}$ is specified as follows:

$$\mathbb{D}=\left[\begin{array}{cc|cc|cccc}2&3&2&2&2&2&2&2\\ \hline 6&7&5&5&8&8&8&8\\ \hline 9&10&9&9&11&11&11&11\end{array}\right].$$

Then, one can construct an $(\mathbb{n},\mathbb{k},\mathbb{D},p_{0},\mathbb{p})_{q}$-code with the parameters specified previously.
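The index bookkeeping of Definition 2 can be traced on the parameters of Example 2. The sketch below is illustrative and makes one assumption explicit: the first local codeword of group $x$ is taken to start at position $s_{x-1}+1$, immediately after the previous group ends. The matrix $\mathbb{D}$ is the $3\times 8$ matrix with rows $(2,3,2,2,2,2,2,2)$, $(6,7,5,5,8,8,8,8)$, and $(9,10,9,9,11,11,11,11)$. All variable and function names are hypothetical.

```python
from itertools import accumulate

# Parameters of Example 2.
p_vec = (2, 2, 4)
n_blocks = [(10, 11), (10, 10), (12, 12, 12, 12)]
k_blocks = [(6, 6), (7, 7), (9, 8, 9, 9)]
D = [
    [2, 3, 2, 2, 2, 2, 2, 2],
    [6, 7, 5, 5, 8, 8, 8, 8],
    [9, 10, 9, 9, 11, 11, 11, 11],
]

# Redundancy r = n - k, block by block.
r_blocks = [tuple(n - k for n, k in zip(ns, ks)) for ns, ks in zip(n_blocks, k_blocks)]

# t_0..t_{p0}: cumulative number of local codewords; s_0..s_{p0}: cumulative lengths.
t = [0] + list(accumulate(p_vec))
s = [0] + list(accumulate(sum(ns) for ns in n_blocks))


def local_ranges(n_blocks, s):
    """1-indexed inclusive (start, end) of each c_{x,i}, assuming s_{x,0} = s_{x-1}."""
    ranges = {}
    for x, ns in enumerate(n_blocks, start=1):
        start = s[x - 1]
        for i, n in enumerate(ns, start=1):
            ranges[(x, i)] = (start + 1, start + n)
            start += n
    return ranges


def D_sub(D, t, x):
    """D_x = D[1:2, t_{x-1}+1 : t_x]: the first two rows, columns of group x."""
    return [row[t[x - 1]:t[x]] for row in D[:2]]


ranges = local_ranges(n_blocks, s)
print(r_blocks)
print(t, s)
print(ranges[(2, 1)], D_sub(D, t, 1))
```

The computed redundancies reproduce $\mathbb{r}_{1}=(4,5)$, $\mathbb{r}_{2}=(3,3)$, and $\mathbb{r}_{3}=(3,4,3,3)$, and the total length is $n=89$.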

This definition can be easily generalized into codes with more than three levels of access. For simplicity, we constrain our discussion to the triple-level case.

II-B Cauchy Matrices

Cauchy matrices are the key component in the construction that we will introduce shortly.

Definition 3.

(Cauchy matrix) Let $s,t\in\mathbb{N}$ and $\textup{GF}(q)$ be a finite field of size $q$ . Suppose $a_{1},\dots,a_{s},b_{1},\dots,b_{t}$ are pairwise distinct elements in $\textup{GF}(q)$ . The following matrix is known as a Cauchy matrix,

[TABLE]

We denote this matrix by $\mathbb{Y}(a_{1},\dots,a_{s};b_{1},\dots,b_{t})$ .

Cauchy matrices are totally invertible, i.e., every square sub-matrix of a Cauchy matrix is invertible. The inverse of a given Cauchy matrix can be explicitly computed using algorithms of lower complexity than those for inverting Vandermonde matrices. These properties make Cauchy matrices promising in designing systematic maximum distance separable (MDS) codes. Lemma 1 presents a useful result about Cauchy matrices that will be used repeatedly in this paper.
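As an illustration, the sketch below builds a small Cauchy matrix over $\textup{GF}(2^{4})$ and checks that all of its $2\times 2$ minors are nonzero. It is only a sketch under stated assumptions: the entry convention $1/(a_{i}-b_{j})$ is the standard one and is assumed here, the field is generated by $X^{4}+X+1$ as in Example 3, and the helper names (`build_gf16`, `cauchy`, `minors_2x2_nonzero`) and the chosen elements are hypothetical.

```python
def build_gf16():
    """Exp/log tables for GF(2^4) with primitive polynomial X^4 + X + 1."""
    exp, log = [0] * 30, [0] * 16
    x = 1
    for i in range(15):
        exp[i], log[x] = x, i
        x <<= 1
        if x & 0x10:          # reduce modulo X^4 + X + 1 (binary 10011)
            x ^= 0x13
    for i in range(15, 30):   # duplicate the table so products need no modulo
        exp[i] = exp[i - 15]
    return exp, log

EXP, LOG = build_gf16()

def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def inv(a):
    return EXP[(15 - LOG[a]) % 15]

def cauchy(a, b):
    # Assumed entry convention: 1 / (a_i - b_j); subtraction is XOR in characteristic 2.
    return [[inv(ai ^ bj) for bj in b] for ai in a]

def minors_2x2_nonzero(M):
    """True if every 2x2 sub-matrix of M has a nonzero determinant."""
    rows, cols = len(M), len(M[0])
    for i in range(rows):
        for j in range(i + 1, rows):
            for c in range(cols):
                for d in range(c + 1, cols):
                    det = mul(M[i][c], M[j][d]) ^ mul(M[i][d], M[j][c])
                    if det == 0:
                        return False
    return True

a = [EXP[0], EXP[1], EXP[2]]           # beta^0, beta^1, beta^2
b = [EXP[3], EXP[4], EXP[5], EXP[6]]   # beta^3, ..., beta^6 (distinct from a)
Y = cauchy(a, b)
print(minors_2x2_nonzero(Y))
```

Since the $a_{i}$ and $b_{j}$ are pairwise distinct, every entry is well defined and nonzero, which is the $1\times 1$ case of the same property.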

Lemma 1.

Let $s,t,r\in\mathbb{N}$ such that $t-s<r\leq t$ , $\mathbb{A}\in\textup{GF}(q)^{s\times t}$ . If $\mathbb{A}$ is a Cauchy matrix, then the following matrix $\mathbb{M}$ is a parity-check matrix of an $(s+r,s+r-t,t+1)_{q}$ -code.

[TABLE]

Proof.

The parity-check matrix of an $(s+r,s+r-t,t+1)_{q}$ -code satisfies the property that every $t$ columns of this matrix are linearly independent. Therefore, we only need to prove that every $t$ rows of $\mathbb{M}^{\mathrm{T}}$ are linearly independent. We prove Lemma 1 by contradiction. Suppose there exist $t$ rows from $\mathbb{M}^{\mathrm{T}}$ that are linearly dependent. Suppose $a$ of these linearly dependent rows $\mathbb{r}_{1},\mathbb{r}_{2},\dots,\mathbb{r}_{a}$ are from $\mathbb{A}$ , and the other $t-a$ rows $\mathbb{r}_{a+1},\mathbb{r}_{a+2},\dots,\mathbb{r}_{t}$ are from $\left[-\mathbb{I}_{r}\ \mathbb{0}_{r\times(t-r)}\right]$ , where $0\leq t-a\leq r$ . Suppose the entries with $-1$ in $\mathbb{r}_{a+1},\mathbb{r}_{a+2},\dots,\mathbb{r}_{t}$ are located in the $i_{1},i_{2},\dots,i_{t-a}$ -th columns of $\mathbb{M}^{\textup{T}}$ , then $i_{p}\leq r$ for all $1\leq p\leq t-a$ . Observe that $\left[t\right]$ is the set of indices of all columns in $\mathbb{M}^{\mathrm{T}}$ . Suppose $\left[t\right]\setminus\{i_{1},i_{2},\dots,i_{t-a}\}=\{j_{1},j_{2},\dots,j_{a}\}$ . Each of the rows $\mathbb{r}_{a+1},\dots,\mathbb{r}_{t}$ has a single nonzero entry, and these entries lie in distinct columns, so the assumed linear dependence forces the $a\times a$ sub-matrix formed by the intersection of the rows $\mathbb{r}_{1},\mathbb{r}_{2},\dots,\mathbb{r}_{a}$ and the $j_{1},j_{2},\dots,j_{a}$ -th columns of $\mathbb{A}$ to be singular. This contradicts the fact that every square sub-matrix of a Cauchy matrix is invertible. ∎
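Lemma 1 can be sanity-checked by brute force on a small instance. The sketch below is illustrative only: it assumes the Cauchy entry convention $1/(a_{i}-b_{j})$, takes $\mathbb{M}^{\mathrm{T}}$ to be $\mathbb{A}$ stacked on top of $\left[-\mathbb{I}_{r}\ \mathbb{0}_{r\times(t-r)}\right]$ as described in the proof, and uses hypothetical parameters $s=t=3$ , $r=2$ over $\textup{GF}(2^{4})$ (where $-1=1$ ).

```python
from itertools import combinations

def build_gf16():
    """Exp/log tables for GF(2^4) with primitive polynomial X^4 + X + 1."""
    exp, log = [0] * 30, [0] * 16
    x = 1
    for i in range(15):
        exp[i], log[x] = x, i
        x <<= 1
        if x & 0x10:
            x ^= 0x13
    for i in range(15, 30):
        exp[i] = exp[i - 15]
    return exp, log

EXP, LOG = build_gf16()

def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def inv(a):
    return EXP[(15 - LOG[a]) % 15]

def rank(rows):
    """Rank over GF(2^4) by Gauss-Jordan elimination."""
    m = [row[:] for row in rows]
    rk = 0
    for col in range(len(m[0])):
        piv = next((i for i in range(rk, len(m)) if m[i][col]), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        p_inv = inv(m[rk][col])
        m[rk] = [mul(p_inv, v) for v in m[rk]]
        for i in range(len(m)):
            if i != rk and m[i][col]:
                f = m[i][col]
                m[i] = [v ^ mul(f, w) for v, w in zip(m[i], m[rk])]
        rk += 1
    return rk

s, t, r = 3, 3, 2                      # satisfies t - s < r <= t
a = [EXP[0], EXP[1], EXP[2]]
b = [EXP[3], EXP[4], EXP[5]]
A = [[inv(ai ^ bj) for bj in b] for ai in a]           # s x t Cauchy matrix
identity_part = [[1 if j == i else 0 for j in range(t)] for i in range(r)]
M_T = A + identity_part                                 # (s + r) x t

# Every choice of t rows of M^T should have full rank t.
all_independent = all(rank([M_T[i] for i in idx]) == t
                      for idx in combinations(range(s + r), t))
print(all_independent)
```

If the check holds, every $t$ columns of $\mathbb{M}$ are linearly independent, which is what a parity-check matrix of a $(5,2,4)_{q}$ -code requires in this instance.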

III Codes for Multi-Level Access

Following the definitions and notation introduced in Section II, we present a CRS-based code with double-level access in Section III-A. Then, we extend our construction into a triple-level case in Section III-B.

III-A Codes with Double-Level Access

In this subsection, we provide a construction of codes offering double-level access based on the CRS codes. Note that the generator matrix of any systematic code with double-level access has the following structure:

[TABLE]

Construction 1.

(CRS-based code) Let $p\in\mathbb{N}$ , $k_{1},k_{2},\dots,k_{p}\in\mathbb{N}$ , $n_{1},n_{2},\dots,n_{p}\in\mathbb{N}$ , $\delta_{1},\delta_{2},\dots,\delta_{p}\in\mathbb{N}$ and $\delta=\delta_{1}+\delta_{2}+\dots+\delta_{p}$ , with $r_{x}=n_{x}-k_{x}>0$ for all $x\in\left[p\right]$ . Let $\textup{GF}(q)$ be a finite field such that $q\geq\max\nolimits_{x\in\left[p\right]}\{n_{x}\}+\delta$ .

For each $x\in\left[p\right]$ , let $a_{x,i}$ , $b_{x,j}$ , $i\in\left[k_{x}+\delta_{x}\right]$ , $j\in\left[r_{x}-\delta_{x}+\delta\right]$ , be distinct elements of $\textup{GF}(q)$ . Consider the Cauchy matrix $\mathbb{T}_{x}\in\textup{GF}(q)^{(k_{x}+\delta_{x})\times(r_{x}-\delta_{x}+\delta)}$ such that $\mathbb{T}_{x}=\mathbb{Y}(a_{x,1},\dots,a_{x,k_{x}+\delta_{x}};b_{x,1},\dots,b_{x,r_{x}-\delta_{x}+\delta})$ . For each $x\in\left[p\right]$ , we obtain $\{\mathbb{B}_{x,i}\}_{i\in\left[p\right]\setminus\{x\}}$ , $\mathbb{U}_{x}$ , $\mathbb{A}_{x,x}$ , according to the following partition of $\mathbb{T}_{x}$ ,

[TABLE]

where $\mathbb{A}_{x,x}\in\textup{GF}(q)^{k_{x}\times r_{x}}$ , $\mathbb{B}_{x,i}\in\textup{GF}(q)^{k_{x}\times\delta_{i}}$ , $\mathbb{U}_{x}\in\textup{GF}(q)^{\delta_{x}\times r_{x}}$ . Moreover, $\mathbb{A}_{x,y}=\mathbb{B}_{x,y}\mathbb{U}_{y}$ , for $x\neq y$ .

Matrices $\mathbb{A}_{x,x}$ and $\mathbb{A}_{x,y}$ are substituted in $\mathbb{G}$ specified in (3), for all $x,y\in\left[p\right]$ , $x\neq y$ . Let $\mathcal{C}_{1}$ represent the code with generator matrix $\mathbb{G}$ .

Lemma 2.

Following the notation in Construction 1, let $d_{1,x}=r_{x}-\delta_{x}+1$ , $d_{2,x}=r_{x}-\delta_{x}+\delta+1$ , for $x\in\left[p\right]$ . Then, code $\mathcal{C}_{1}$ specified in Construction 1 is an $(\mathbb{n},\mathbb{k},\mathbb{D},p)_{q}$ -code.

Sketch of the proof.

For each $x\in\left[p\right]$ , define $\mathbb{y}_{x}=\sum\nolimits_{y\in\left[p\right],y\neq x}\mathbb{m}_{y}\mathbb{B}_{y,x}$ . It follows from $\mathbb{m}\mathbb{G}=\mathbb{c}$ and (3) that for $x\in\left[p\right]$ , $\mathbb{c}_{x}=\left[\mathbb{m}_{x},\mathbb{m}_{x}\mathbb{A}_{x,x}+\mathbb{y}_{x}\mathbb{U}_{x}\right]$ . Define the local parity-check matrix $\mathbb{H}^{\mathrm{L}}_{x}$ and the global parity-check matrix $\mathbb{H}^{\mathrm{G}}_{x}$ , for each $x\in\left[p\right]$ , as follows:

[TABLE]

We next prove the equations of the local distance $d_{1,x}=r_{x}-\delta_{x}+1$ and the global distance $d_{2,x}=r_{x}-\delta_{x}+\delta+1$ using $\mathbb{H}^{\mathrm{L}}_{x}$ and $\mathbb{H}^{\mathrm{G}}_{x}$ , $x\in\left[p\right]$ .

To prove the equation of the local distance, let $\tilde{\mathbb{c}}_{x}=\left[\mathbb{c}_{x},\mathbb{y}_{x}\right]$ . Then, one can show that $\tilde{\mathbb{c}}_{x}$ belongs to a code $\mathcal{C}_{x}^{\mathrm{L}}$ with the local parity-check matrix $\mathbb{H}^{\mathrm{L}}_{x}$ . From Lemma 1, $\mathcal{C}_{x}^{\mathrm{L}}$ is an $(n_{x}+\delta_{x},k_{x}+\delta_{x},r_{x}+1)_{q}$ -code. Therefore, any $r_{x}$ erasures in $\tilde{\mathbb{c}}_{x}$ are correctable. Since $\mathbb{y}_{x}$ has length $\delta_{x}$ , we can consider the entries of $\mathbb{y}_{x}$ as erasures and thus any $(r_{x}-\delta_{x})$ erasures in the remaining part of $\tilde{\mathbb{c}}_{x}$ , i.e., $\mathbb{c}_{x}$ , can be corrected. Therefore, $d_{1,x}=r_{x}-\delta_{x}+1$ .

To prove the equation of the global distance, assume all the local codewords except for $\mathbb{c}_{x}$ are successfully decodable locally. Then, for each $x\in\left[p\right]$ , $\mathbb{y}_{x}$ and $\mathbb{s}_{x}=\left[\mathbb{m}_{x}\mathbb{B}_{x,1},\dots,\mathbb{m}_{x}\mathbb{B}_{x,p}\right]$ are computable. Let $\bar{\mathbb{c}}_{x}=\mathbb{c}_{x}-\left[\mathbb{0}_{k_{x}},\mathbb{y}_{x}\mathbb{U}_{x}\right]$ , then one can show that $\mathbb{H}^{\mathrm{G}}_{x}\bar{\mathbb{c}}_{x}^{\mathrm{T}}=\left[\mathbb{0}_{r_{x}},\mathbb{s}_{x}\right]^{\mathrm{T}}$ . From Lemma 1 and from the construction of $\mathbb{H}^{\mathrm{G}}_{x}$ , any $(r_{x}-\delta_{x}+\delta)$ erasures in $\bar{\mathbb{c}}_{x}$ are correctable, thus $(r_{x}-\delta_{x}+\delta)$ erasures in $\mathbb{c}_{x}$ are also correctable. Therefore, $d_{2,x}=r_{x}-\delta_{x}+\delta+1$ . ∎
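The parameter relations of Lemma 2 and the field-size condition $q\geq\max_{x}\{n_{x}\}+\delta$ are simple enough to tabulate. The following is a small sketch with a hypothetical helper name, `double_level_parameters`, evaluated on the symmetric parameters $n_{x}=6$ , $k_{x}=3$ , $\delta_{x}=1$ , $p=2$ .

```python
def double_level_parameters(n, k, delta):
    """Distances (d_{1,x}, d_{2,x}) and the smallest admissible q for the CRS-based code."""
    r = [n_x - k_x for n_x, k_x in zip(n, k)]          # r_x = n_x - k_x
    total_delta = sum(delta)                            # delta = delta_1 + ... + delta_p
    d1 = [r_x - d_x + 1 for r_x, d_x in zip(r, delta)]
    d2 = [r_x - d_x + total_delta + 1 for r_x, d_x in zip(r, delta)]
    q_min = max(n) + total_delta                        # bound on q from the construction
    return d1, d2, q_min

d1, d2, q_min = double_level_parameters(n=[6, 6], k=[3, 3], delta=[1, 1])
print(d1, d2, q_min)
```

For these values the local and global distances are 3 and 5, and any field with at least 8 elements satisfies the bound, so $\textup{GF}(2^{4})$ is admissible.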

We next provide a working example for codes in Construction 1. For simplicity, we let all the local codeword lengths and local data lengths be equal. However, the construction itself allows them to be unequal.

Example 3.

Let $q=2^{4}$ , $p=2$ , $r=r_{1}=r_{2}=3$ , $\delta^{\prime}=\delta_{1}=\delta_{2}=1$ , $k=k_{1}=k_{2}=3$ , $n=n_{1}=n_{2}=k+r=6$ , $\delta=\delta_{1}+\delta_{2}=2$ . Then, $d_{1}=r-\delta^{\prime}+1=3-1+1=3$ , $d_{2}=r-\delta^{\prime}+\delta+1=3-1+2+1=5$ . Choose a primitive polynomial over $\textup{GF}(2)$ : $g(X)=X^{4}+X+1$ . Let $\beta$ be a root of $g(X)$ , then $\beta$ is a primitive element of $\textup{GF}(2^{4})$ . The binary representation of all the symbols in $\textup{GF}(2^{4})$ is specified in Table I.

Let $\mathbb{A}_{1,1}=\mathbb{A}_{2,2}$ , $\mathbb{B}_{1,2}=\mathbb{B}_{2,1}$ , $\mathbb{U}_{1}=\mathbb{U}_{2}$ , and $\mathbb{T}_{1}=\mathbb{T}_{2}$ as specified in (5). Therefore,

[TABLE]

Then, the generator matrix $\mathbb{G}$ is specified as follows,

[TABLE]

Suppose $\mathbb{m}_{1}=(1,\beta,\beta^{2})$ , $\mathbb{m}_{2}=(\beta,1,0)$ , then $\mathbb{c}_{1}=(1,\beta,\beta^{2},\beta^{14},0,0)$ and $\mathbb{c}_{2}=(\beta,1,0,\beta^{6},0,\beta^{13})$ . Moreover, $\mathbb{H}_{1}^{\mathrm{L}}$ and $\mathbb{H}_{1}^{\mathrm{G}}$ are specified as follows,

[TABLE]

According to Construction 1, $\mathbb{G}$ is the generator matrix of a double-level accessible code that corrects $2$ local erasures by local access and corrects $2$ extra erasures within a single local cloud by global access. In the following, we denote the erased version of $\mathbb{c}_{1}$ by $\mathbb{c}^{\prime}_{1}$ , and erased symbols by $e_{i}$ , $i\in\mathbb{N}$ .

As an example of decoding by local access, suppose $\mathbb{c}^{\prime}_{1}=(1,e_{1},\beta^{2},e_{2},0,0)$ . Then, the erased elements of $\tilde{\mathbb{c}}_{1}=(1,e_{1},\beta^{2},e_{2},0,0,e_{3})$ can be retrieved using $\mathbb{H}_{1}^{\mathrm{L}}$ as the parity-check matrix. In particular, we solve $\mathbb{H}^{\mathrm{L}}_{1}\tilde{\mathbb{c}}_{1}^{\mathrm{T}}=(0,0,0)^{\mathrm{T}}$ for $e_{1},e_{2},e_{3}$ and obtain $(e_{1},e_{2},e_{3})=(\beta,\beta^{14},\beta^{7})$ . We have decoded $\mathbb{c}_{1}$ successfully.

As an example of decoding by global access, suppose $\mathbb{c}^{\prime}_{1}=(e_{1},e_{2},\beta^{2},e_{3},e_{4},0)$ , and $\mathbb{c}_{2}$ has been locally decoded successfully. Then, $\mathbb{c}_{2}=(\beta,1,0,\beta^{6},0,\beta^{13})$ implies that $\mathbb{m}_{1}\mathbb{B}_{1,2}\mathbb{U}_{2}=(\beta^{6},0,\beta^{13})-\beta\cdot(\beta^{5},\beta^{12},\beta^{7})-1\cdot(1,\beta^{4},\beta^{11})=(1,\beta^{11},\beta^{5})$ . Since $\mathbb{U}_{2}=(\beta^{4},1,\beta^{9})$ , we obtain $\mathbb{m}_{1}\mathbb{B}_{1,2}=\beta^{11}$ . Moreover, we compute $\mathbb{m}_{2}\mathbb{B}_{2,1}\mathbb{U}_{1}=(\beta^{11},\beta^{7},\beta)$ . Let $\bar{\mathbb{c}}_{1}=\mathbb{c}^{\prime}_{1}-(0,0,0,\beta^{11},\beta^{7},\beta)=(e^{\prime}_{1},e^{\prime}_{2},\beta^{2},e^{\prime}_{3},e^{\prime}_{4},\beta)$ . Then, we solve $\mathbb{H}^{\mathrm{G}}_{1}\bar{\mathbb{c}}_{1}^{\mathrm{T}}=(0,0,0,\beta^{11})^{\mathrm{T}}$ and obtain $(e^{\prime}_{1},e^{\prime}_{2},e^{\prime}_{3},e^{\prime}_{4})=(1,\beta,\beta^{10},\beta^{7})$ . Therefore, $e_{1}=e^{\prime}_{1}=1$ , $e_{2}=e^{\prime}_{2}=\beta$ , $e_{3}=e^{\prime}_{3}+\beta^{11}=\beta^{14}$ , $e_{4}=e^{\prime}_{4}+\beta^{7}=0$ , and we have decoded $\mathbb{c}_{1}$ successfully.
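The field arithmetic in the global-access example can be replayed numerically. The sketch below only re-checks the quantities quoted in the text (the two vectors multiplying $\beta$ and $1$ , the vector $\mathbb{U}_{2}$ , and the final corrections); it does not solve the parity-check system itself, because the matrices are not reproduced here. The helper names (`beta`, `scale`, `add`) are hypothetical.

```python
def build_gf16():
    """Exp/log tables for GF(2^4) with primitive polynomial X^4 + X + 1."""
    exp, log = [0] * 30, [0] * 16
    x = 1
    for i in range(15):
        exp[i], log[x] = x, i
        x <<= 1
        if x & 0x10:
            x ^= 0x13
    for i in range(15, 30):
        exp[i] = exp[i - 15]
    return exp, log

EXP, LOG = build_gf16()

def beta(i):
    return EXP[i % 15]                 # beta^i as a 4-bit integer

def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def inv(a):
    return EXP[(15 - LOG[a]) % 15]

def scale(c, v):
    return [mul(c, x) for x in v]

def add(u, v):
    return [x ^ y for x, y in zip(u, v)]   # addition equals subtraction in GF(2^4)

parity_c2 = [beta(6), 0, beta(13)]         # parity part of c_2
row1 = [beta(5), beta(12), beta(7)]        # vector multiplied by beta in the text
row2 = [beta(0), beta(4), beta(11)]        # vector multiplied by 1 in the text
U = [beta(4), beta(0), beta(9)]            # U_1 = U_2

# m_1 B_{1,2} U_2 = parity(c_2) - beta * row1 - 1 * row2
cross = add(add(parity_c2, scale(beta(1), row1)), scale(beta(0), row2))
m1_B12 = mul(cross[0], inv(U[0]))          # scalar, since delta_2 = 1

# m_2 B_{2,1} U_1 with m_2 B_{2,1} = beta^7 (the value e_3 found by local decoding)
m2_B21_U1 = scale(beta(7), U)

e3 = beta(10) ^ beta(11)                   # e_3 = e'_3 + beta^11
e4 = beta(7) ^ beta(7)                     # e_4 = e'_4 + beta^7
print(cross, m1_B12, m2_B21_U1, e3, e4)
```

The values agree with the text: the cross term is $(1,\beta^{11},\beta^{5})$ , which is $\beta^{11}\mathbb{U}_{2}$ , and the recovered symbols are $e_{3}=\beta^{14}$ and $e_{4}=0$ .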

III-B Codes with Hierarchical Locality

Based on the double-level accessible codes presented in Section III-A, we present a class of codes with hierarchical locality in Construction 2. For simplicity, we just present a construction with triple-level access. Note that the coding scheme itself can be naturally extended to have more than three levels.

As described in Definition 2, in the triple-level structure, the set of local clouds is partitioned into $p_{0}$ groups that are indexed by the first-level index $x\in\left[p_{0}\right]$ . These groups are further divided into $p_{1},p_{2},\dots,p_{p_{0}}$ local clouds, respectively, and the local clouds within group $x$ are indexed by the second-level index $i\in\left[p_{x}\right]$ . Therefore, each local cloud is indexed by the pair $(x,i)$ . In the following discussion, the parameters with subscript $(x,y;i,j)$ are determined via the two local clouds indexed by $(x,i)$ and $(y,j)$ . The subscript $(x,y;i)$ is an abbreviated version of $(x,y;i,1),(x,y;i,2),\dots,(x,y;i,p_{y})$ , and the parameters with subscript $(x,y;i)$ are determined via the local cloud $(x,i)$ and all the local clouds in the $y$ -th group. Lastly, we define a new notation, $(x,y;i;s)$ , that indexes the parameters determined via the local cloud $(x,i)$ and some other local clouds in the $y$ -th group (not necessarily all of them). Note that this notation bears similarity to $(x,y;i,j)$ . However, they are different notations: the index $s$ refers to a subgroup of local clouds, not to a single local cloud as $j$ does.

A generator matrix of such a code is as follows:

[TABLE]

where for any $x\in\left[p_{0}\right]$ ,

[TABLE]

is a generator matrix of a code offering double-level access, and

[TABLE]

Properties of $\mathbb{F}_{x,x},\mathbb{F}_{x,y}$ are to be discussed later.

Construction 2.

Let $p_{0}\in\mathbb{N}$ , $\mathbb{p}=(p_{1},\dots,p_{p_{0}})\in\mathbb{N}^{p_{0}}$ . Let $k_{x,i},n_{x,i},\delta_{x,i},\gamma_{x}\in\mathbb{N}$ , for $x\in\left[p_{0}\right]$ and $i\in\left[p_{x}\right]$ , such that $r_{x,i}=n_{x,i}-k_{x,i}>0$ and $2\gamma_{x}<\min\nolimits_{i\in\left[p_{x}\right]}\{r_{x,i}-\delta_{x,i}\}$ . Let $\delta_{x}=\delta_{x,1}+\cdots+\delta_{x,p_{x}}$ for all $x\in\left[p_{0}\right]$ , and let $\gamma=\sum\nolimits_{x\in\left[p_{0}\right]}p_{x}\gamma_{x}$ . Let $\textup{GF}(q)$ be a finite field such that $q\geq\max\limits_{x\in\left[p_{0}\right],i\in\left[p_{x}\right]}\{n_{x,i}+\delta_{x}-(p_{x}-2)\gamma_{x}+\gamma\}$ .

Let $u_{x,i}=k_{x,i}+\delta_{x,i}+2\gamma_{x}$ , $v_{x,i}=r_{x,i}-\delta_{x,i}+\delta_{x}-p_{x}\gamma_{x}+\gamma$ , for $x\in\left[p_{0}\right]$ , $i\in\left[p_{x}\right]$ . For each $x\in\left[p_{0}\right]$ , $i\in\left[p_{x}\right]$ , let $a_{x,i,s},b_{x,i,t}$ , $s\in\left[u_{x,i}\right]$ , $t\in\left[v_{x,i}\right]$ , be distinct elements of $\textup{GF}(q)$ .

Consider the Cauchy matrix $\mathbb{T}_{x,i}$ on $\textup{GF}(q)^{u_{x,i}\times v_{x,i}}$ such that $\mathbb{T}_{x,i}=\mathbb{Y}(a_{x,i,1},\dots,a_{x,i,u_{x,i}};b_{x,i,1},\dots,b_{x,i,v_{x,i}})$ , for $x\in\left[p_{0}\right]$ , $i\in\left[p_{x}\right]$ . Then, we obtain $\mathbb{A}_{x,x;i,i}$ , $\mathbb{B}_{x,x;i,i^{\prime}}$ , $\mathbb{E}_{x,y;i;j}$ , $\mathbb{U}_{x,i}$ , $\mathbb{V}_{x,i}$ , $x\in\left[p_{0}\right]$ , $i^{\prime}\in\left[p_{x}\right]\setminus\{i\}$ , $y\in\left[p_{0}\right]\setminus\{x\}$ , $j\in\left[p_{y}\right]$ , according to the following partition of $\mathbb{T}_{x,i}$ ,

[TABLE]

such that $\mathbb{A}_{x,x;i,i}\in\textup{GF}(q)^{k_{x,i}\times r_{x,i}}$ , $\mathbb{B}_{x,x;i,i^{\prime}}\in\textup{GF}(q)^{k_{x,i}\times\delta_{x,i^{\prime}}}$ , $\mathbb{E}_{x,y;i;j}\in\textup{GF}(q)^{k_{x,i}\times\gamma_{y}}$ , $\mathbb{U}_{x,i}\in\textup{GF}(q)^{\delta_{x,i}\times r_{x,i}}$ , $\mathbb{V}_{x,i}\in\textup{GF}(q)^{2\gamma_{x}\times r_{x,i}}$ . Moreover, $\mathbb{A}_{x,x;i,i^{\prime}}=\mathbb{B}_{x,x;i,i^{\prime}}\mathbb{U}_{x,i^{\prime}}$ . Suppose $\mathbb{E}_{x,y;i;p_{y}+1}=\mathbb{E}_{x,y;i;1}$ ; let $\mathbb{A}_{x,y;i,j}=\left[\mathbb{E}_{x,y;i;j},\mathbb{E}_{x,y;i;j+1}\right]\mathbb{V}_{y,j}$ .

Matrices $\mathbb{A}_{x,x;i,i}$ and $\mathbb{A}_{x,y;i,j}$ are substituted in $\mathbb{F}_{x,x}$ and $\mathbb{F}_{x,y}$ to construct $\mathbb{G}$ as specified in (6), (7), and (8). Let $\mathcal{C}_{2}$ represent the code with generator matrix $\mathbb{G}$ .
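The dimensions $u_{x,i}\times v_{x,i}$ of the Cauchy matrix $\mathbb{T}_{x,i}$ determine how many distinct field elements are needed, which is where the bound on $q$ comes from. The following is a small sketch with a hypothetical helper name, `cauchy_shape`, evaluated on illustrative symmetric parameters ( $k_{x,i}=3$ , $r_{x,i}=3$ , $\delta_{x,i}=1$ , $\delta_{x}=2$ , $p_{x}=2$ , $\gamma_{x}=1/2$ , $\gamma=2$ ); the half-integer $\gamma_{x}$ assumes the relaxation that applies when $p_{x}$ is even.

```python
from fractions import Fraction

def cauchy_shape(k, r, delta_local, delta_group, p_x, gamma_x, gamma):
    """Return (u, v), the number of rows and columns of T_{x,i}."""
    u = k + delta_local + 2 * gamma_x
    v = r - delta_local + delta_group - p_x * gamma_x + gamma
    return u, v

half = Fraction(1, 2)
u, v = cauchy_shape(k=3, r=3, delta_local=1, delta_group=2,
                    p_x=2, gamma_x=half, gamma=2)

# T_{x,i} needs u + v pairwise distinct field elements, so q >= u + v.
elements_needed = u + v

# The same quantity written as in the bound: n + delta_x - (p_x - 2) * gamma_x + gamma.
n = 3 + 3
bound = n + 2 - (2 - 2) * half + 2
print(u, v, elements_needed, bound)
```

Here $u=v=5$ , so 10 distinct elements suffice and $\textup{GF}(2^{4})$ is large enough for these parameters.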

Theorem 1.

Following the notation in 2, let $d_{1,x,i}=r_{x,i}-\delta_{x,i}-2\gamma_{x}+1$ , $d_{2,x,i}=r_{x,i}-\delta_{x,i}+\delta_{x}+1$ , $d_{3,x,i}=r_{x,i}-\delta_{x,i}+\delta_{x}-p_{x}\gamma_{x}+\gamma+1$ , for $x\in\left[p_{0}\right]$ , $i\in\left[p_{x}\right]$ . Then, the code $\mathcal{C}_{2}$ defined in 2 is an $(\mathbb{n},\mathbb{k},\mathbb{D},p_{0},\mathbb{p})_{q}$ -code.

Sketch of the proof.

For each $x\in\left[p_{0}\right]$ and $i\in\left[p_{x}\right]$ , define the local cross parity $\mathbb{y}_{x,i}=\sum\nolimits_{i^{\prime}\in\left[p_{x}\right]\setminus\{i\}}\mathbb{m}_{x,i^{\prime}}\mathbb{B}_{x,x;i^{\prime},i}$ , and the global cross parity $\mathbb{z}_{x,i}=\sum\nolimits_{y\in\left[p_{0}\right]\setminus\{x\},j\in\left[p_{y}\right]}\mathbb{m}_{y,j}\mathbb{E}_{y,x;j;i}$ . Let $\mathbb{z}_{x,p_{x}+1}=\mathbb{z}_{x,1}$ , in accordance with $\mathbb{E}_{x,y;i;p_{y}+1}=\mathbb{E}_{x,y;i;1}$ . Then, it follows from $\mathbb{m}\mathbb{G}=\mathbb{c}$ that $\mathbb{c}_{x,i}=\left[\mathbb{m}_{x,i},\mathbb{w}_{x,i}\right]$ , where $\mathbb{w}_{x,i}=\mathbb{m}_{x,i}\mathbb{A}_{x,x;i,i}+\mathbb{y}_{x,i}\mathbb{U}_{x,i}+\left[\mathbb{z}_{x,i},\mathbb{z}_{x,i+1}\right]\mathbb{V}_{x,i}$ .

The local erasure-correction capability $d_{1,x,i}=r_{x,i}-\delta_{x,i}-2\gamma_{x}+1$ and the global erasure-correction capability $d_{3,x,i}=r_{x,i}-\delta_{x,i}+\delta_{x}-p_{x}\gamma_{x}+\gamma+1$ can be easily derived by following the same logic used in the proof of Lemma 2. Therefore, we only need to prove that $d_{2,x,i}=r_{x,i}-\delta_{x,i}+\delta_{x}+1$ .

To prove this statement, suppose all the local codewords in the $x$ -th group except for $\mathbb{c}_{x,i}$ are successfully decodable locally, for some $x\in\left[p_{0}\right]$ , $i\in\left[p_{x}\right]$ . In other words, for all $i^{\prime}\in\left[p_{x}\right]\setminus\{i\}$ , there are at most $d_{1,x,i^{\prime}}-1$ erasures in the corrupted version of the local codeword $\mathbb{c}_{x,i^{\prime}}$ . From the construction, we know that the row spaces of any two matrices from $\mathbb{A}_{x,x;i,i}$ , $\mathbb{U}_{x,i}$ , and $\mathbb{V}_{x,i}$ have no common elements except for the all-zero vector. Therefore, for all $i^{\prime}\in\left[p_{x}\right]\setminus\{i\}$ , $\mathbb{m}_{x,i^{\prime}}$ , $\mathbb{y}_{x,i^{\prime}}$ , $\left[\mathbb{z}_{x,i^{\prime}},\mathbb{z}_{x,i^{\prime}+1}\right]$ , can all be derived from $\mathbb{c}_{x,i^{\prime}}$ . This implies that $\left[\mathbb{z}_{x,i},\mathbb{z}_{x,i+1}\right]$ is known and thus, the entire contribution of global cross parities can be removed. Namely, let $\tilde{\mathbb{c}}_{x,i^{\prime}}=\mathbb{c}_{x,i^{\prime}}-\left[\mathbb{0}_{k_{x,i^{\prime}}},\left[\mathbb{z}_{x,i^{\prime}},\mathbb{z}_{x,i^{\prime}+1}\right]\mathbb{V}_{x,i^{\prime}}\right]$ , for all $i^{\prime}\in\left[p_{x}\right]$ , then $\mathbb{m}_{x}\mathbb{F}_{x,x}=\tilde{\mathbb{c}}_{x}$ , where $\tilde{\mathbb{c}}_{x}=\left[\tilde{\mathbb{c}}_{x,1},\dots,\tilde{\mathbb{c}}_{x,p_{x}}\right]$ . Thus, from Lemma 2, $(r_{x,i}-\delta_{x,i}+\delta_{x})$ erasures in $\tilde{\mathbb{c}}_{x,i}$ are correctable. Therefore, $d_{2,x,i}=r_{x,i}-\delta_{x,i}+\delta_{x}+1$ . ∎

Remark 1.

Note that the constraint of $\gamma_{y}\in\mathbb{N}$ in 1 can be relaxed to $2\gamma_{y}\in\mathbb{N}$ if $p_{y}$ is even. In this case, we have $\mathbb{E}_{x,y;i;j}\in\textup{GF}(q)^{k_{x,i}\times 2\gamma_{y}}$ . Moreover, we need to modify the equation of $\mathbb{E}_{x,y;i}$ to be $\mathbb{E}_{x,y;i}=\left[\mathbb{E}_{x,y;i;1},\dots,\mathbb{E}_{x,y;i;p_{y}/2}\right]$ , and $\mathbb{A}_{x,y;i,j}=\mathbb{E}_{x,y;i;\lceil j/2\rceil}\mathbb{V}_{y,j}$ .

The following is a working example of Construction 2. For simplicity, we let the middle code be the code presented in Example 3. However, the construction itself does not impose any constraints on $r_{x,i}$ , $\delta_{x,i}$ , and $\gamma_{x}$ , except for $2\gamma_{x}<\min\nolimits_{y\in\left[p_{x}\right]}\{r_{x,y}-\delta_{x,y}\}$ .

Example 4.

Here, we build on Example 3 using the same $\textup{GF}(q)$ . Let $p_{0}=2$ , $\mathbb{p}=(p_{1},p_{2})=(2,2)$ , $\gamma^{\prime}=\gamma_{1}=\gamma_{2}=1/2$ , $\gamma=p_{1}\gamma_{1}+p_{2}\gamma_{2}=2$ . Let $\mathbb{F}_{1,1}=\mathbb{F}_{2,2}=\mathbb{G}$ of Example 3. Then, $n=6$ , $r=3$ , $\delta^{\prime}=1$ , $\delta=2$ as in Example 3. Therefore, $d_{1}=r-\delta^{\prime}-2\gamma^{\prime}+1=3-1-2\cdot(1/2)+1=2$ , $d_{2}=r-\delta^{\prime}+\delta+1=5$ , $d_{3}=r-\delta^{\prime}+\delta-p_{x}\gamma^{\prime}+\gamma+1=3-1+2-2\cdot(1/2)+2+1=6$ . We assume $\mathbb{T}_{x,i}$ , $x,i\in\left[2\right]$ , are all identical; then so are $\mathbb{V}_{x,i}$ and $\mathbb{E}_{x,y;i;1}$ , $x\neq y$ , $i\in\left[2\right]$ . Let these matrices be defined as follows:

[TABLE]

For simplicity, we abbreviate $\mathbb{E}_{x,y;i;1}$ as $\mathbb{E}$ . Note that here $p_{1}$ , $p_{2}$ are even; thus, the construction follows the modification described in Remark 1. The components $\mathbb{A}_{x,y;i,j}$ are therefore all identical for $x,y,i,j\in\left[2\right]$ , $x\neq y$ , and are described as follows:

[TABLE]

Then, the generator matrix is given in (12).

Note that the decoding processes based on local access and global access have already been introduced in Example 3. Thus, we only focus on decoding based on the middle-level access in this example. Suppose $\mathbb{m}_{1,1}=(1,\beta,\beta^{2})$ , $\mathbb{m}_{1,2}=(\beta,1,0)$ , $\mathbb{m}_{2,1}=(\beta^{2},0,\beta)$ , $\mathbb{m}_{2,2}=(0,\beta,1)$ . Then, $\mathbb{c}_{1,1}=(1,\beta,\beta^{2},\beta^{12},\beta^{14},\beta^{12})$ , $\mathbb{c}_{1,2}=(\beta,1,0,\beta^{9},\beta^{14},\beta)$ .

Suppose there are $3$ erasures in $\mathbb{c}_{1,1}$ so that $\mathbb{c}^{\prime}_{1,1}=(e_{1},\beta,\beta^{2},e_{2},e_{3},\beta^{12})$ , where $e_{1},e_{2},e_{3}$ represent the three erased symbols. Suppose $\mathbb{c}_{1,2}$ is successfully corrected by local access. Then, codeword $\mathbb{c}_{1,1}$ is correctable through middle-level access, i.e., by operating on $\mathbb{c}^{\prime}_{1,1}$ and $\mathbb{c}_{1,2}$ .

First, from $\mathbb{c}_{1,2}=(\beta,1,0,\beta^{9},\beta^{14},\beta)$ , we know that $\mathbb{m}_{1,2}=(\beta,1,0)$ . Following the proof of Theorem 1, we know that $(\beta^{9},\beta^{14},\beta)=\mathbb{m}_{1,2}\mathbb{A}_{1,1;2,2}+\mathbb{y}_{1,2}\mathbb{U}_{1,2}+\mathbb{z}_{1,2}\mathbb{V}_{1,2}$ . Here, $\mathbb{y}_{1,2}=\mathbb{m}_{1,1}\mathbb{B}_{1,1;1,2}$ , $\mathbb{z}_{1,2}=(\mathbb{m}_{2,1}+\mathbb{m}_{2,2})\mathbb{E}=\mathbb{z}_{1,1}$ . Then, $\mathbb{y}_{1,2}$ and $\mathbb{z}_{1,2}$ can be computed as $\mathbb{y}_{1,2}=(\beta^{11})$ , $\mathbb{z}_{1,2}=(\beta^{4})$ . Therefore, $\mathbb{z}_{1,1}\mathbb{V}_{1,1}+\mathbb{m}_{1,2}\mathbb{A}_{1,1;2,1}=\mathbb{z}_{1,2}\mathbb{V}_{1,1}+\mathbb{m}_{1,2}\mathbb{A}_{1,1;2,1}=(\beta^{5},\beta^{14},\beta^{12})+(\beta^{11},\beta^{7},\beta)=(\beta^{3},\beta,\beta^{13})$ .

Let $\tilde{\mathbb{c}}_{1,1}=\mathbb{c}^{\prime}_{1,1}-(0,0,0,\beta^{3},\beta,\beta^{13})=(e^{\prime}_{1},\beta,\beta^{2},e^{\prime}_{2},e^{\prime}_{3},\beta)$ . We obtain $(e^{\prime}_{1},e^{\prime}_{2},e^{\prime}_{3})=(1,\beta^{10},\beta^{7})$ by solving $\mathbb{H}^{\mathrm{G}}_{1}\tilde{\mathbb{c}}_{1,1}^{\mathrm{T}}=(0,0,0,\beta^{11})^{\mathrm{T}}$ , where $\mathbb{H}^{\mathrm{G}}_{1}$ is specified in Example 3. Therefore, $e_{1}=e^{\prime}_{1}=1$ , $e_{2}=e^{\prime}_{2}+\beta^{3}=\beta^{12}$ , $e_{3}=e^{\prime}_{3}+\beta=\beta^{14}$ . We have successfully decoded $\mathbb{c}_{1,1}$ .

IV Scalability, Heterogeneity, and Flexibility

In Section III, we have presented a construction of codes with hierarchical locality for cloud storage, which enables the system to offer multi-level access. However, multi-level accessibility is not the only property that is considered in practical cloud storage applications. In this section, we therefore discuss scalability, heterogeneity, and flexibility of our construction, which are pivotal particularly in dynamic cloud storage. Although our discussion is restricted to cloud storage, the properties of heterogeneity and flexibility are also of practical importance in non-volatile memories.

IV-A Scalability

As discussed in Section I, scalability refers to the capability of expanding the backbone network to accommodate additional workload without rebuilding the entire infrastructure. More specifically, when a new local cloud is added to the existing configuration, computing a completely different generator matrix, and thereby changing all the encoding-decoding components in the system, is very costly. The ideal scenario is that adding a new local cloud does not change the encoding-decoding components of the already-existing local clouds.

We show that our construction naturally achieves this goal. Observe that in Construction 1, the components $\mathbb{A}_{x,x}$ , $\mathbb{U}_{x}$ , $\mathbb{B}_{x,i}$ , $i\in\left[p\right]\setminus\{x\}$ are built locally. Suppose cloud $p+1$ is added into a double-level configuration adopting Construction 1. The following steps will only result in adding some columns and rows to the original $\mathbb{G}$ without changing the existing ones:

1. Parameter Selection: Local cloud $p+1$ chooses its local parameters $\mathbb{A}_{p+1,p+1}$ , $\mathbb{U}_{p+1}$ , $\mathbb{B}_{p+1,i}$ , $i\in\left[p\right]$ , and each local cloud $i\in\left[p\right]$ chooses the additional local parameters $\mathbb{B}_{i,p+1}$ ;

2. Information Exchange: Local cloud $p+1$ sends $\mathbb{m}_{p+1}\mathbb{B}_{p+1,i}$ to the central cloud, and local cloud $i$ sends $\mathbb{m}_{i}\mathbb{B}_{i,p+1}$ to the central cloud;

3. Information Exchange: The central cloud forwards $\mathbb{m}_{p+1}\mathbb{B}_{p+1,i}$ to local cloud $i$ , and sends $\mathbb{y}_{p+1}=\sum\nolimits_{i\in\left[p\right]}\mathbb{m}_{i}\mathbb{B}_{i,p+1}$ to local cloud $p+1$ ;

4. Update: Local cloud $p+1$ computes its finalized parity-check symbols $\mathbb{m}_{p+1}\mathbb{A}_{p+1,p+1}+\mathbb{y}_{p+1}\mathbb{U}_{p+1}$ , and local cloud $i$ adds $\mathbb{m}_{p+1}\mathbb{B}_{p+1,i}\mathbb{U}_{i}$ to its current parity symbols.

Note that although the local erasure-correction capability of a local cloud does not change, the global erasure-correction capability of each local cloud increases by $\delta_{p+1}$ after adding the new local cloud $p+1$ into the system.
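The update procedure can be illustrated by checking that the incremental update of the existing clouds yields the same parities as re-encoding from scratch. The sketch below is an assumption-laden toy: it uses random matrices over the prime field $\textup{GF}(17)$ rather than Cauchy-derived blocks over $\textup{GF}(2^{4})$ , since only linearity matters for this check, and it takes the update at each existing cloud $i$ to be the addition of $\mathbb{m}_{p+1}\mathbb{B}_{p+1,i}\mathbb{U}_{i}$ to its parity symbols. All names and sizes are hypothetical.

```python
import random

P = 17  # toy prime field; chosen only to keep the arithmetic simple

def vec_mat(v, M):
    """Row vector times matrix, modulo P."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) % P for j in range(len(M[0]))]

def vec_add(u, v):
    return [(a + b) % P for a, b in zip(u, v)]

def rand_mat(rows, cols, rng):
    return [[rng.randrange(P) for _ in range(cols)] for _ in range(rows)]

def parity(x, msgs, A, U, B):
    """Parity of cloud x: m_x A_{x,x} + y_x U_x, with y_x = sum_{y != x} m_y B_{y,x}."""
    y = [0] * len(U[x])
    for other in msgs:
        if other != x:
            y = vec_add(y, vec_mat(msgs[other], B[(other, x)]))
    return vec_add(vec_mat(msgs[x], A[x]), vec_mat(y, U[x]))

rng = random.Random(0)
k = {1: 2, 2: 3, 3: 2}          # data lengths; cloud 3 plays the role of cloud p + 1
r = {1: 3, 2: 3, 3: 4}          # numbers of parity symbols
delta = {1: 1, 2: 2, 3: 1}      # cross-parity lengths
A = {x: rand_mat(k[x], r[x], rng) for x in k}
U = {x: rand_mat(delta[x], r[x], rng) for x in k}
B = {(x, i): rand_mat(k[x], delta[i], rng) for x in k for i in k if x != i}
msgs = {x: [rng.randrange(P) for _ in range(k[x])] for x in k}

# Parities of clouds 1 and 2 before cloud 3 joins.
old_msgs = {x: msgs[x] for x in (1, 2)}
before = {x: parity(x, old_msgs, A, U, B) for x in (1, 2)}

# Steps 2-4: existing cloud i adds (m_3 B_{3,i}) U_i; cloud 3 encodes with y_3.
updated = {i: vec_add(before[i], vec_mat(vec_mat(msgs[3], B[(3, i)]), U[i]))
           for i in (1, 2)}
y3 = vec_add(vec_mat(msgs[1], B[(1, 3)]), vec_mat(msgs[2], B[(2, 3)]))
new_parity = vec_add(vec_mat(msgs[3], A[3]), vec_mat(y3, U[3]))

# Reference: encode all three clouds from scratch.
full = {x: parity(x, msgs, A, U, B) for x in (1, 2, 3)}
print(updated[1] == full[1], updated[2] == full[2], new_parity == full[3])
```

The existing clouds never touch $\mathbb{A}_{i,i}$ , $\mathbb{U}_{i}$ , or their stored messages during the update; they only add a short vector to their parity symbols.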

IV-B Heterogeneity

While codes with identical data length and locality have been intensively studied, heterogeneity has become increasingly important in real-world applications, especially in cloud storage. There are typically two forms of heterogeneity: the heterogeneity of the network structure, and unequal usage rates (according to how hot the stored data are) of local components. It is reasonable to assume a heterogeneous structure since components connected to a larger network are typically geographically separated and they often store data from unrelated sources. Heterogeneous networks naturally require codes with different local code lengths and nonidentical data lengths, corresponding to flexible $n_{x}$ and $k_{x}$ in our construction, respectively. Unequal protection of data, corresponding to flexible $r_{x}$ and $\delta_{x}$ , has also received increasing attention in recent years. This is reasonable since the usage rate of the data is not necessarily identical. Clouds storing hot data (data with higher usage rate and more time urgency) should receive more local protection than those storing cold data.

Although the examples we presented in Section III have identical local parameters among all the clouds for simplicity, Constructions 1 and 2 do not impose such restrictions, and they are actually suitable for heterogeneous configurations.

Example 5.

Here, we build on Example 2 and we use the same parameters. In this example, $n_{x}$ , $k_{x}$ , $\delta_{x}$ are not identical for all $x$ . Let $(\delta_{1,1},\delta_{1,2})=(1,1)$ ; thus, $\delta_{1}=1+1=2$ . Let $(\delta_{2,1},\delta_{2,2})=(1,1)$ ; thus, $\delta_{2}=1+1=2$ . Let $(\delta_{3,1},\delta_{3,2},\delta_{3,3},\delta_{3,4})=(1,2,1,1)$ ; thus, $\delta_{3}=1+2+1+1=5$ .

Let $\gamma_{1}=1$ and $\gamma_{2}=\gamma_{3}=1/2$ ; thus, $\gamma=2\cdot(1)+2\cdot(1/2)+4\cdot(1/2)=5$ .

Then, $d_{1,1,1}=r_{1,1}-\delta_{1,1}-2\gamma_{1}+1=4-1-2\cdot 1+1=2$ ; $d_{2,1,1}=r_{1,1}-\delta_{1,1}+\delta_{1}+1=4-1+2+1=6$ ; $d_{3,1,1}=r_{1,1}-\delta_{1,1}+\delta_{1}-p_{1}\gamma_{1}+\gamma+1=4-1+2-2\cdot 1+5+1=9$ . The rest of the parameters can be obtained in a similar fashion, and we then specify $\mathbb{D}$ as follows:

[TABLE]

According to Construction 2, one can construct an $(\mathbb{n},\mathbb{k},\mathbb{D},p_{0},\mathbb{p})_{q}$ -code with the parameters specified previously.
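The remaining distances of this example follow mechanically from the formulas of Theorem 1. The sketch below recomputes all triples $(d_{1,x,i},d_{2,x,i},d_{3,x,i})$ from the parameters of Examples 2 and 5; the function name `triple_level_distances` is hypothetical, and the output is a recomputation from the formulas rather than a copy of the table in the text.

```python
from fractions import Fraction

# Parameters of Examples 2 and 5, grouped by the first-level index x.
n = [(10, 11), (10, 10), (12, 12, 12, 12)]
k = [(6, 6), (7, 7), (9, 8, 9, 9)]
delta = [(1, 1), (1, 1), (1, 2, 1, 1)]
gamma_x = [Fraction(1), Fraction(1, 2), Fraction(1, 2)]

def triple_level_distances(n, k, delta, gamma_x):
    """Return one tuple (d1, d2, d3) per local cloud, in the order (1,1), (1,2), ..."""
    gamma = sum(len(n_x) * g for n_x, g in zip(n, gamma_x))   # gamma = sum_x p_x * gamma_x
    D = []
    for n_x, k_x, d_x, g in zip(n, k, delta, gamma_x):
        p_x, delta_x = len(n_x), sum(d_x)
        for n_xi, k_xi, d_xi in zip(n_x, k_x, d_x):
            r = n_xi - k_xi
            d1 = r - d_xi - 2 * g + 1
            d2 = r - d_xi + delta_x + 1
            d3 = d2 - p_x * g + gamma
            D.append((int(d1), int(d2), int(d3)))
    return D

D = triple_level_distances(n, k, delta, gamma_x)
for column in D:
    print(column)
```

The first triple is $(2,6,9)$ , matching the values of $d_{1,1,1}$ , $d_{2,1,1}$ , $d_{3,1,1}$ derived above, and every triple is strictly increasing as Definition 2 requires.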

IV-C Flexibility

The concept of flexibility was originally proposed and investigated for dynamic cloud storage in [8]. In a dynamic cloud storage system, the usage rate of a piece of data is not likely to remain unchanged. When the data stored in a local cloud become hot, splitting the local cloud into two smaller clouds effectively reduces the latency. However, this action should be done without reducing the erasure-correction capability of the rest of the system or changing the remaining components.

Take Construction 1 as an example: if the data stored in local cloud $1$ become unexpectedly hot, then the following procedure splits it into two separate clouds $1^{\mathrm{a}}$ and $1^{\mathrm{b}}$ :

1. Select the desired local parameters $(k_{1}^{\mathrm{a}},r_{1}^{\mathrm{a}},\delta_{1}^{\mathrm{a}})$ and $(k_{1}^{\mathrm{b}},r_{1}^{\mathrm{b}},\delta_{1}^{\mathrm{b}})$ for clouds $1^{\mathrm{a}}$ and $1^{\mathrm{b}}$ , respectively, such that $k_{1}^{\mathrm{a}}+k_{1}^{\mathrm{b}}=k_{1}$ , $r_{1}^{\mathrm{a}}+r_{1}^{\mathrm{b}}=r_{1}$ , $\delta_{1}^{\mathrm{a}}+\delta_{1}^{\mathrm{b}}=\delta_{1}$ , and

[TABLE]

2. Compute $\mathbb{y}_{1}$ by solving the equation $\mathbb{y}_{1}\mathbb{U}_{1}=\mathbb{c}_{1}\left[k_{1}+1:n_{1}\right]-\mathbb{m}_{1}\mathbb{A}_{1,1}$ , where $\mathbb{c}_{1}\left[k_{1}+1:n_{1}\right]$ is the parity part of $\mathbb{c}_{1}$ and $\mathbb{y}_{i}$ , $i\in\left[p\right]$ , are described in the proof of Lemma 2. Find $\mathbb{y}_{1}^{\mathrm{a}}\in\textup{GF}(q)^{\delta_{1}^{\mathrm{a}}}$ , $\mathbb{y}_{1}^{\mathrm{b}}\in\textup{GF}(q)^{\delta_{1}^{\mathrm{b}}}$ such that $\mathbb{y}_{1}=\left[\mathbb{y}_{1}^{\mathrm{a}},\mathbb{y}_{1}^{\mathrm{b}}\right]$ ;

3. Compute $\mathbb{c}_{1}^{\mathrm{a}}=\left[\mathbb{m}_{1}^{\mathrm{a}},\mathbb{m}_{1}^{\mathrm{a}}\mathbb{A}_{1^{\mathrm{a}},1^{\mathrm{a}}}+\left(\mathbb{m}_{1}^{\mathrm{b}}\mathbb{B}_{1^{\mathrm{b}},1^{\mathrm{a}}}+\mathbb{y}_{1}^{\mathrm{a}}\right)\mathbb{U}_{1}^{\mathrm{a}}\right]$ , and $\mathbb{c}_{1}^{\mathrm{b}}=\left[\mathbb{m}_{1}^{\mathrm{b}},\mathbb{m}_{1}^{\mathrm{b}}\mathbb{A}_{1^{\mathrm{b}},1^{\mathrm{b}}}+\left(\mathbb{m}_{1}^{\mathrm{a}}\mathbb{B}_{1^{\mathrm{a}},1^{\mathrm{b}}}+\mathbb{y}_{1}^{\mathrm{b}}\right)\mathbb{U}_{1}^{\mathrm{b}}\right]$ .

Note that the matrix $\mathbb{B}_{1,i}$ is vertically split into $\mathbb{B}_{1^{\textup{a}},i}$ and $\mathbb{B}_{1^{\textup{b}},i}$ , while $\mathbb{B}_{i,1}$ is horizontally split into $\mathbb{B}_{i,1^{\textup{a}}}$ and $\mathbb{B}_{i,1^{\textup{b}}}$ , for all $2\leq i\leq p$ . Therefore, $\mathbb{m}_{1}\mathbb{B}_{1,i}=\mathbb{m}_{1}^{\mathrm{a}}\mathbb{B}_{1^{\textup{a}},i}+\mathbb{m}_{1}^{\mathrm{b}}\mathbb{B}_{1^{\textup{b}},i}$ , and one can prove that the local codeword $\mathbb{c}_{i}$ does not change for $2\leq i\leq p$ . Moreover, since both the local and the global parity-check matrices for each non-split local cloud remain unchanged, the local and global erasure-correction capabilities are not affected according to Lemma 2. Furthermore, one can prove that the local codewords stored in the new clouds $1^{\mathrm{a}}$ and $1^{\mathrm{b}}$ are capable of correcting $(r_{1}^{\mathrm{a}}-\delta_{1}^{\mathrm{a}})$ and $(r_{1}^{\mathrm{b}}-\delta_{1}^{\mathrm{b}})$ local erasures, respectively.
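The two block-splitting identities behind this argument can be checked numerically. The sketch below is a toy under stated assumptions: it uses random matrices over the prime field $\textup{GF}(17)$ instead of the Cauchy-derived blocks, since the identities depend only on how the rows and columns are partitioned, and all sizes and names are hypothetical.

```python
import random

P = 17  # toy prime field; only linearity is exercised here

def vec_mat(v, M):
    """Row vector times matrix, modulo P."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) % P for j in range(len(M[0]))]

def vec_add(u, v):
    return [(a + b) % P for a, b in zip(u, v)]

rng = random.Random(1)
k1, k1a = 5, 2            # k_1 = k_1^a + k_1^b with k_1^a = 2, k_1^b = 3
d1, d1a = 3, 1            # delta_1 = delta_1^a + delta_1^b with delta_1^a = 1
ki, di = 4, 2             # data length and cross-parity length of another cloud i

m1 = [rng.randrange(P) for _ in range(k1)]
mi = [rng.randrange(P) for _ in range(ki)]
B_1i = [[rng.randrange(P) for _ in range(di)] for _ in range(k1)]   # k_1 x delta_i
B_i1 = [[rng.randrange(P) for _ in range(d1)] for _ in range(ki)]   # k_i x delta_1

# Vertical split of B_{1,i}: rows follow the split of the message m_1.
m1a, m1b = m1[:k1a], m1[k1a:]
B_1a_i, B_1b_i = B_1i[:k1a], B_1i[k1a:]
lhs = vec_mat(m1, B_1i)
rhs = vec_add(vec_mat(m1a, B_1a_i), vec_mat(m1b, B_1b_i))

# Horizontal split of B_{i,1}: columns follow the split y_1 = [y_1^a, y_1^b].
B_i_1a = [row[:d1a] for row in B_i1]
B_i_1b = [row[d1a:] for row in B_i1]
whole = vec_mat(mi, B_i1)
parts = vec_mat(mi, B_i_1a) + vec_mat(mi, B_i_1b)   # list concatenation

print(lhs == rhs, whole == parts)
```

The first identity shows why cloud $i$ receives the same cross parity before and after the split; the second shows why the contribution of cloud $i$ to $\mathbb{y}_{1}$ simply separates into its contributions to $\mathbb{y}_{1}^{\mathrm{a}}$ and $\mathbb{y}_{1}^{\mathrm{b}}$ .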

V Conclusion

Multi-level accessible codes have been shown to be beneficial for cloud storage. While the previous literature has typically focused on double-level accessible codes and their erasure-correction capabilities, in this paper, we focus on codes with hierarchical locality and additional properties motivated by their practical importance. We proposed a CRS-based code over a finite field whose size grows linearly with the maximum local codeword length. We showed that our construction achieves scalability, heterogeneity, and flexibility, which are important in dynamic cloud storage.

Acknowledgment

This work has received funding from NSF under the grants CCF-BSF 1718389 and CCF 1717602.

References


[1] P. Huang, E. Yaakobi, and P. H. Siegel, “Multi-erasure locally recoverable codes over small fields,” in 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, 2017, pp. 1123–1130.
[2] S. Ballentine, A. Barg, and S. Vladuts, “Codes with hierarchical locality from covering maps of curves,” arXiv preprint arXiv:1807.05473, 2018.
[3] Y. Cassuto, E. Hemo, S. Puchinger, and M. Bossert, “Multi-block interleaved codes for local and global read access,” in Proc. IEEE Int. Symp. Inf. Theory, 2017, pp. 1758–1762.
[4] M. Hassner, K. Abdel-Ghaffar, A. Patel, R. Koetter, and B. Trager, “Integrated interleaving-a novel ECC architecture,” IEEE Transactions on Magnetics, vol. 37, no. 2, pp. 773–775, 2001.
[5] M. Blaum and S. R. Hetzler, “Extended product and integrated interleaved codes,” IEEE Trans. Inf. Theory, vol. 64, no. 3, pp. 1497–1513, 2018.
[6] X. Zhang, “Generalized three-layer integrated interleaved codes,” IEEE Communications Letters, vol. 22, no. 3, pp. 442–445, 2018.
[7] Y. Wu, “Generalized integrated interleaved codes,” IEEE Transactions on Information Theory, vol. 63, no. 2, pp. 1102–1119, Nov. 2017.
[8] U. Martínez-Peñas and F. R. Kschischang, “Universal and dynamic locally repairable codes with maximal recoverability via sum-rank codes,” in 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, 2018, pp. 792–799.
