A Systematic Construction of MDS Codes With Small Sub-packetization Level and Near-Optimal Repair Bandwidth

Jie Li, Yi Liu, and Xiaohu Tang

arXiv:1901.08254·cs.IT·November 24, 2020

TL;DR

This paper introduces a transformation technique to construct high-rate MDS codes with small sub-packetization levels and near-optimal repair bandwidth, making them more practical for real-world systems.

Contribution

It presents a novel transformation that reduces sub-packetization levels of MDS codes while maintaining near-optimal repair bandwidth, along with explicit code constructions.

Findings

01

Four high-rate MDS codes with small sub-packetization and near-optimal repair bandwidth are constructed.

02

Three of the codes are explicit with small field sizes, making implementation feasible.

03

An additional explicit MDS code is proposed with a smaller finite field requirement.

Abstract

In the literature, all the known high-rate MDS codes with the optimal repair bandwidth possess a significantly large sub-packetization level, which may prevent the codes from being implemented in practical systems. To build MDS codes with small sub-packetization level, existing constructions and theoretical bounds imply that one may sacrifice the optimality of the repair bandwidth. Partly motivated by the work of Tamo et al. (IEEE Trans. Inform. Theory, 59(3), 1597-1616, 2013), in this paper, we present a transformation that can greatly reduce the sub-packetization level of MDS codes with the optimal repair bandwidth with respect to the same code length n. As applications of the transformation, four high-rate MDS codes having both small sub-packetization level and near-optimal repair bandwidth can be obtained, where three of them are explicit and the required field sizes are around or even…

Tables (5)

Table I: (a) and (b) denote the $m+1$ partitions of the set $\{e_{0},\cdots,e_{r^{m}-1}\}$ defined by (7) and (8) for $m=3,r=2$ , and $m=2,r=3$ , respectively.

$i$	0	1	2	*	$i$	0	1	2	*
$V_{i, 0}$	$e_{0}$	$e_{0}$	$e_{0}$	$e_{0}$	$V_{i, 1}$	$e_{4}$	$e_{2}$	$e_{1}$	$e_{1}$
	$e_{1}$	$e_{1}$	$e_{2}$	$e_{3}$		$e_{5}$	$e_{3}$	$e_{3}$	$e_{2}$
	$e_{2}$	$e_{4}$	$e_{4}$	$e_{5}$		$e_{6}$	$e_{6}$	$e_{5}$	$e_{4}$
	$e_{3}$	$e_{5}$	$e_{6}$	$e_{6}$		$e_{7}$	$e_{7}$	$e_{7}$	$e_{7}$
(a) $m=3$, $r=2$

Table II: A comparison of some key parameters among the $(n,k)$ MDS codes proposed in this paper and some existing notable $(n,k)$ MDS codes, where we set $n=sn^{\prime}$ for convenience and $r=n-k$

Code	Sub-packetization level $N$	Field size $q$	Ratio of repair bandwidth to the optimal value $\gamma^{*}$	Remark	References
The new MDS code $\mathcal{C}_{1}$	$r^{n^{\prime}}$	$q > r n^{\prime} \lceil \frac{n}{r n^{\prime}} \rceil$ , $r \mid (q - 1)$	$= 1 + \frac{(s - 1) (r - 1)}{n - 1} < 1 + \frac{r}{n^{\prime}}$	Optimal update	Thms 3-5
The new MDS code $\mathcal{C}_{5}$	$r^{n^{\prime}}$	$q > r n^{\prime} \lceil \frac{n}{r n^{\prime}} \rceil$	$= 1 + \frac{(s - 1) (r - 1)}{n - 1} < 1 + \frac{r}{n^{\prime}}$	Optimal update	Thms 15, 16
The RTGE code 2	$O (r^{r \tau} \log n)$	$O (n)$	$\leq 1 + \frac{1}{\tau}$	$\tau > 0$	[23]
The YB code 1	$r^{n}$	$q \geq r n$	$1$ (Optimal)	Optimal update	[12]
The new MDS code $\mathcal{C}_{2}$	$r^{n^{\prime} - 1}$	$q > r \lceil \frac{n^{\prime}}{r} \rceil (\lceil \frac{n}{n^{\prime}} \rceil - 1) + n^{\prime}$	$= 1 + \frac{(s - 1) (r - 1)}{n - 1} < 1 + \frac{r}{n^{\prime}}$		Thm 7
The new MDS code $\mathcal{C}_{3}$	$r^{n^{\prime} - 1}$	$q > s$ with $q$ odd, if $r$ is even; $q > s r$ , otherwise	$= 1 + \frac{(s - 1) (r - 1)}{n - 1} < 1 + \frac{r}{n^{\prime}}$		Thm 8
The improved YB code 2	$r^{n - 1}$	$q > r$	$1$ (Optimal)		[15]
Shortened duplication-zigzag	$r^{n^{\prime} - 1}$	$q > s$	$= 1 + \frac{(s - 1) (r - 1)}{n - 1} < 1 + \frac{r}{n^{\prime}}$		[3]
The new MDS code $\mathcal{C}_{4}$	$r^{\frac{n^{\prime}}{r + 1}}$	$q > \frac{2 n}{3}$ , if $r = 2$ ; $q > N \binom{n - 1}{r - 1} + 1$ , if $r > 2$	$= 1 + \frac{(s - 1) (r - 1)}{n - 1} < 1 + \frac{r}{n^{\prime}}$	Implicit when $r > 2$	Thms 11, 12
The RTGE code 1	$r^{\tau}$	$q > n^{(r - 1) N + 1}$	$= 1 + \frac{(s - 1) (r - 1)}{n - 1} < 1 + \frac{1}{\tau}$	$\tau$ is an integer, $1 \leq \tau \leq \lceil \frac{n}{r} \rceil - 1$	[23]
Long code $\mathcal{C}_{4}^{\prime}$	$r^{\frac{n}{r + 1}}$	$q > N \binom{n - 1}{r - 1} + 1$	$1$ (Optimal)	Implicit when $r > 2$	Thms 9, 10

Table III: A comparison of some key parameters among the MDS codes $\mathcal{C}_{4}$ and the RTGE code 1 under some specific code lengths for $r=2$

	Code length $n$	Number of parities $r$	Sub-packetization level $N$	Field size $q$	$\frac{\gamma}{\gamma^{*}}$
The new MDS code $\mathcal{C}_{4}$	$12$	$2$	$2^{2}$	$3^{2}$	$1 + \frac{1}{11}$
	$18$	$2$	$2^{2}$	$13$	$1 + \frac{2}{17}$
	$24$	$2$	$2^{2}$	$17$	$1 + \frac{3}{23}$
The RTGE code 1	$12$	$2$	$2^{3}$	$> 10^{9}$	$1 + \frac{1}{11}$
	$18$	$2$	$2^{3}$	$> 10^{11}$	$1 + \frac{2}{17}$
	$24$	$2$	$2^{3}$	$> 10^{12}$	$1 + \frac{3}{23}$

Table IV: A comparison of some key parameters among the MDS codes $\mathcal{C}_{4}$ and the RTGE code 1 under some specific code lengths for $r=3$

	Code length $n$	Number of parities $r$	Sub-packetization level $N$	Field size $q$	$\frac{\gamma}{\gamma^{*}}$	Remark
The new MDS code $\mathcal{C}_{4}$	$24$	$3$	$3^{3}$	$> 6831$	$1 + \frac{1}{23}$	Implicit construction
The new MDS code $\mathcal{C}_{4}$	$36$	$3$	$3^{3}$	$> 16065$	$1 + \frac{2}{35}$	Implicit construction
The RTGE code 1	$24$	$3$	$3^{4}$	$> 10^{224}$	$1 + \frac{1}{23}$
The RTGE code 1	$36$	$3$	$3^{4}$	$> 10^{253}$	$1 + \frac{2}{35}$

Table V: A comparison of some key parameters among the MDS codes $\mathcal{C}_{4}$ and the RTGE code 1 under some specific code lengths for $r=4$

	Code length $n$	Number of parities $r$	Sub-packetization level $N$	Field size $q$	$\frac{\gamma}{\gamma^{*}}$	Remark
The new MDS code $\mathcal{C}_{4}$	$40$	$4$	$4^{4}$	$> 2339584$	$1 + \frac{1}{13}$	Implicit construction
The new MDS code $\mathcal{C}_{4}$	$60$	$4$	$4^{4}$	$> 8322304$	$1 + \frac{6}{59}$	Implicit construction
The RTGE code 1	$40$	$4$	$4^{5}$	$> 10^{4923}$	$1 + \frac{1}{13}$
The RTGE code 1	$60$	$4$	$4^{5}$	$> 10^{5464}$	$1 + \frac{6}{59}$

Equations (326)

\gamma(d) \geq \gamma^{*}(d) \triangleq \frac{d}{d-k+1} N,

\underbrace{\begin{pmatrix} A_{0,0} & A_{0,1} & \cdots & A_{0,n-1} \\ A_{1,0} & A_{1,1} & \cdots & A_{1,n-1} \\ \vdots & \vdots & \ddots & \vdots \\ A_{r-1,0} & A_{r-1,1} & \cdots & A_{r-1,n-1} \end{pmatrix}}_{A} \begin{pmatrix} \mathbf{f}_{0} \\ \mathbf{f}_{1} \\ \vdots \\ \mathbf{f}_{n-1} \end{pmatrix} = \mathbf{0}_{rN},

A = (A_{t,i})_{t\in[0,r),\, i\in[0,n)}

A_{t,i} = A_{i}^{t}, \quad t\in[0,r),\ i\in[0,n)

\underbrace{\begin{pmatrix} S_{i,0}A_{0,i} \\ S_{i,1}A_{1,i} \\ \vdots \\ S_{i,r-1}A_{r-1,i} \end{pmatrix} \mathbf{f}_{i}}_{\text{useful data}} + \sum_{j=0,\, j\neq i}^{n-1} \underbrace{\begin{pmatrix} S_{i,0}A_{0,j} \\ S_{i,1}A_{1,j} \\ \vdots \\ S_{i,r-1}A_{r-1,j} \end{pmatrix} \mathbf{f}_{j}}_{\text{interference by } \mathbf{f}_{j}} = \mathbf{0},

\mathrm{rank}\begin{pmatrix} S_{i,0}A_{0,i} \\ S_{i,1}A_{1,i} \\ \vdots \\ S_{i,r-1}A_{r-1,i} \end{pmatrix} = N, \quad i\in[0,n),

\mathrm{rank}\begin{pmatrix} R_{i,j} \\ S_{i,0}A_{0,j} \\ S_{i,1}A_{1,j} \\ \vdots \\ S_{i,r-1}A_{r-1,j} \end{pmatrix} = \mathrm{rank}(R_{i,j}),

\mathrm{rank}\begin{pmatrix} R_{i,j} \\ S_{i,t}A_{t,j} \end{pmatrix} = \mathrm{rank}(R_{i,j})

\gamma_{i} = \sum_{j=0,\, j\neq i}^{n-1} \mathrm{rank}(R_{i,j}) = \sum_{j=0,\, j\neq i}^{n-1} \beta_{i,j}.

e_{i} = (0, \dots, 0, 1, 0, \dots, 0), \quad i\in[0,r^{m}),

V_{i,t} = \{e_{a} \mid a_{i} = t,\ 0 \leq a < r^{m}\},

V_{*,t} = \{e_{a} \mid a_{0} + a_{1} + \dots + a_{m-1} = t,\ 0 \leq a < r^{m}\},

V_{i+sm,t} = V_{i,t}, \quad i\in[0,m),\ s \geq 1,\ \text{and}\ t\in[0,r).

V_{i_{1},i_{2},t_{1},t_{2}}

V_{i_{1},t_{1}} = V_{i_{1},i_{2},t_{1},0} \cup \dots \cup V_{i_{1},i_{2},t_{1},r-1}.

V_{1,0} = (e_{0}^{\top}\ e_{1}^{\top}\ e_{4}^{\top}\ e_{5}^{\top})^{\top},

A_{t,j}

R_{i,j}

S_{i,t} = S^{\prime}_{i\%n^{\prime},t}

f(s_{1}, \dots, s_{n}) \neq 0.

\gamma_{i} = \begin{cases} \left(1+\frac{(\lceil\frac{n}{n^{\prime}}\rceil-1)(r-1)}{n-1}\right)\gamma^{*}, & \text{if } 0\leq i\%n^{\prime} < n\%n^{\prime}, \\ \left(1+\frac{(\lfloor\frac{n}{n^{\prime}}\rfloor-1)(r-1)}{n-1}\right)\gamma^{*}, & \text{otherwise}. \end{cases}

\mathrm{rank}\begin{pmatrix} S^{\prime}_{i,0}A^{\prime}_{0,i} \\ S^{\prime}_{i,1}A^{\prime}_{1,i} \\ \vdots \\ S^{\prime}_{i,r-1}A^{\prime}_{r-1,i} \end{pmatrix} = N, \quad \text{for } i\in[0,n^{\prime}),

\mathrm{rank}\begin{pmatrix} R^{\prime}_{i,j} \\ S^{\prime}_{i,t}A^{\prime}_{t,j} \end{pmatrix} = N/r, \quad i,j\in[0,n^{\prime}) \text{ with } i\neq j

\mathrm{rank}\begin{pmatrix} S_{i,0}A_{0,i} \\ S_{i,1}A_{1,i} \\ \vdots \\ S_{i,r-1}A_{r-1,i} \end{pmatrix}

\mathrm{rank}\begin{pmatrix} R_{i,j} \\ S_{i,t}A_{t,j} \end{pmatrix} =

\mathrm{rank}\begin{pmatrix} R_{i,j} \\ S_{i,t}A_{t,j} \end{pmatrix} =

\gamma_{i}

\begin{pmatrix} V_{i,0} \\ V_{i,1} \\ \vdots \\ V_{i,r-1} \end{pmatrix} A^{\prime}_{i} = \begin{pmatrix} \lambda_{i,0}V_{i,0} \\ \lambda_{i,1}V_{i,1} \\ \vdots \\ \lambda_{i,r-1}V_{i,r-1} \end{pmatrix},

R^{\prime}_{i,j} = S^{\prime}_{i,t} = V_{i,0} + V_{i,1} + \dots + V_{i,r-1}

\gamma_{i} = \begin{cases} \left(1+\frac{(\lceil\frac{n}{n^{\prime}}\rceil-1)(r-1)}{n-1}\right)\gamma^{*}, & \text{if } 0\leq i\%n^{\prime} < n\%n^{\prime}, \\ \left(1+\frac{(\lfloor\frac{n}{n^{\prime}}\rfloor-1)(r-1)}{n-1}\right)\gamma^{*}, & \text{otherwise}. \end{cases}

Full text

A Systematic Construction of MDS Codes With Small Sub-packetization Level and Near-Optimal Repair Bandwidth

Jie Li, Yi Liu, and Xiaohu Tang

The work of J. Li was supported in part by the National Science Foundation of China under Grant 61801176. The work of Y. Liu and X. Tang was supported in part by the National Natural Science Foundation of China under Grant 61871331 and Grant 61941106. This paper was presented in part at the 2019 IEEE International Symposium on Information Theory. J. Li is with the Department of Mathematics and Systems Analysis, Aalto University, FI-00076 Aalto, Finland, and also with the Hubei Key Laboratory of Applied Mathematics, Faculty of Mathematics and Statistics, Hubei University, Wuhan 430062, China. Y. Liu and X. Tang are with the Information Security and National Computing Grid Laboratory, Southwest Jiaotong University, Chengdu, 610031, China.

Abstract

In the literature, all the known high-rate MDS codes with the optimal repair bandwidth possess a significantly large sub-packetization level, which may prevent the codes from being implemented in practical systems. To build MDS codes with small sub-packetization level, existing constructions and theoretical bounds imply that one may sacrifice the optimality of the repair bandwidth. Partly motivated by the work of Tamo et al. (IEEE Trans. Inform. Theory, 59(3), 1597-1616, 2013), in this paper, we present a transformation that can greatly reduce the sub-packetization level of MDS codes with the optimal repair bandwidth with respect to the same code length $n$. As applications of the transformation, four high-rate MDS codes having both small sub-packetization level and near-optimal repair bandwidth can be obtained, where three of them are explicit and the required field sizes are around or even smaller than the code length $n$. Additionally, we propose another explicit MDS code which has a structure similar to that of the first resultant code obtained by the generic transformation, but can be built over a smaller finite field.

Index Terms:

Distributed storage, high-rate, MDS codes, sub-packetization, repair bandwidth.

I Introduction

In distributed storage systems such as Hadoop Distributed File System (HDFS) and Google File System (GFS), redundancy is imperative to ensure reliability. An attractive solution is to use maximum distance separable (MDS) codes, which provide the optimal tradeoff between fault tolerance and storage overhead. When the codeword is distributed across distinct storage nodes and some nodes fail, the missing data can be recovered from the data at some surviving nodes, which are called helper nodes. For this scenario, one of the most important parameters is the repair bandwidth, defined as the amount of data downloaded from the helper nodes to repair the failed node. In particular, Dimakis et al. [1] derived a lower bound on the repair bandwidth of MDS codes, which motivated abundant recent research in coding for distributed storage [16, 11, 2, 7, 4, 6, 14, 3, 9, 8, 5, 12, 13, 10, 17, 18, 15, 19, 20].

In the literature, most existing MDS codes whose repair bandwidth achieves the lower bound in [1] are array codes. A codeword of an $(n,k)$ array code is an $N\times n$ matrix, where the parameter $N$ is called the sub-packetization level and $n$ is called the code length. When deploying an array code to a distributed storage system, a code symbol (i.e., a column) corresponds to a storage node. Then, an array code is said to have the MDS property if any $k$ out of the $n$ columns of the matrix can recover the remaining $n-k$ columns. It was proved in [1] that the repair bandwidth $\gamma(d)$ of an $(n,k)$ MDS array code with sub-packetization level $N$ should satisfy

$$\gamma(d)\geq\gamma^{*}(d)\triangleq\frac{d}{d-k+1}N, \tag{1}$$

where $d$ ($k\leq d\leq n-1$) is the number of helper nodes. An MDS array code is said to have the optimal repair bandwidth if $\gamma(d)=\gamma^{*}(d)$, i.e., the amount of data downloaded from each helper node is $\frac{N}{d-k+1}$. In particular, when $d=n-1$, $\gamma^{*}(d)$ attains its minimum value $\frac{n-1}{n-k}N$. Therefore, $d=n-1$ is the main concern in most known works [7, 4, 3, 6, 10, 9, 8, 5, 11]. In this paper, we also follow the same setting and thus abbreviate $\gamma^{*}(n-1)$ to $\gamma^{*}$. Throughout, we focus on MDS array codes and refer to them as MDS codes for simplicity.
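As a quick numeric illustration of the lower bound $\gamma^{*}(d)=\frac{d}{d-k+1}N$, the following minimal Python sketch evaluates it for every admissible number of helper nodes. The function name `optimal_repair_bandwidth` and the parameters $(n,k,N)=(12,10,4)$ are assumptions chosen only for illustration; they are not taken from the paper.

```python
from fractions import Fraction


def optimal_repair_bandwidth(n, k, N, d):
    """Lower bound gamma*(d) = d * N / (d - k + 1) for k <= d <= n - 1."""
    if not (k <= d <= n - 1):
        raise ValueError("d must satisfy k <= d <= n - 1")
    return Fraction(d * N, d - k + 1)


# Hypothetical parameters, chosen only for illustration.
n, k, N = 12, 10, 4
for d in range(k, n):
    print(f"d = {d}: gamma*(d) = {optimal_repair_bandwidth(n, k, N, d)}")
```

For these parameters the bound drops from 40 symbols at $d=k$ to 22 symbols at $d=n-1$, which is the value $\frac{n-1}{n-k}N$ used in the rest of the paper.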

Up to now, various MDS code constructions with the optimal repair bandwidth have been proposed, among which some notable works are [16, 2, 4, 14, 11, 3, 12, 13, 15, 5, 17, 18]. However, in the high-rate regime, all the known $(n,k)$ MDS code constructions with the optimal repair bandwidth possess a significantly large sub-packetization level $N$ , i.e., $N\geq r^{n\over r+1}$ where $r=n-k$ [9]. In [21], it was shown that for an $(n,k)$ MDS code with the optimal repair bandwidth, a sub-packetization level $N$ being exponential in the square root of $k$ is necessary. Very recently, Alrabiah and Guruswami [22] further improved the lower bound on $N$ to being exponential in $k$ and they conjectured that the construction in [9] with $N=r^{n\over r+1}$ is exactly tight. An MDS code with a larger sub-packetization level can lead to a reduced design space in terms of various system parameters and make management of meta-data difficult. Moreover, the implementation in practical systems is a big challenge [23].

Existing constructions and theoretical bounds imply that one may construct high-rate MDS codes with small sub-packetization level by sacrificing the optimality of the repair bandwidth. In [23], two high-rate $(n,k)$ MDS codes with small sub-packetization level were presented. The first one can have a sub-packetization level as small as $N=r^{\tau}$ where $r=n-k$ and $\tau$ is a positive integer with $1\leq\tau\leq\lceil\frac{n}{r}\rceil-1$ , while the repair bandwidth is no larger than $(1+\frac{1}{\tau})\gamma^{*}$ . However, the code is constructed over a significantly large finite field $\textbf{F}_{q}$ with $q>n^{(r-1)N+1}$ , which may hinder its deployment in practical systems. The second MDS code is obtained by combining an MDS code with the optimal repair bandwidth and another error-correcting code with specific parameters. The proposed codes, therefore, rely on the existence of the latter, which may not always be available. For convenience, we refer to the two codes in [23] as RTGE code 1 and RTGE code 2 in this paper. In [3], an $(n=sk^{\prime}+2,k=sk^{\prime})$ MDS code with sub-packetization level $2^{k^{\prime}-1}$ and near-optimal repair bandwidth only for systematic nodes was proposed, which is termed duplication-zigzag code in this paper. In fact, the duplication-zigzag code is constructed based on $s$ -duplication of the $(k^{\prime}+2,k^{\prime})$ zigzag code, but can only support two parity nodes.

In this paper, we aim to construct high-rate MDS codes that have both small sub-packetization level and near-optimal repair bandwidth for general parameters $n$ and $k$ over a small finite field $\textbf{F}_{q}$. Partly motivated by the work in [3], we present a transformation that can convert any $(n^{\prime}=k^{\prime}+r,k^{\prime})$ MDS code with the optimal repair bandwidth that is defined in the parity-check matrix form into another $(n=k+r,k)$ MDS code with much longer code length. Specifically, the repair bandwidth of the new MDS code is upper bounded by $(1+\frac{r}{n^{\prime}})\gamma^{*}$, while the sub-packetization level is kept unchanged; equivalently, the generic transformation reduces the sub-packetization level $N$ of the original codes with respect to the same code length $n$. By directly applying the generic transformation to several known high-rate MDS codes with the optimal repair bandwidth, we get four high-rate $(n,k)$ MDS codes with both small sub-packetization level $N$ and near-optimal repair bandwidth, three of which are explicit with required field sizes around or smaller than the code length $n$. Besides, we propose another new MDS code which has a structure similar to that of the first resultant code obtained by the generic transformation, but can be built over a smaller finite field. The obtained MDS codes outperform the RTGE code 1 in [23] in terms of the field size, and the first codes in both [12] and [15] as well as the RTGE code 2 in [23] in terms of the sub-packetization level. For convenience, we refer to the first two codes in [12] respectively as YB code 1 and YB code 2, while referring to the first code in [15] as the improved YB code 2 (since it is an improvement of the YB code 2 in [12] with respect to the field size).

The remainder of the paper is organized as follows. Section II reviews some necessary preliminaries. Section III proposes the generic transformation and its asserted properties. Section IV demonstrates several applications of the generic transformation, three of which are explicit. Section V presents another new explicit construction of high-rate MDS code over a small finite field that has a small sub-packetization level, near-optimal repair bandwidth, and the optimal update property. Section VI gives comparisons of key parameters among the MDS codes proposed in this paper and some existing notable MDS codes. Finally, Section VII concludes the study.

II Preliminaries

In this section, we introduce some preliminaries on high-rate MDS codes, and a series of special partitions for a given basis set.

II-A $(n,k)$ MDS codes

Denote by $q$ a prime power and $\mathbf{F}_{q}$ the finite field with $q$ elements. For any two integers $a$ and $b$ with $b>a$ , denote by $[a,b)$ the set $\{a,a+1,\ldots,b-1\}$ . Let $\mathbf{f}_{0},\mathbf{f}_{1},\ldots,\mathbf{f}_{n-1}$ be the data stored across a distributed storage system consisting of $n$ nodes based on an $(n,k)$ MDS code, where $\mathbf{f}_{i}$ is a column vector of length $N$ over $\mathbf{F}_{q}$ . Throughout this paper, we consider $(n,k)$ MDS codes that permit a definition in the following parity-check form:

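$$\underbrace{\begin{pmatrix}A_{0,0}&A_{0,1}&\cdots&A_{0,n-1}\\ A_{1,0}&A_{1,1}&\cdots&A_{1,n-1}\\ \vdots&\vdots&\ddots&\vdots\\ A_{r-1,0}&A_{r-1,1}&\cdots&A_{r-1,n-1}\end{pmatrix}}_{A}\begin{pmatrix}\mathbf{f}_{0}\\ \mathbf{f}_{1}\\ \vdots\\ \mathbf{f}_{n-1}\end{pmatrix}=\mathbf{0}_{rN}, \tag{2}$$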

where $r=n-k\geq 2$ , $\mathbf{0}_{rN}$ denotes the zero column vector of length $rN$ , and will be abbreviated as $\mathbf{0}$ in the sequel if its length is clear. The $rN\times nN$ block matrix $A$ in (2) is called the parity-check matrix of the code, which can be written as

$$A=(A_{t,i})_{t\in[0,r),\,i\in[0,n)}$$

to indicate the block entries.

For every $t\in[0,r)$ , by (2), we have $\sum\limits_{i=0}^{n-1}A_{t,i}\mathbf{f}_{i}=\mathbf{0}$ , which contains $N$ linear equations. Particularly, we say that $\sum\limits_{i=0}^{n-1}A_{t,i}\mathbf{f}_{i}=\mathbf{0}$ is the $t$ -th parity-check group.

II-B The MDS property

An $(n,k)$ MDS code defined by (2) possesses the MDS property that the source file can be reconstructed by connecting to any $k$ out of the $n$ nodes. That is, any $r\times r$ sub-block matrix of $(A_{t,i})_{t\in[0,r),i\in[0,n)}$ is nonsingular [12].

In particular, if

$$A_{t,i}=A_{i}^{t},\quad t\in[0,r),\ i\in[0,n) \tag{3}$$

for some matrices $A_{i}$ of order $N$ , then we have the following result.

Lemma 1 ([12]).

An $(n,k)$ code defined by (2) and (3) has the MDS property if $A_{i}A_{j}=A_{j}A_{i}$ and $A_{i}-A_{j}$ is nonsingular for all $i,j\in[0,n)$ with $i\neq j$ .
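The sufficient condition of Lemma 1 can be checked by brute force on a toy instance. The sketch below is only an illustration under stated assumptions: it works over the prime field $\mathbf{F}_{5}$, takes diagonal matrices $A_{i}$ (which commute trivially), and uses hypothetical diagonal entries picked so that $A_{i}-A_{j}$ is nonsingular for $i\neq j$. The helper names `rank_mod_p`, `diag_power` and `has_mds_property` are not from the paper.

```python
from itertools import combinations


def rank_mod_p(rows, p):
    """Rank of an integer matrix over the prime field F_p (Gauss-Jordan elimination)."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col]:
                factor = m[i][col]
                m[i] = [(a - factor * b) % p for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


def diag_power(diag, t, p):
    """The N x N matrix A_i^t for a diagonal A_i given by its diagonal entries."""
    size = len(diag)
    return [[pow(diag[a], t, p) if a == b else 0 for b in range(size)]
            for a in range(size)]


def has_mds_property(diags, r, p):
    """True if every r x r block submatrix of (A_i^t) is nonsingular over F_p."""
    size = len(diags[0])
    for cols in combinations(range(len(diags)), r):
        block = []
        for t in range(r):
            powers = [diag_power(diags[i], t, p) for i in cols]
            for a in range(size):
                block.append([x for mat in powers for x in mat[a]])
        if rank_mod_p(block, p) != r * size:
            return False
    return True


# Hypothetical toy code: n = 4, r = 2, N = 2 over F_5.
# In each coordinate the four diagonal entries are pairwise distinct,
# so A_i - A_j is nonsingular for all i != j.
p, r = 5, 2
diags = [(1, 2), (2, 3), (3, 4), (4, 1)]
print(has_mds_property(diags, r, p))
```

Replacing the list by `[(1, 2), (1, 3)]` makes $A_{0}-A_{1}$ singular, and the same check then fails, which is consistent with the role that the difference condition plays in the lemma.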

II-C Repair

When repairing a failed node $i$ ( $i\in[0,n)$ ) of an $(n,k)$ MDS code, denote by $\beta_{i,j}$ the amount of data downloaded from node $j$ , where $j\in[0,n)\backslash\{i\}$ . In fact, the data downloaded from helper node $j$ can be represented by $R_{i,j}\mathbf{f}_{j}$ , where $R_{i,j}$ is a $\beta_{i,j}\times N$ matrix of full rank. Throughout this paper, $R_{i,j}$ is called the repair matrix of node $i$ .

Clearly, a failed node can be repaired if there are $N$ linearly independent equations with respect to the $N$ unknowns of $\mathbf{f}_{i}$. Specifically, the $N$ equations should be chosen carefully so that the interference in these equations can be cancelled by the downloaded data $R_{i,j}\mathbf{f}_{j}$ from the helper nodes $j\in[0,n)\backslash\{i\}$. In this paper, similar to [15], for convenience, we only consider the symmetric situation where $N/r$ appropriate linearly independent equations are acquired from each of the $r$ parity-check groups; these are linear combinations of the corresponding $N$ parity-check equations. Precisely, these $N/r$ linearly independent equations can be obtained by multiplying the $t$-th parity-check group with an $N/r\times N$ matrix $S_{i,t}$ of full rank, where $S_{i,t}$ is called the select matrix in [15]. As a consequence, the following linear equations are available:

$$\underbrace{\begin{pmatrix}S_{i,0}A_{0,i}\\ S_{i,1}A_{1,i}\\ \vdots\\ S_{i,r-1}A_{r-1,i}\end{pmatrix}\mathbf{f}_{i}}_{\text{useful data}}+\sum_{j=0,\,j\neq i}^{n-1}\underbrace{\begin{pmatrix}S_{i,0}A_{0,j}\\ S_{i,1}A_{1,j}\\ \vdots\\ S_{i,r-1}A_{r-1,j}\end{pmatrix}\mathbf{f}_{j}}_{\text{interference by }\mathbf{f}_{j}}=\mathbf{0},$$

thus regenerating node $i$ requires that

(i)

the coefficient matrix of the useful data is of full rank, i.e.,

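$$\mathrm{rank}\begin{pmatrix}S_{i,0}A_{0,i}\\ S_{i,1}A_{1,i}\\ \vdots\\ S_{i,r-1}A_{r-1,i}\end{pmatrix}=N,\quad i\in[0,n), \tag{4}$$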

(ii)

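the interference caused by $\mathbf{f}_{j}$ can be determined from the data $R_{i,j}\mathbf{f}_{j}$ downloaded from node $j$ for all $j\in[0,n)\backslash\{i\}$, i.e.,

$$\mathrm{rank}\begin{pmatrix}R_{i,j}\\ S_{i,0}A_{0,j}\\ S_{i,1}A_{1,j}\\ \vdots\\ S_{i,r-1}A_{r-1,j}\end{pmatrix}=\mathrm{rank}(R_{i,j}),$$

for $i,j\in[0,n)$ with $i\neq j$ , which means that

$$\mathrm{rank}\begin{pmatrix}R_{i,j}\\ S_{i,t}A_{t,j}\end{pmatrix}=\mathrm{rank}(R_{i,j}) \tag{5}$$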

for $i,j\in[0,n)$ with $i\neq j$ , $t\in[0,r)$ .

Then, the repair bandwidth of node $i$ is

$$\gamma_{i}=\sum_{j=0,\,j\neq i}^{n-1}\mathrm{rank}(R_{i,j})=\sum_{j=0,\,j\neq i}^{n-1}\beta_{i,j}.$$

As mentioned before, a lower repair bandwidth of a node is desirable. According to (1), if $\gamma_{i}=\gamma^{*}=(n-1){N\over r}$ , then node $i$ is said to have the optimal repair bandwidth. If $\gamma_{i}\leq(1+\epsilon)\gamma^{*}=(1+\epsilon)(n-1){N\over r}$ for a small constant $\epsilon$ , then node $i$ is said to have the near-optimal repair bandwidth [23].

In addition to the (near-)optimal repair bandwidth, an $(n,k)$ MDS code is also preferred to have the optimal update property, that is, the minimum number of elements needs to be updated when an information element is changed. In [12], Ye and Barg showed that an $(n,k)$ MDS code defined in the form of (2) and (3) has the optimal update property if all the block matrices of the parity-check matrix are diagonal.

II-D Partition of basis $\{e_{0},\cdots,e_{N-1}\}$

Assuming that $N=r^{m}$ for two integers $r$ and $m$ with $r,m\geq 2$ , let $e_{0},\cdots,e_{r^{m}-1}$ be a basis of $\mathbf{F}_{q}^{r^{m}}$ . For example, they can be simply set as the standard basis, i.e.,

$$e_{i}=(0,\dots,0,1,0,\dots,0),\quad i\in[0,r^{m}),$$

with only the $i$ th entry being nonzero.

In [11], a series of special partitions of the set $\{e_{0},\cdots,e_{r^{m}-1}\}$ is given for $r=2$ . These set partitions can be easily generalized to the case of $r\geq 2$ , which will play an important role in our proposed new constructions.

For consistency, we follow the notation in [11] hereafter. Given an integer $0\leq a<r^{m}$ , denote by $(a_{0},\cdots,a_{m-1})$ its $r$ -ary expansion, i.e., $a=\sum\limits_{j=0}^{m-1}r^{m-1-j}a_{j}$ . For $0\leq i<m$ and $0\leq t<r$ , define a subset of $\{e_{0},\cdots,e_{r^{m}-1}\}$ as

$$V_{i,t}=\{e_{a}\mid a_{i}=t,\ 0\leq a<r^{m}\}, \tag{7}$$

where $a_{i}$ is the $i$ th element in the $r$ -ary expansion of $a$ . Moreover, for $0\leq t<r$ , we define a special subset of $\{e_{0},\cdots,e_{r^{m}-1}\}$ as

$$V_{*,t}=\{e_{a}\mid a_{0}+a_{1}+\dots+a_{m-1}=t,\ 0\leq a<r^{m}\}, \tag{8}$$

where $a_{0}+a_{1}+\cdots+a_{m-1}$ is computed modulo $r$ . This special subset will be used in the MDS code construction in Section IV-B.

Straightforwardly, $|V_{i,t}|=r^{m-1}$ , and $\{V_{i,0},V_{i,1},\cdots,V_{i,r-1}\}$ is a partition of the set $\{e_{0},\cdots,e_{r^{m}-1}\}$ for any $i\in[0,m)\cup\{*\}$ . Table I gives two examples of the set partitions defined in (7) and (8).
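The partitions are easy to enumerate by machine. The following minimal Python sketch lists the index sets $\{a : e_{a}\in V_{i,t}\}$ from the $r$-ary digits of $a$, using the convention $a=\sum_{j}r^{m-1-j}a_{j}$ stated above, and the digit-sum sets $V_{*,t}$ with the sum taken modulo $r$. The function names `digits` and `V` are assumptions introduced for this example.

```python
def digits(a, r, m):
    """r-ary expansion (a_0, ..., a_{m-1}) of a, with a = sum_j r^(m-1-j) * a_j."""
    return [(a // r ** (m - 1 - j)) % r for j in range(m)]


def V(i, t, r, m):
    """Sorted indices a with e_a in V_{i,t}; i = '*' selects the digit-sum sets V_{*,t}."""
    out = []
    for a in range(r ** m):
        d = digits(a, r, m)
        key = sum(d) % r if i == "*" else d[i]
        if key == t:
            out.append(a)
    return out


r, m = 2, 3
for i in [0, 1, 2, "*"]:
    print(i, [V(i, t, r, m) for t in range(r)])
```

For $m=3$, $r=2$ the printed index sets agree with the entries listed in Table I(a); for instance `V(1, 0, 2, 3)` gives the indices 0, 1, 4, 5 and `V("*", 1, 2, 3)` gives 1, 2, 4, 7.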

Based on the $m$ set partitions in (7), let us define

$$V_{i+sm,t}=V_{i,t},\quad i\in[0,m),\ s\geq 1,\ \text{and}\ t\in[0,r).$$

Further, for any $0\leq i_{1},i_{2}<sm$ and $i_{1}\not\equiv i_{2}\mbox{\ mod\ }m$ , we define $V_{i_{1},i_{2},t_{1},t_{2}}=V_{i_{2},i_{1},t_{2},t_{1}}=V_{i_{1},t_{1}}\cap V_{i_{2},t_{2}}$ , i.e.,

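$$V_{i_{1},i_{2},t_{1},t_{2}}=\{e_{a}\mid a_{i_{1}\bmod m}=t_{1},\ a_{i_{2}\bmod m}=t_{2},\ 0\leq a<r^{m}\},$$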

where $0\leq t_{1},t_{2}<r$ . Then, we have

$$V_{i_{1},t_{1}}=V_{i_{1},i_{2},t_{1},0}\cup\dots\cup V_{i_{1},i_{2},t_{1},r-1}.$$

For ease of notation, we also denote by $V_{i_{1},t_{1}}$ and $V_{i_{1},i_{2},t_{1},t_{2}}$ the $r^{m-1}\times r^{m}$ and $r^{m-2}\times r^{m}$ matrices, respectively, whose rows are formed by the vectors $e_{i}$ in the corresponding sets, with $i$ sorted in ascending order. For example, when $r=2$ and $m=3$, $V_{1,0}$ can be viewed as a $4\times 8$ matrix as follows

$$V_{1,0}=(e_{0}^{\top}\ e_{1}^{\top}\ e_{4}^{\top}\ e_{5}^{\top})^{\top},$$

where $\top$ represents the transpose operator.
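The matrix view and the refinement of $V_{i_{1},t_{1}}$ into the intersections $V_{i_{1},t_{1}}\cap V_{i_{2},t_{2}}$ can be illustrated in a few lines of Python. This is a sketch under the assumption that the $e_{a}$ are the standard basis vectors; the helper names `index_set` and `as_matrix` are hypothetical, and the index is reduced modulo $m$ to mirror the convention $V_{i+sm,t}=V_{i,t}$.

```python
def digits(a, r, m):
    """r-ary expansion (a_0, ..., a_{m-1}) of a, with a = sum_j r^(m-1-j) * a_j."""
    return [(a // r ** (m - 1 - j)) % r for j in range(m)]


def index_set(i, t, r, m):
    """Indices a with a_{i mod m} = t, i.e. the support of V_{i,t}."""
    return [a for a in range(r ** m) if digits(a, r, m)[i % m] == t]


def as_matrix(indices, size):
    """Stack the standard basis row vectors e_a (a ascending) into a 0/1 matrix."""
    return [[1 if col == a else 0 for col in range(size)] for a in sorted(indices)]


r, m = 2, 3
size = r ** m

# V_{1,0} as a 4 x 8 matrix whose rows are e_0, e_1, e_4, e_5.
for row in as_matrix(index_set(1, 0, r, m), size):
    print(row)

# Split V_{1,0} into the intersections V_{1,0} with V_{2,t2}, t2 = 0, ..., r-1.
i1, i2, t1 = 1, 2, 0
pieces = [sorted(set(index_set(i1, t1, r, m)) & set(index_set(i2, t2, r, m)))
          for t2 in range(r)]
print(pieces)
```

Each piece has $r^{m-2}$ elements and the pieces together recover $V_{1,0}$, matching the union relation stated above.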

III A generic transformation

In this section, we present a generic transformation that can convert any MDS code with the optimal repair bandwidth defined in the form of (2) to a new MDS code with longer code length and near-optimal repair bandwidth.

**A generic transformation:** The transformation can be performed through the following two steps.

Step 1. Choose an $(n^{\prime},k^{\prime})$ MDS code with the optimal repair bandwidth as the base code

We choose an $(n^{\prime},k^{\prime})$ MDS code in the form of (2), with the optimal repair bandwidth over a finite field containing at least $q^{\prime}$ elements, as the base code. Let $N$ denote its sub-packetization level, $r=n^{\prime}-k^{\prime}$ , and let $(A^{\prime}_{t,i})_{t\in[0,r),i\in[0,n^{\prime})}$ denote its parity-check matrix while the $N/r\times N$ matrices $R^{\prime}_{i,j}$ and $S^{\prime}_{i,t}$ , $i,j\in[0,n^{\prime})$ with $j\neq i$ , $t\in[0,r)$ , respectively denote the repair matrices and select matrices.

Step 2. Transform the base code to the new MDS code

Through the generic transformation, we intend to design a new $(n=k+r,k)$ MDS code over a certain finite field $\textbf{F}_{q}$ ( $q>q^{\prime}$ ) having arbitrary code length $n$ ( $n>n^{\prime}$ ) while maintaining the same sub-packetization level $N$ .

The transition from the base code to the new MDS code is done by designing the parity-check matrix, the repair matrices, and the select matrices of the new MDS code from those of the base code as follows.

[TABLE]

and

[TABLE]

where $x_{t,j}\in\mathbf{F}_{q}\backslash\{0\}$, $t\in[0,r)$, $i,j\in[0,n)$ with $j\neq i$, $\%$ denotes the modulo operation, and $I_{N}$ denotes the identity matrix of order $N$, which will be abbreviated as $I$ in the sequel if its order is clear.

Remark 1.

For an $(n^{\prime},k^{\prime})$ MDS code defined over a finite field that contains at least $q^{\prime}$ elements, it can of course be defined over a larger finite field $\textbf{F}_{q}$ ( $q>q^{\prime}$ ). In the above generic transformation, the base code is then assumed to be defined over the same finite field $\textbf{F}_{q}$ of the resultant new code.

Like many MDS codes in the literature, the MDS property of the resultant code can be guaranteed by the Combinatorial Nullstellensatz in [24].

Lemma 2 (Theorem 1.2 of [24]).

Let $\mathbf{F}_{q}$ be an arbitrary field, and $f=f(x_{1},\cdots,x_{n})$ be a polynomial in $\mathbf{F}_{q}[x_{1},\cdots,x_{n}]$ . Suppose that the degree of $f$ is $\sum\limits_{i=1}^{n}t_{i}$ , where each $t_{i}$ is a nonnegative integer, and the coefficient of $\prod\limits_{i=1}^{n}x_{i}^{t_{i}}$ in $f$ is nonzero. Then, if $S_{1},\cdots,S_{n}$ are subsets of $\mathbf{F}_{q}$ with $|S_{i}|>t_{i}$ , there are $s_{1}\in S_{1},\cdots,s_{n}\in S_{n}$ so that

$$f(s_{1},\cdots,s_{n})\neq 0.$$
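The statement of Lemma 2 can be checked by brute force on a toy instance. The following Python sketch is only an illustration: the polynomial $x_{1}x_{2}+x_{1}+1$ over $\mathbf{F}_{5}$ and all function names are hypothetical choices that do not come from the paper. Here $t_{1}=t_{2}=1$, the coefficient of $x_{1}x_{2}$ is nonzero, and the lemma predicts a nonvanishing point in $S_{1}\times S_{2}$ whenever $|S_{1}|,|S_{2}|>1$.

```python
from itertools import combinations, product

Q = 5  # a small prime, so integers modulo Q model the field F_5 (illustrative choice)


def f(x1, x2):
    """Toy polynomial x1*x2 + x1 + 1 of degree 2 = t1 + t2 with t1 = t2 = 1."""
    return (x1 * x2 + x1 + 1) % Q


def nonvanishing_point(poly, sets):
    """Return some (s1, ..., sn) in S1 x ... x Sn where poly is nonzero, or None."""
    for point in product(*sets):
        if poly(*point) != 0:
            return point
    return None


def lemma_holds_for_all_subsets(poly, size):
    """Check every pair (S1, S2) of subsets of F_Q with the given cardinality."""
    subsets = list(combinations(range(Q), size))
    return all(
        nonvanishing_point(poly, (s1, s2)) is not None
        for s1 in subsets
        for s2 in subsets
    )
```

With subsets of size 2 the search always succeeds, whereas the size condition cannot be dropped: for $S_{1}=\{4\}$ and $S_{2}=\{0\}$ the toy polynomial vanishes at the only available point.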

Theorem 1.

The new $(n,k)$ code over $\mathbf{F}_{q}$ obtained by the generic transformation can possess the MDS property if

i)

$q>N{n-1\choose r-1}+1$,$^{1}$ and

$^{1}$ Note that the field size required for the base code is $\geq q^{\prime}$; therefore, $q$ should actually satisfy $q\geq\max\{q^{\prime},N{n-1\choose r-1}+2\}$. However, the smallest field size required for any known explicit MDS code with the optimal repair bandwidth in the literature is far less than $N{n-1\choose r-1}+2$. So, we assume here that $q^{\prime}<N{n-1\choose r-1}+2$.

ii)

every block matrix $A^{\prime}_{t,j}$ of the parity-check matrix $(A^{\prime}_{t,j})_{t\in[0,r),j\in[0,n^{\prime})}$ of the base code is nonsingular.

Proof.

The proof is given in Appendix A. ∎
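To get a feeling for the size of the bound in Theorem 1-i), the following Python sketch evaluates $N{n-1\choose r-1}+1$ for given parameters. It is a minimal illustration; the function names are hypothetical.

```python
from math import comb


def generic_threshold(N, n, r):
    """Value N * C(n-1, r-1) + 1 that q must strictly exceed in Theorem 1-i)."""
    return N * comb(n - 1, r - 1) + 1


def satisfies_theorem1_field_condition(q, N, n, r):
    """True if q is strictly larger than the threshold of Theorem 1-i)."""
    return q > generic_threshold(N, n, r)
```

For instance, `generic_threshold(3**11, 24, 3)` corresponds to $N=r^{n^{\prime}-1}$ with $n^{\prime}=12$, $r=3$, and $n=24$, and already exceeds $4\times 10^{7}$, which motivates the explicit coefficient assignments given later.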

Remark 2.

To the best of our knowledge, there are only four classes of MDS codes with the optimal repair bandwidth that are defined in parity-check matrix form, where the requirement in Theorem 1-ii) can be satisfied for two of them, i.e., the YB code 2 in [12] and the improved YB code 2 in [15], while the remaining codes (i.e., the YB code 1 in [12] and the constructions in [13] and [14]) need a minor modification. As a concrete example, the YB code 1 in [12] satisfying this requirement will be illustrated in Section IV-A.

Theorem 2.

Every failed node of the new $(n,k)$ code obtained by the generic transformation can be regenerated by the repair matrices defined in (14), where the repair bandwidth for node $i$ ( $i\in[0,n)$ ) is

[TABLE]

Proof.

Since the $(n^{\prime},k^{\prime})$ base code possesses the optimal repair bandwidth, by (4) and (5), we have

[TABLE]

and

[TABLE]

for $t\in[0,r)$ .

For $i,j\in[0,n)$ with $j\neq i$ , we rewrite $i$ and $j$ as $i=un^{\prime}+i^{\prime}$ and $j=vn^{\prime}+j^{\prime}$ such that $i^{\prime},j^{\prime}\in[0,n^{\prime})$ . Firstly, we verify (4) for the new code. By (11) and (15),

[TABLE]

where the last equality follows from (16).

Next, we check (5) for the new code. When $i^{\prime}\neq j^{\prime}$ ,

[TABLE]

where the second and third equalities follow from (17) and (14), respectively. When $i^{\prime}=j^{\prime}$ , similarly, we have

[TABLE]

Therefore, according to (6) and the two cases $i^{\prime}\neq j^{\prime}$ and $i^{\prime}=j^{\prime}$ above, the repair bandwidth of node $i$ is

[TABLE]

where $\gamma^{*}=(n-1)\frac{N}{r}$ is the optimal value for the repair bandwidth. This finishes the proof. ∎

Remark 3.

In fact, an $(n^{\prime},k^{\prime})$ MDS code without the optimal repair bandwidth can also be chosen as the base code in the generic transformation. Suppose its repair bandwidth is $(n^{\prime}-1)\beta$, i.e., a failed node can be regenerated by downloading $\beta$ symbols from each surviving node. Then the repair bandwidth of the resultant MDS code is upper bounded by $(1+{(\lceil\frac{n}{n^{\prime}}\rceil-1)(N/\beta-1)\over(n-1)})(n-1)\beta$, by an analysis similar to the proof of Theorem 2.
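The bound in Remark 3 can be evaluated numerically. The Python sketch below makes one assumption that is suggested by the proof of Theorem 2 but is not spelled out in the surviving text: a helper node $j$ with $j\equiv i\bmod n^{\prime}$ sends all $N$ symbols, while every other helper sends $\beta$ symbols. All function names are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def helper_download(i, j, n_prime, N, beta):
    """Symbols sent by helper j when repairing node i.

    Assumption: helpers in the same residue class modulo n' send N symbols,
    all other helpers send beta symbols.
    """
    return N if j % n_prime == i % n_prime else beta


def repair_bandwidth(i, n, n_prime, N, beta):
    """Total download for repairing node i under the assumption above."""
    return sum(helper_download(i, j, n_prime, N, beta) for j in range(n) if j != i)


def remark3_bound(n, n_prime, N, beta):
    """Upper bound of Remark 3, rewritten as (n-1)*beta + (ceil(n/n')-1)*(N-beta)."""
    return (n - 1) * beta + (ceil_div(n, n_prime) - 1) * (N - beta)
```

With $\beta=N/r$ the base code has the optimal repair bandwidth, and for $n=12$, $n^{\prime}=3$, $r=2$, $N=8$ every node attains the bound with equality. When $n^{\prime}\nmid n$, some nodes fall strictly below it.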

IV MDS code constructions by directly applying the generic transformation

In this section, by directly applying the generic transformation in Section III respectively to the $(n^{\prime},k^{\prime})$ YB codes 1 and 2 in [12], the $(n^{\prime},k^{\prime})$ improved YB code 2 in [15], and the counterpart of the long MSR code [9] in the parity-check form, we get four MDS codes with small sub-packetization level.

IV-A An $(n,k)$ MDS code $\mathcal{C}_{1}$ by applying the generic transformation to the YB code 1 in [12]

The $(n^{\prime},k^{\prime})$ YB code 1 was defined in [12] in the form of (2) and (3), with the optimal update property and the sub-packetization level being $N=r^{n^{\prime}}$ where $r=n^{\prime}-k^{\prime}$ . More precisely, the parity-check matrix $(A^{\prime}_{t,i})_{t\in[0,r),i\in[0,n^{\prime})}$ of the $(n^{\prime},k^{\prime})$ YB code 1 satisfies $A^{\prime}_{t,i}=(A^{\prime}_{i})^{t}$ and

[TABLE]

where $V_{i,0},V_{i,1},\cdots,V_{i,r-1}$ are defined in (7), $\{\lambda_{i,t}\}_{i\in[0,n^{\prime}),t\in[0,r)}$ are $rn^{\prime}$ distinct elements in a finite field containing at least $rn^{\prime}$ elements, the repair matrices and select matrices are defined by

[TABLE]

for $i,j\in[0,n^{\prime})$ with $j\neq i$ , $t\in[0,r)$ .

From (33), it is obvious that $A^{\prime}_{i}$ is nonsingular if and only if $\{\lambda_{i,t}\}_{t\in[0,r)}$ are $r$ nonzero elements. In order to meet Theorem 1-ii), i.e., in order for matrices in (33) to be invertible, we can add a restriction that $\{\lambda_{i,t}\}_{i\in[0,n^{\prime}),t\in[0,r)}$ are $rn^{\prime}$ nonzero elements when applying the generic transformation to YB code 1. Accordingly, the requirement of the field size $q$ of YB code 1 is then only increased from $q\geq rn^{\prime}$ to $q\geq rn^{\prime}+1$ , which can be easily satisfied as the resultant new code will be defined over a finite field with size larger than $rn^{\prime}$ .

Theorem 3.

By choosing the $(n^{\prime},k^{\prime})$ YB code 1 as the base code for the generic transformation in Section III, an $(n,k)$ MDS code $\mathcal{C}_{1}$ over $\mathbf{F}_{q}$ with $k=n-r$ and $q>N{n-1\choose r-1}+1$ can be obtained. Specifically, the sub-packetization level of the MDS code $\mathcal{C}_{1}$ is $r^{n^{\prime}}$ while its repair bandwidth for node $i$ ( $i\in[0,n)$ ) is

[TABLE]

For the MDS code $\mathcal{C}_{1}$ directly obtained by the generic transformation, the required field size is relatively large and the construction is implicit. In the following, through a concrete assignment of the coefficients $x_{t,j}$ , $t\in[0,r)$ and $j\in[0,n)$ in (11), we provide a solution to determine the exact field size of the MDS code $\mathcal{C}_{1}$ , which is much smaller than $N{n-1\choose r-1}+2$ .

Theorem 4.

The field size $q$ of the $(n,k)$ MDS code $\mathcal{C}_{1}$ can be reduced to

[TABLE]

with $r\mid(q-1)$ by setting

[TABLE]

in (33) and

[TABLE]

in (11) for $t\in[0,r)$ , $i=zrn^{\prime}+vn^{\prime}+i^{\prime}\in[0,n)$ , $z\in[0,\lceil\frac{n}{rn^{\prime}}\rceil)$ , $v\in[0,r)$ , and $i^{\prime}\in[0,n^{\prime})$ , where $c$ is a primitive element of the finite field $\mathbf{F}_{q}$ and $\delta=c^{\frac{q-1}{r}}$ , i.e., a primitive $r$ -th root of unity in the finite field $\mathbf{F}_{q}$ .

Proof.

Obviously, we only need to verify the MDS property of the code $\mathcal{C}_{1}$ . Note from (36) that $\mathcal{C}_{1}$ is defined in the form of (2) and (3), i.e.,

[TABLE]

for $i=zrn^{\prime}+vn^{\prime}+i^{\prime}$ and the matrix $A_{i}\triangleq c^{zn^{\prime}}\delta^{v}A^{\prime}_{i^{\prime}}$ . Then, by Lemma 1, the code $\mathcal{C}_{1}$ possesses the MDS property if $A_{i}A_{j}=A_{j}A_{i}$ and $A_{i}-A_{j}$ is nonsingular for all $i,j\in[0,n)$ with $i\neq j$ .

Firstly, from (33) and (37), $A_{i}$ is diagonal for every $i\in[0,n)$, so $A_{i}A_{j}=A_{j}A_{i}$ holds for any $i,j\in[0,n)$ with $i\neq j$.

Secondly, we show that $A_{i}-A_{j}$ is nonsingular for all $i,j\in[0,n)$ with $i\neq j$ . Let $i=z_{0}rn^{\prime}+v_{0}n^{\prime}+i^{\prime}$ and $j=z_{1}rn^{\prime}+v_{1}n^{\prime}+j^{\prime}$ , where $i\neq j$ , $z_{0},z_{1}\in[0,\lceil\frac{n}{rn^{\prime}}\rceil)$ , $v_{0},v_{1}\in[0,r)$ , and $i^{\prime},j^{\prime}\in[0,n^{\prime})$ .

If $j\not\equiv i\bmod n^{\prime}$ , i.e., $i^{\prime}\neq j^{\prime}$ , then

[TABLE]

where the first, third, fourth, and fifth equalities follow from (37), (10), (33), and (35), respectively. Thus, $\mbox{rank}(A_{i}-A_{j})=N$ if and only if

[TABLE]

Note that (38) always holds, otherwise,

[TABLE]

for some $t_{0},t_{1}\in[0,r)$ . Raising both sides to the power of $r$ , by $\delta^{r}=1$ one then gets

[TABLE]

In the following, we prove that (39) does not hold, i.e.,

[TABLE]

Clearly,

[TABLE]

where $W=zrn^{\prime}+rw$ , $z=\lceil\frac{n}{rn^{\prime}}\rceil-1$ , $w=-1$ if $n\%(rn^{\prime})=1$ (in this case $zrn^{\prime}+w=n-2$ due to $j^{\prime}-i^{\prime}\neq 0$ ), $w=n\%n^{\prime}-1$ if $1<n\%(rn^{\prime})<n^{\prime}$ (in this case $zrn^{\prime}+w=n-1$ ), and $w=n^{\prime}-1$ else (in this case $zrn^{\prime}+w<n-1$ unless $n\%(rn^{\prime})=n^{\prime}$ ) , i.e.,

[TABLE]

which together with $r\mid(q-1)$ implies that (39) does not hold when (34) is satisfied.

If $j\equiv i\bmod n^{\prime}$ , i.e., $i^{\prime}=j^{\prime}$ , then

[TABLE]

therefore, $A_{i}-A_{j}$ is nonsingular if and only if

[TABLE]

since $A^{\prime}_{i^{\prime}}$ is nonsingular. Note that $z_{0},z_{1}\in[0,\lceil\frac{n}{rn^{\prime}}\rceil)$ , $v_{0},v_{1}\in[0,r)$ , and $(z_{0},v_{0})\neq(z_{1},v_{1})$ according to $i^{\prime}=j^{\prime}$ and $i\neq j$ , then we have

[TABLE]

thus (40) holds if $q-1>\left(\lceil\frac{n}{rn^{\prime}}\rceil-1\right)n^{\prime}+\frac{q-1}{r}(r-1)$ , i.e., $q>\left(\lceil\frac{n}{rn^{\prime}}\rceil-1\right)rn^{\prime}+r$ by combining $r\mid(q-1)$ .

This finishes the proof after combining the above analysis. ∎

In the following, we give a concrete example of the MDS code $\mathcal{C}_{1}$ according to Theorem 4.

Example 1.

Let $n^{\prime}=3$ , $r=2$ and $n=12$ , then the parity-check matrix of the $(12,10)$ MDS code $\mathcal{C}_{1}$ over $\mathbf{F}_{13}$ is defined through

[TABLE]

where $c=2$ and $\delta=c^{6}=-1$ .

To save space, we only give the repair matrices and select matrices of node 0, which are

[TABLE]

and

[TABLE]
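The index bookkeeping behind Theorem 4 and Example 1 can be sketched in Python. The snippet is an illustration under stated assumptions: it works in a prime field (integers modulo $q$), it uses the relation $A_{i}\triangleq c^{zn^{\prime}}\delta^{v}A^{\prime}_{i^{\prime}}$ from the proof of Theorem 4, and all function names are hypothetical.

```python
def is_primitive(c, q):
    """True if c generates the multiplicative group of the prime field F_q."""
    if c % q == 0:
        return False
    order, x = 1, c % q
    while x != 1:
        x = (x * c) % q
        order += 1
    return order == q - 1


def decompose(i, n_prime, r):
    """Write i = z*r*n' + v*n' + i' with v in [0, r) and i' in [0, n')."""
    z, rest = divmod(i, r * n_prime)
    v, i_prime = divmod(rest, n_prime)
    return z, v, i_prime


def scalar(i, n_prime, r, q, c):
    """Scalar c^(z*n') * delta^v multiplying A'_{i'}, with delta = c^((q-1)/r)."""
    z, v, _ = decompose(i, n_prime, r)
    delta = pow(c, (q - 1) // r, q)
    return (pow(c, z * n_prime, q) * pow(delta, v, q)) % q


def same_class_scalars_distinct(n, n_prime, r, q, c):
    """Check that nodes sharing the same i' receive pairwise distinct scalars."""
    for i_prime in range(n_prime):
        values = [scalar(i, n_prime, r, q, c) for i in range(i_prime, n, n_prime)]
        if len(set(values)) != len(values):
            return False
    return True
```

For the parameters of Example 1 ($n^{\prime}=3$, $r=2$, $n=12$, $q=13$, $c=2$) the check `same_class_scalars_distinct(12, 3, 2, 13, 2)` covers the case $i^{\prime}=j^{\prime}$ of the proof. The case $i^{\prime}\neq j^{\prime}$ depends on the matrices in (33) and is not modelled here.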

Theorem 5.

The MDS code $\mathcal{C}_{1}$ has the optimal update property.

Proof.

Note that all the block matrices of the parity-check matrix of the MDS code $\mathcal{C}_{1}$ are diagonal. By the definition of the optimal update property and the arguments in [12], we conclude that the MDS code $\mathcal{C}_{1}$ has the optimal update property. ∎

IV-B Two $(n,k)$ MDS codes $\mathcal{C}_{2}$ and $\mathcal{C}_{3}$ by applying the generic transformation respectively to the YB code 2 in [12] and the improved YB code 2 in [15]

For consistency, we borrow the notation in [12] and [15] in what follows. Let $N=r^{n^{\prime}-1}$ where $r=n^{\prime}-k^{\prime}$ . For any $a\in[0,N)$ with $(a_{0},a_{1},\cdots,a_{n^{\prime}-2})$ being its $r$ -ary expansion, define

[TABLE]

and

[TABLE]

where $0\leq i<j<n^{\prime}-1$ and $u,v\in[0,r)$ .

Both the $(n^{\prime},k^{\prime})$ YB code 2 in [12] and the $(n^{\prime},k^{\prime})$ improved YB code 2 in [15] are defined in the form of (2) and (3) with the sub-packetization level $N$. More precisely, the parity-check matrix $(A^{\prime}_{t,i})_{t\in[0,r),i\in[0,n^{\prime})}$ of the $(n^{\prime},k^{\prime})$ YB code 2 in [12] is defined by $A^{\prime}_{t,i}=(A^{\prime}_{i})^{t}$ and

[TABLE]

where

[TABLE]

with $c$ being a primitive element of a finite field with size larger than $n^{\prime}$. The parity-check matrix $(A^{\prime}_{t,i})_{t\in[0,r),i\in[0,n^{\prime})}$ of the $(n^{\prime},k^{\prime})$ improved YB code 2 in [15] is instead defined by $A^{\prime}_{t,i}=(A^{\prime}_{i})^{t}$ and

[TABLE]

where

[TABLE]

with $c$ being a primitive element of a finite field $\mathbf{F}_{q}$ with $(q-1)\nmid(r-1)$ .

The YB code 2 in [12] and the improved YB code 2 in [15] have the same repair matrices and select matrices, which are respectively defined by

[TABLE]

and

[TABLE]

where $V_{i,0}$ , $V_{*,0}$ and $V_{*,r-t}$ are defined in (7) and (8).

By directly applying the generic transformation in Section III, we have the following result.

Theorem 6.

Respectively choosing the $(n^{\prime},k^{\prime})$ YB code 2 in [12] and the $(n^{\prime},k^{\prime})$ improved YB code 2 in [15] as the base code for the generic transformation in Section III, two $(n,k)$ MDS codes $\mathcal{C}_{2}$ and $\mathcal{C}_{3}$ over $\mathbf{F}_{q}$ with $k=n-r$ and $q>N{n-1\choose r-1}+1$ can be obtained. Particularly, for both the MDS codes $\mathcal{C}_{2}$ and $\mathcal{C}_{3}$ , the sub-packetization level is $r^{n^{\prime}-1}$ while the repair bandwidth for node $i$ ( $i\in[0,n)$ ) is

[TABLE]

In the following, by a concrete assignment of the coefficients $x_{t,j}$ , $t\in[0,r)$ and $j\in[0,n)$ in (11), we provide a solution to determine the exact field sizes of the MDS codes $\mathcal{C}_{2}$ and $\mathcal{C}_{3}$ , which are much smaller than $N{n-1\choose r-1}+2$ . Hereafter, we only derive the values of $x_{t,j}$ , $t\in[0,r)$ and $j\in[0,n)$ in (11) for the MDS code $\mathcal{C}_{3}$ in detail, while for MDS code $\mathcal{C}_{2}$ , we just give the results but omit the analysis since it is similar to that of the MDS code $\mathcal{C}_{3}$ .

Theorem 7.

The field size $q$ of the MDS code $\mathcal{C}_{2}$ can be reduced to $q>r\lceil{n^{\prime}\over r}\rceil(\lceil{n\over n^{\prime}}\rceil-1)+n^{\prime}$ by setting $x_{t,i}=x_{i}^{t}=c^{\lfloor{i\over n^{\prime}}\rfloor\lceil{n^{\prime}\over r}\rceil t}$ in (11) for $t\in[0,r)$ and $i\in[0,n)$ , where $c$ is a primitive element of $\mathbf{F}_{q}$ .
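The quantities in Theorem 7 are simple to compute. The following Python sketch evaluates the field-size threshold and the coefficients $x_{t,i}$ in a prime field; the function names and the sample parameters ($q=17$ with primitive element $3$) are illustrative assumptions rather than values taken from the theorem.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def c2_threshold(n, n_prime, r):
    """Value r*ceil(n'/r)*(ceil(n/n') - 1) + n' that q must exceed in Theorem 7."""
    return r * ceil_div(n_prime, r) * (ceil_div(n, n_prime) - 1) + n_prime


def c2_coefficient(t, i, n_prime, r, q, c):
    """x_{t,i} = c^(floor(i/n') * ceil(n'/r) * t), reduced modulo the prime q."""
    return pow(c, (i // n_prime) * ceil_div(n_prime, r) * t, q)
```

For $n=12$, $n^{\prime}=3$, and $r=2$ the threshold equals $15$, so any field with more than $15$ elements qualifies; the coefficients depend on $i$ only through $\lfloor i/n^{\prime}\rfloor$.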

Before proving the result on $\mathcal{C}_{3}$ , we first introduce some results related to the parity-check matrix (see (47)) of the $(n^{\prime},k^{\prime})$ improved YB code 2 in [15].

Lemma 3 (Lemma 2, [15]).

For any $i,j\in[0,n^{\prime})$ with $i\neq j$ , $A^{\prime}_{i}A^{\prime}_{j}=A^{\prime}_{j}A^{\prime}_{i}$ , where $A^{\prime}_{i}$ and $A^{\prime}_{j}$ are defined in (47).

Lemma 4 (Lemma 3, [15]).

For any $a\in[0,N)$ and $i,j\in[0,n^{\prime}-1)$ ,

(i)

$\prod\limits_{t=0}^{r-1}\lambda_{i,a(i,j,a_{i}-t,a_{j}+t+l)}=c$ for $j>i$;

(ii)

$\prod\limits_{t=0}^{r-1}\lambda_{j,a(i,j,a_{i}-t,a_{j}+t+l)}=1\mbox{ or }c^{r}$ for $j>i$;

(iii)

$\prod\limits_{t=0}^{r-1}\lambda_{j,a(j,a_{j}+t)}=c$ for $j\geq 0$,

where $l\in[0,r)$ is a constant, $c$ is a primitive element of $\mathbf{F}_{q}$ , $a(i,j,u,v)$ and $\lambda_{i,a}$ are respectively defined in (46) and (48).

Lemma 5 (Lemma 4, [15]).

For any $i\in[0,n^{\prime}-1)$ and $X=\sum\limits_{a=0}^{N-1}x_{a}e_{a}^{\top}\in\mathbf{F}_{q}^{N}$ , $A^{\prime}_{i}X=\sum\limits_{a=0}^{N-1}\lambda_{i,a}x_{a(i,a_{i}+1)}e_{a}^{\top}$ where $A^{\prime}_{i}$ is defined in (47).

Theorem 8.

The field size $q$ of the $(n,k)$ MDS code $\mathcal{C}_{3}$ can be reduced to $q>\lceil\frac{n}{n^{\prime}}\rceil$ with $q$ being odd if $r$ is even, and $q>r\lceil\frac{n}{n^{\prime}}\rceil$ otherwise, by setting

[TABLE]

in (11) for $t\in[0,r)$ and $i\in[0,n)$ , where $c$ is a primitive element of $\mathbf{F}_{q}$ .

Proof.

Still, we only need to verify the MDS property of the code $\mathcal{C}_{3}$ . It is seen from (49) that the code $\mathcal{C}_{3}$ is defined in the form of (2) and (3) with

[TABLE]

That is

[TABLE]

and

[TABLE]

for $u\in[0,\lceil\frac{n}{n^{\prime}}\rceil)$ and $i^{\prime}\in[0,n^{\prime}-1)$ with $un^{\prime}+i^{\prime}<n$. According to Lemma 1, the code $\mathcal{C}_{3}$ possesses the MDS property if $A_{i}A_{j}=A_{j}A_{i}$ and $A_{i}-A_{j}$ is nonsingular for all $i,j\in[0,n)$ with $i\neq j$.

First, by Lemma 3, (51) and (52), we easily see that $A_{i}A_{j}=A_{j}A_{i}$ holds for any $i,j\in[0,n)$ with $i\neq j$ .

Next, we show that $A_{i}-A_{j}$ is nonsingular. Note that $A_{i}-A_{j}$ being nonsingular is equivalent to saying that for any $X=\sum\limits_{a=0}^{N-1}x_{a}e_{a}^{\top}$ , $(A_{i}-A_{j})X=\mathbf{0}$ implies $X=\mathbf{0}$ . In the following, we analyze it through three cases. For $i,j\in[0,n)$ with $i\neq j$ , let us rewrite $i=un^{\prime}+i^{\prime}$ and $j=vn^{\prime}+j^{\prime}$ for some $u,v\in[0,\lceil\frac{n}{n^{\prime}}\rceil)$ and $i^{\prime},j^{\prime}\in[0,n^{\prime})$ , where $(u,i^{\prime})\neq(v,j^{\prime})$ .

Case 1: If $i\equiv j~{}\bmod n^{\prime}$ , i.e., $i^{\prime}=j^{\prime}$ and $u\neq v$ , then by (50), we have

[TABLE]

which is nonsingular since $0<|u-v|\leq\lceil\frac{n}{n^{\prime}}\rceil-1<q-1$ .

Case 2: If $i\not\equiv j~{}\bmod n^{\prime}$ , $i^{\prime}\neq n^{\prime}-1$ , and $j^{\prime}\neq n^{\prime}-1$ , then by Lemma 5, we have

[TABLE]

if and only if

[TABLE]

which is equivalent to

[TABLE]

Applying Lemma 4 to (53), if $j^{\prime}>i^{\prime}$ , we get

[TABLE]

or

[TABLE]

otherwise, we have

[TABLE]

or

[TABLE]

If $r$ is even, then $ru-rv+1$ , $rv-ru+r-1$ , $rv-ru+1$ , and $ru-rv+r-1$ are all odd, thus

[TABLE]

when $q$ is odd; otherwise, for any

[TABLE]

we have

[TABLE]

when $q>r\lceil\frac{n}{n^{\prime}}\rceil$ , i.e.,

[TABLE]

when $q>r\lceil\frac{n}{n^{\prime}}\rceil$ . Hence, if $q$ is odd and $r$ is even, or $q>r\lceil\frac{n}{n^{\prime}}\rceil$ and $r$ is odd, we have that

[TABLE]

thus $x_{a}=0$ for all $a\in[0,N)$ , i.e., $X=0$ . Then, $A_{i}-A_{j}$ is nonsingular.

Case 3: If $i\not\equiv j\bmod n^{\prime}$ and either $i^{\prime}=n^{\prime}-1$ or $j^{\prime}=n^{\prime}-1$, assume without loss of generality that $i^{\prime}=n^{\prime}-1$, so that $j^{\prime}\neq n^{\prime}-1$. Similar to Case 2, we have

[TABLE]

which, in conjunction with Lemma 4, gives

[TABLE]

for all $a\in[0,N)$ . This implies that $x_{a}=0$ for all $a\in[0,N)$ by a similar analysis as in Case 2, i.e., $X=\mathbf{0}$ . Thus, $A_{i}-A_{j}$ is nonsingular.

Collecting the above three cases, we finish the proof. ∎
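The arithmetic at the heart of Case 2 can be checked mechanically. The Python sketch below assumes, as the surrounding text suggests, that the relevant conditions amount to $c^{e}\neq 1$ for the four exponents $ru-rv+1$, $rv-ru+r-1$, $rv-ru+1$, and $ru-rv+r-1$ with $u,v\in[0,\lceil\frac{n}{n^{\prime}}\rceil)$. Since $c$ is primitive, $c^{e}=1$ holds exactly when $(q-1)\mid e$. The function names are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def exponent_families(r, d):
    """The four exponents named in Case 2, written in terms of d = u - v."""
    return (r * d + 1, -r * d + r - 1, -r * d + 1, r * d + r - 1)


def case2_exponents_ok(r, n, n_prime, q):
    """True if no exponent is a multiple of q - 1, i.e. c^e != 1 for primitive c."""
    groups = ceil_div(n, n_prime)
    for d in range(-(groups - 1), groups):
        for e in exponent_families(r, d):
            if e % (q - 1) == 0:
                return False
    return True
```

For even $r$ every exponent is odd, so any odd $q$ passes; for odd $r$ the check can fail when $q\leq r\lceil\frac{n}{n^{\prime}}\rceil$, for example with $r=3$, $n=24$, $n^{\prime}=12$, and $q=5$.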

Let us see to what extent the field size $q$ of the $(n,k)$ MDS code $\mathcal{C}_{3}$ can be reduced by Theorem 8. For example, when $n^{\prime}=12$, $r=3$, and $n=24$, Theorem 8 allows us to set $x_{t,i}=x_{i}^{t}=3^{\lfloor{i\over 12}\rfloor t}$ in (11) over $\mathbf{F}_{7}$ for $t\in[0,3)$ and $i\in[0,24)$, where $3$ is a primitive element of $\mathbf{F}_{7}$. In contrast, by Theorem 6, the existence of the MDS code $\mathcal{C}_{3}$ requires a finite field with size larger than $4\times 10^{7}$.

IV-C An $(n,k)$ MDS code $\mathcal{C}_{4}$ obtained by applying the generic transformation to a newly constructed MDS code $\mathcal{C}^{\prime}_{4}$

In this section, by using the approach of [12], we first construct an $\left(n^{\prime}=\left(r+1\right)m,k^{\prime}=n^{\prime}-r\right)$ MDS code $\mathcal{C}^{\prime}_{4}$ with sub-packetization level $r^{m}$ , and then propose an $(n,k)$ MDS code $\mathcal{C}_{4}$ with small sub-packetization level by applying the generic transformation to the code $\mathcal{C}^{\prime}_{4}$ . In fact, the code $\mathcal{C}^{\prime}_{4}$ can be viewed as an extension of the $(n^{\prime}=rm,k^{\prime}=r(m-1))$ MDS code in [13] with a longer code length. Besides, $\mathcal{C}^{\prime}_{4}$ in parity-check form can also be regarded as the counterpart of the $(n^{\prime}=k^{\prime}+r,k^{\prime}=(r+1)m)$ long minimum storage regenerating (MSR) code [9] in systematic form. For simplicity, we call $\mathcal{C}^{\prime}_{4}$ the long code in this paper. In the following, we give the parity-check matrix, repair matrices and select matrices of the long code $\mathcal{C}^{\prime}_{4}$ .

The parity-check matrix $(A^{\prime}_{t,i^{\prime}})_{t\in[0,r),i^{\prime}\in[0,n^{\prime})}$ of the $(n^{\prime}=(r+1)m,k^{\prime}=n^{\prime}-r)$ long code $\mathcal{C}^{\prime}_{4}$ satisfies

[TABLE]

and (55),

where $y_{t,i^{\prime}},\lambda_{i^{\prime},u}\in\mathbf{F}_{q^{\prime}}\backslash\{0\}$ for $i^{\prime}\in[0,n^{\prime})$ and $t,u\in[0,r)$ , $V_{i^{\prime},0},\ldots,V_{i^{\prime},r-1}$ are respectively defined by (7) for $i^{\prime}\in[0,m)$ and (9) for $i^{\prime}\in[m,n^{\prime})$ , i.e.,

[TABLE]

for $i^{\prime}\in[0,n^{\prime})$ and $v,t\in[0,r)$ . The repair matrices and select matrices of the $(n^{\prime},k^{\prime})$ MDS code $\mathcal{C}^{\prime}_{4}$ are respectively defined by

[TABLE]

for $j^{\prime}\in[0,n^{\prime})\backslash\{i^{\prime}\}$ and $t\in[0,r)$ .

Obviously, $B^{\prime}_{t,i^{\prime}}$ is nonsingular for $t\in[0,r)$ and $i^{\prime}\in[0,n^{\prime})$ according to (55). Then we have the following result.

Theorem 9.

The code $\mathcal{C}^{\prime}_{4}$ has the MDS property over $\mathbf{F}_{q^{\prime}}$ if $q^{\prime}>N{n^{\prime}-1\choose r-1}+1$ .

Proof.

The proof is similar to that of Theorem 1. ∎

Theorem 10.

The code $\mathcal{C}^{\prime}_{4}$ has the optimal repair bandwidth if $\lambda_{i^{\prime},0},\lambda_{i^{\prime},1},\cdots,\lambda_{i^{\prime},r-1}$ are $r$ distinct elements in $\mathbf{F}_{q^{\prime}}$ for any $i^{\prime}\in[0,n^{\prime})$ .

Proof.

The proof is given in Appendix B. ∎

Based on the long code $\mathcal{C}^{\prime}_{4}$ , we have the following result by directly applying the generic transformation.

Theorem 11.

By applying the generic transformation in Section III to the $(n^{\prime},k^{\prime})$ long code $\mathcal{C}^{\prime}_{4}$ , an $(n,k)$ MDS code $\mathcal{C}_{4}$ over $\mathbf{F}_{q}$ with $k=n-r$ and $q>N{n-1\choose r-1}+1$ can be obtained. Specifically, the sub-packetization level of the MDS code $\mathcal{C}_{4}$ is $r^{n^{\prime}\over{r+1}}$ while its repair bandwidth for node $i$ ( $i\in[0,n)$ ) is

[TABLE]

In what follows, we present a solution to determine the exact field size of the MDS code $\mathcal{C}_{4}$ for the case of $r=2$ , which is much smaller than $N{n-1\choose r-1}+2$ .

By (11) and (54), the parity-check matrix $(A_{t,i})_{t\in[0,r),i\in[0,n)}$ of the $(n,k)$ MDS code $\mathcal{C}_{4}$ satisfies

[TABLE]

where

[TABLE]

Then we have the following result.

Theorem 12.

When $r=2$ , the field size $q$ of the $(n,k)$ MDS code $\mathcal{C}_{4}$ can be reduced to

[TABLE]

by setting

[TABLE]

for $t=0,1$ , $i\in[0,n)$ and

[TABLE]

in (55) for $i^{\prime}\in[0,m)$ , where $n^{\prime}=3m$ and $c$ is a primitive element of $\mathbf{F}_{q}$ .

Proof.

According to (55), the code $\mathcal{C}_{4}$ has the MDS property if and only if any $2\times 2$ sub-block matrix of

[TABLE]

is nonsingular, i.e., $A_{1,i}-A_{1,j}$ is nonsingular for any $i,j\in[0,n)$ with $i\neq j$. Let us rewrite $i=un^{\prime}+i^{\prime}$ and $j=vn^{\prime}+j^{\prime}$ for some $u,v\in[0,\lceil\frac{n}{n^{\prime}}\rceil)$ and $i^{\prime},j^{\prime}\in[0,n^{\prime})$, where $(u,i^{\prime})\neq(v,j^{\prime})$. We analyze the nonsingularity of $A_{1,i}-A_{1,j}$ in the following 6 cases according to (60)-(64).

Case 1: When $0\leq i^{\prime}=j^{\prime}<3m$ , then

[TABLE]

which always holds since

[TABLE]

Case 2: When $0\leq i^{\prime}<j^{\prime}<m$ , then

[TABLE]

which is equivalent to

[TABLE]

Obviously,

[TABLE]

where $W=2mz+2w+1$ , $z=\lceil\frac{n}{n^{\prime}}\rceil-1$ , $w=n\%n^{\prime}-1$ if $0<n\%n^{\prime}<m$ and $w=m-1$ otherwise, i.e.,

[TABLE]

Therefore, (84) holds if (61) is satisfied.

Case 3: When $m\leq i^{\prime}<j^{\prime}<2m$ or $2m\leq i^{\prime}<j^{\prime}<3m$ , similar to that of Case 2, we also have that

[TABLE]

for all $a,b=0,1$ , which holds from a similar analysis as in Case 2.

Case 4: When $0\leq i^{\prime}<m$ and $m\leq j^{\prime}<2m$ , if $j^{\prime}=i^{\prime}+m$ , then by (9) we have

[TABLE]

which is equivalent to

[TABLE]

i.e.,

[TABLE]

The above inequality always holds since

[TABLE]

Otherwise, similar to Case 2, we have that

[TABLE]

is equivalent to

[TABLE]

which holds according to a similar analysis as in Case 2.

Case 5: When $0\leq i^{\prime}<m$ and $2m\leq j^{\prime}<3m$ , if $j^{\prime}=i^{\prime}+2m$ , then by (9) we have

[TABLE]

which holds for a similar reason as in Case 4; otherwise,

[TABLE]

which is equivalent to

[TABLE]

i.e.,

[TABLE]

for all $a,b=0,1$ , which holds due to a similar analysis as in Case 2.

Case 6: When $m\leq i^{\prime}<2m$ and $2m\leq j^{\prime}<3m$ , similar to that of Case 5, if $j^{\prime}=i^{\prime}+m$ , we have

[TABLE]

is equivalent to

[TABLE]

otherwise

[TABLE]

is equivalent to

[TABLE]

The above two inequalities always hold due to a similar reason as in Case 5.

Combining the above 6 cases, we finish the proof. ∎

Finally, we demonstrate to what extent Theorem 12 can reduce the field size $q$ of the $(n,k)$ MDS code $\mathcal{C}_{4}$. For example, when $n^{\prime}=6$, $m=2$, and $n=24$, according to Theorem 12 we can set

[TABLE]

for $i\in[0,24)$ in (60) and

[TABLE]

in (55) for $i^{\prime}=0,1$ over $\mathbf{F}_{17}$ with $3$ being a primitive element. In contrast, by Theorem 11, the existence of the MDS code $\mathcal{C}_{4}$ requires a finite field with size larger than $93$.
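The gap between the explicit and the generic field-size requirements for $\mathcal{C}_{4}$ can be made concrete with a short Python sketch. It uses the simplified form $q>\frac{2n^{\prime}}{3}\lceil\frac{n}{n^{\prime}}\rceil$ of the explicit requirement for $r=2$ that is quoted in Section VI, and it searches for the smallest prime power above a given threshold. The function names are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def is_prime_power(q):
    """True if q = p^e for some prime p and integer e >= 1."""
    if q < 2:
        return False
    p = 2
    while p * p <= q:
        if q % p == 0:
            while q % p == 0:
                q //= p
            return q == 1  # only one prime factor allowed
        p += 1
    return True  # q itself is prime


def smallest_prime_power_above(bound):
    """Smallest prime power strictly larger than bound."""
    q = bound + 1
    while not is_prime_power(q):
        q += 1
    return q


def explicit_threshold_c4(n, n_prime):
    """(2n'/3) * ceil(n/n'), the value q must exceed for r = 2 (form quoted in Section VI)."""
    if n_prime % 3 != 0:
        raise ValueError("n' must be a multiple of 3 when r = 2")
    return (2 * n_prime // 3) * ceil_div(n, n_prime)
```

For $n^{\prime}=6$ and $n=24$ the explicit threshold is $16$, so `smallest_prime_power_above(16)` recovers the field $\mathbf{F}_{17}$ used in the example above.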

V An $(n,k)$ MDS code $\mathcal{C}_{5}$ with the optimal update property and small sub-packetization over small finite fields

Note from (37) that the parity-check matrix of the MDS code $\mathcal{C}_{1}$ is subject to a constraint: the block matrices $A_{t,i}$ must satisfy that $A_{t,j_{1}}A_{t,j_{2}}^{-1}$ is a scalar matrix over $\mathbf{F}_{q}$ for all $j_{1}\equiv j_{2}\pmod{n^{\prime}}$ and $t\in[0,r)$. This reduces the design space for the parameters $\lambda_{i,0},\ldots,\lambda_{i,r-1}$ in (33) that guarantee the MDS property. In this section, we propose another explicit $(n,k)$ MDS code which has a structure similar to that of the MDS code $\mathcal{C}_{1}$, but allows more flexible choices of $\lambda_{i,0},\ldots,\lambda_{i,r-1}$, and thus can further reduce the field size.

Let $N=r^{n^{\prime}}$ and $n>n^{\prime}$, where $n$ and $n^{\prime}$ are two positive integers. Construct an $(n,k)$ code $\mathcal{C}_{5}$ with longer code length given by (2) and (3), where $A_{i}$, $i\in[0,n)$, satisfy

[TABLE]

with $\lambda_{i,t}\in\mathbf{F}_{q}\backslash\{0\}$ and $V_{i,t}$ being defined by (7) and (9) for $t\in[0,r)$ . The repair matrices and select matrices are respectively defined by

[TABLE]

and

[TABLE]

Theorem 13.

Every failed node of the code $\mathcal{C}_{5}$ can be regenerated by the repair matrices defined in (96) and (97) if $\lambda_{i,0},\lambda_{i,1},\cdots,\lambda_{i,r-1}$ are pairwise distinct for each $i\in[0,n)$ . Furthermore, the repair bandwidth for node $i$ ( $i\in[0,n)$ ) is

[TABLE]

Proof.

Firstly, for $i\in[0,n)$ , by (95), we have

[TABLE]

Then,

[TABLE]

Obviously, the rank is $N$ if $\lambda_{i,u}\neq\lambda_{i,v}$ for all $u,v\in[0,r)$ with $u\neq v$ .

Next, we prove that (5) holds. By means of (10) and (98), if $j\not\equiv i\bmod n^{\prime}$ , then we have

[TABLE]

Otherwise, we have

[TABLE]

Therefore, by (6) and (96), the repair bandwidth of node $i$ is

[TABLE]

where $\gamma^{*}=(n-1)\frac{N}{r}$ is the optimal value for repair bandwidth. ∎

Theorem 14.

The code $\mathcal{C}_{5}$ possesses the MDS property if

(i)

$\lambda_{i,u}\neq\lambda_{j,v}$ for all $u,v\in[0,r)$ and $i,j\in[0,n)$ with $j\not\equiv i\bmod n^{\prime}$,

(ii)

$\lambda_{i,u}\neq\lambda_{i+gn^{\prime},u}$ for all $u\in[0,r)$, $g\in[1,\lceil\frac{n}{n^{\prime}}\rceil)$, $i\in[0,n^{\prime})$ with $i+gn^{\prime}<n$.

Proof.

The proof proceeds in the same fashion as that of Theorem 4. ∎

In the following, we give an assignment of the values $\lambda_{i,u}$ , $i\in[0,n)$ , $u\in[0,r)$ so that the requirements in Theorems 13 and 14 can be satisfied.

Theorem 15.

The requirements in Theorems 13 and 14 can be satisfied if $q$ is a prime power such that

[TABLE]

Proof.

If $0<n\%(rn^{\prime})<n^{\prime}$, then $\lceil\frac{n}{rn^{\prime}}\rceil-1=\lfloor\frac{n}{rn^{\prime}}\rfloor$ and $n\%(rn^{\prime})=n\%n^{\prime}$. In this case, let $\xi_{i^{\prime},v}^{(z)}$, $z\in[0,\lfloor\frac{n}{rn^{\prime}}\rfloor)$, $i^{\prime}\in[0,n^{\prime})$, $v\in[0,r)$, and $\xi_{i^{\prime},v}^{(\lfloor\frac{n}{rn^{\prime}}\rfloor)}$, $i^{\prime}\in[0,n\%n^{\prime})$, $v\in[0,r)$, be $rn^{\prime}\lfloor\frac{n}{rn^{\prime}}\rfloor+(n\%n^{\prime})r$ pairwise distinct nonzero elements in $\mathbf{F}_{q}$. Otherwise, let $\xi_{i^{\prime},v}^{(z)}$, $z\in[0,\lceil\frac{n}{rn^{\prime}}\rceil)$, $i^{\prime}\in[0,n^{\prime})$, $v\in[0,r)$, be $rn^{\prime}\lceil\frac{n}{rn^{\prime}}\rceil$ pairwise distinct nonzero elements in $\mathbf{F}_{q}$. Then, for $i=zrn^{\prime}+un^{\prime}+i^{\prime}$ with $i^{\prime}\in[0,n^{\prime})$, $u\in[0,r)$, and $z\in[0,\lceil\frac{n}{rn^{\prime}}\rceil)$, set $\lambda_{i,t}=\xi_{i^{\prime},t+u}^{(z)}$ for $i\in[0,n)$ and $t\in[0,r)$, where the subscript $t+u$ is computed modulo $r$. It is easy to verify that the requirements in Theorems 13 and 14 are then satisfied. ∎
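The assignment used in this proof can be sketched in Python and the requirements of Theorems 13 and 14 verified directly. The sketch makes simplifying assumptions: it always allocates $rn^{\prime}\lceil\frac{n}{rn^{\prime}}\rceil$ elements (the second branch of the proof), and it labels the distinct nonzero field elements by the integers $1,2,\ldots$, which suffices because only distinctness is used. All function names are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def assign_lambdas(n, n_prime, r, q):
    """Return {(i, t): lambda_{i,t}} with lambda_{i,t} = xi^{(z)}_{i', (t+u) mod r}."""
    blocks = ceil_div(n, r * n_prime)
    needed = r * n_prime * blocks
    if needed > q - 1:
        raise ValueError("not enough nonzero field elements")
    # xi[(z, i', v)]: pairwise distinct labels 1, 2, ..., needed
    xi, label = {}, 1
    for z in range(blocks):
        for i_prime in range(n_prime):
            for v in range(r):
                xi[(z, i_prime, v)] = label
                label += 1
    lam = {}
    for i in range(n):
        z, rest = divmod(i, r * n_prime)
        u, i_prime = divmod(rest, n_prime)
        for t in range(r):
            lam[(i, t)] = xi[(z, i_prime, (t + u) % r)]
    return lam


def check_requirements(lam, n, n_prime, r):
    """Check the conditions of Theorem 13 and Theorem 14 (i), (ii) as stated."""
    row = lambda i: {lam[(i, t)] for t in range(r)}
    # Theorem 13: the r values of node i are pairwise distinct
    if any(len(row(i)) != r for i in range(n)):
        return False
    # Theorem 14 (i): nodes in different residue classes share no value
    for i in range(n):
        for j in range(n):
            if (j - i) % n_prime != 0 and row(i) & row(j):
                return False
    # Theorem 14 (ii): lambda_{i,u} != lambda_{i+g*n',u}
    for i in range(n_prime):
        for j in range(i + n_prime, n, n_prime):
            if any(lam[(i, u)] == lam[(j, u)] for u in range(r)):
                return False
    return True
```

For $r=2$, $n^{\prime}=3$, and $n=12$ the assignment needs $12$ distinct nonzero elements, which is exactly what $\mathbf{F}_{13}$ offers.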

In the following, we give a concrete example of the MDS code $\mathcal{C}_{5}$ according to Theorem 15.

Example 2.

Let $r=2$ , $n^{\prime}=3$ , and $n=12$ , then the parity-check matrix of the $(12,10)$ MDS code $\mathcal{C}_{5}$ over $\mathbf{F}_{13}$ is defined through

[TABLE]

where $c=2$ .

Similar to the MDS code $\mathcal{C}_{1}$ , we have the following result.

Theorem 16.

The MDS code $\mathcal{C}_{5}$ has the optimal update property.

VI Comparisons

In this section, we give comparisons of some key parameters among the proposed MDS codes and some existing notable MDS codes.

Table II compares the details of these codes, while Tables III-V compare the new MDS code $\mathcal{C}_{4}$ and the RTGE code 1 in terms of the sub-packetization level, the field size, and the repair bandwidth for $r=2,3$ and $4$ , respectively. From these tables, we see that the proposed MDS codes have the following advantages:

•

The new MDS codes $\mathcal{C}_{1}$, $\mathcal{C}_{2}$, $\mathcal{C}_{3}$, and $\mathcal{C}_{5}$ can support any number of parity nodes, while the shortened duplication-zigzag code$^{2}$ in [3] can only support two parity nodes. ($^{2}$ The code length of the duplication-zigzag code in [3] is of the form $uk^{\prime}+2$ with $uk^{\prime}\gg 2$. For a fair comparison under the same code length, we delete two nodes of the duplication-zigzag code in [3] and call the resultant code the shortened duplication-zigzag code.)

•

The new MDS codes $\mathcal{C}_{1}$ and $\mathcal{C}_{5}$ have the optimal update property.

•

The new $(n=sn^{\prime},k)$ MDS codes derived in this paper indeed have a small sub-packetization level $N$ . Specifically, $N=r^{n^{\prime}}$ for the codes $\mathcal{C}_{1}$ and $\mathcal{C}_{5}$ , $N=r^{\frac{n^{\prime}}{r+1}}$ for the code $\mathcal{C}_{4}$ , and $N=r^{n^{\prime}-1}$ for the codes $\mathcal{C}_{2}$ and $\mathcal{C}_{3}$ . Note that $n^{\prime}$ can be fixed as a constant. Consequently, for each new MDS code, the sub-packetization level can be a constant, which is independent of code length $n$ .

•

Compared with the RTGE code 1 in [23], when $n^{\prime}=r\tau$ , the new explicit MDS codes $\mathcal{C}_{1}$ , $\mathcal{C}_{2}$ , $\mathcal{C}_{3}$ , and $\mathcal{C}_{5}$ are built on much smaller finite fields, but have larger sub-packetization levels. Besides, all the proposed MDS codes have the same repair bandwidth as the RTGE code 1 in [23] under the same parameters $n$ and $k$ .

•

Particularly, the new $(n,k)$ MDS code $\mathcal{C}_{4}$ has not only a smaller sub-packetization level, but also a much smaller finite field when compared to the RTGE code 1.

Nevertheless, the code $\mathcal{C}_{4}$ is explicit only for $r=2$ , which requires a finite field with size $q>\frac{2n^{\prime}}{3}\lceil\frac{n}{n^{\prime}}\rceil$ . For $r>2$ , further investigation is needed to find the explicit construction.

•

In contrast to RTGE code 2 in [23], which has sub-packetization growing logarithmically with the code length $n$ , the new codes have smaller sub-packetizations. For example, the sub-packetization level of the MDS code $\mathcal{C}_{5}$ is around $\frac{1}{\log n}$ times that of the RTGE code 2 in [23] when $n^{\prime}=r\tau$ .

•

The RTGE codes 1 and 2 in [23] show that it is possible to trade repair bandwidth for sub-packetization, while the proposed codes $\mathcal{C}_{1}$, $\mathcal{C}_{2}$, $\mathcal{C}_{3}$, and $\mathcal{C}_{5}$ further show that it is possible to trade sub-packetization for field size based on the RTGE code 1, as these new codes are explicit and are defined over small finite fields.

Despite the above advantages, the new codes $\mathcal{C}_{1}$ - $\mathcal{C}_{5}$ have a drawback: they do not possess the load balancing property, since some of the helper nodes contribute a larger amount of data during the node repair process. In contrast, the RTGE code 2 in [23] is load balanced, i.e., all the contacted nodes provide (approximately) the same amount of information during the repair process.
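The sub-packetization levels listed in the comparison above depend only on $r$ and $n^{\prime}$, which the following Python sketch makes explicit. The function name and the string labels for the codes are hypothetical.

```python
def subpacketization(code, r, n_prime):
    """Sub-packetization level N of the proposed codes as a function of r and n'."""
    if code in ("C1", "C5"):
        return r ** n_prime
    if code in ("C2", "C3"):
        return r ** (n_prime - 1)
    if code == "C4":
        if n_prime % (r + 1) != 0:
            raise ValueError("C4 requires n' = (r + 1) * m")
        return r ** (n_prime // (r + 1))
    raise ValueError("unknown code label")
```

Because the code length $n$ does not appear among the arguments, $N$ stays constant as $n$ grows once $n^{\prime}$ is fixed.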

VII Conclusion

In this paper, we provided a powerful transformation that can greatly reduce the sub-packetization level $N$ of the original codes with respect to the same code length $n$. Four applications of the transformation were demonstrated, three of which are explicit and over a small finite field. In addition, another explicit MDS code construction over a small finite field with small sub-packetization level, small repair bandwidth, and the optimal update property was presented. The comparisons show that the obtained MDS codes outperform existing MDS codes in terms of the field size and/or the sub-packetization level. Extending our transformation and constructions to the case of $d<n-1$ or multiple node failures is part of our ongoing work.

Appendix A Proof of Theorem 1

Before proving Theorem 1, let us introduce some necessary definitions and results on determinants.

Definition 1 ([25]).

A $k$-rowed minor of an $n$-rowed determinant $D=\det(a_{i,j})_{i\in[0,n),j\in[0,n)}$ is any $k$-rowed determinant obtained when $n-k$ rows and $n-k$ columns are deleted from $D$. The $k$-rowed minor obtained from $D$ by retaining only the elements belonging to rows $r_{0},\ldots,r_{k-1}$ and columns $s_{0},\ldots,s_{k-1}$ will be denoted by

[TABLE]

The cofactor $\widetilde{D}(r_{0},\ldots,r_{k-1}|s_{0},\ldots,s_{k-1})$ of the minor $D(r_{0},\ldots,r_{k-1}|s_{0},\ldots,s_{k-1})$ in a determinant $D$ is defined as

[TABLE]

where $r_{k},\ldots,r_{n-1}$ are the $n-k$ numbers among $0,\ldots,n-1$ other than $r_{0},\ldots,r_{k-1}$ and $s_{k},\ldots,s_{n-1}$ are the $n-k$ numbers among $0,\ldots,n-1$ other than $s_{0},\ldots,s_{k-1}$ .

Lemma 6 (Laplace’s expansion theorem [25]).

Let $D$ be an $n$ -rowed determinant, and let $r_{0},\ldots,r_{k-1}$ be integers such that $0\leq k<n-1$ and $0\leq r_{0}<\ldots<r_{k-1}<n$ . Then

[TABLE]
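As a numerical sanity check of Laplace's expansion, the following minimal sketch expands a determinant along a chosen set of rows and compares the result with a direct computation. It uses the standard form of the theorem, in which the cofactor of a minor is the complementary minor multiplied by $(-1)^{r_{0}+\cdots+r_{k-1}+s_{0}+\cdots+s_{k-1}}$. The helper names `det`, `minor`, and `laplace_expand`, as well as the sample matrix, are illustrative assumptions and do not come from [25].

```python
from fractions import Fraction
from itertools import combinations


def det(m):
    """Determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    result = Fraction(1)
    for c in range(n):
        # Find a row at or below c with a nonzero entry in column c.
        pivot = next((i for i in range(c, n) if a[i][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result  # a row swap flips the sign
        result *= a[c][c]
        for i in range(c + 1, n):
            f = a[i][c] / a[c][c]
            for j in range(c, n):
                a[i][j] -= f * a[c][j]
    return result


def minor(m, rows, cols):
    """Minor of m that retains only the given rows and columns."""
    return det([[m[i][j] for j in cols] for i in rows])


def laplace_expand(m, rows):
    """Expand det(m) along the given rows (0-based, increasing)."""
    n = len(m)
    other_rows = [i for i in range(n) if i not in rows]
    total = Fraction(0)
    for cols in combinations(range(n), len(rows)):
        other_cols = [j for j in range(n) if j not in cols]
        sign = -1 if (sum(rows) + sum(cols)) % 2 else 1
        # minor times its cofactor (signed complementary minor)
        total += sign * minor(m, rows, cols) * minor(m, other_rows, other_cols)
    return total


# Hypothetical 4 x 4 example, expanded along rows 0 and 2.
M = [[2, 0, 1, 3], [1, 4, 0, 2], [0, 1, 5, 1], [3, 2, 1, 0]]
print(laplace_expand(M, (0, 2)) == det(M))  # prints True
```

Exact rational arithmetic is used so that the comparison is not affected by floating-point rounding.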

Proposition 1.

Let $u\geq 2$ and let

[TABLE]

be a block matrix of order $uN$ over a certain finite field $\mathbf{F}_{q}$ , where $y_{i,j}$ is an indeterminate in $\mathbf{F}_{q}$ and $B_{i,j}$ is a full rank matrix of order $N$ for $i,j\in[0,u)$ . Then $\det(B)$ is a homogeneous polynomial of degree $uN$ which includes the term

[TABLE]

Proof.

Clearly, $\det(B)$ is a $uN$ -rowed determinant, the expansion of which includes $(uN)!$ terms, where each term is a monomial of degree $uN$ . Therefore, $\det(B)$ is a homogeneous polynomial of degree $uN$ . In the following, we prove that $\det(B)$ includes the term in (99) by induction.

Let $D=\det(B)$. When $u=2$, Definition 1 and Lemma 6 give (100),

which implies that $D$ includes the term in (99).

Assume that the induction hypothesis holds, i.e., $D$ includes the term in (99) for $u=v\geq 2$. Then, for $u=v+1$, we similarly obtain (101).

Note from Definition 1 that $\widetilde{D}(vN,\ldots,(v+1)N-1|vN,\ldots,(v+1)N-1)$ is a $vN$ -rowed determinant, which includes the term

[TABLE]

by the induction hypothesis. Hence, $D$ includes the term

[TABLE]

Based on the above analysis, we have proved that $\det(B)$ includes the term in (99) for any $u\geq 2$. ∎
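The homogeneity claim in Proposition 1 can be checked numerically on a toy instance. The sketch below assumes that the $(i,j)$-th block of $B$ has the form $y_{i,j}B_{i,j}$, which is what the degree count in the proof suggests. The block values, the values substituted for $y_{i,j}$, and the helper names `det` and `assemble` are all hypothetical. It verifies that scaling every $y_{i,j}$ by $t$ scales $\det(B)$ by $t^{uN}$, and that setting the off-diagonal indeterminates to zero leaves the monomial $\prod_{i}y_{i,i}^{N}$ with coefficient $\prod_{i}\det(B_{i,i})$, which is nonzero because each block has full rank.

```python
from fractions import Fraction


def det(m):
    """Determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    result = Fraction(1)
    for c in range(n):
        pivot = next((i for i in range(c, n) if a[i][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result
        result *= a[c][c]
        for i in range(c + 1, n):
            f = a[i][c] / a[c][c]
            for j in range(c, n):
                a[i][j] -= f * a[c][j]
    return result


def assemble(y, blocks):
    """Build the uN x uN matrix whose (i, j) block is y[i][j] * blocks[i][j].

    Assumption: this block structure is inferred from the proof, not
    copied from the paper's display.
    """
    u, N = len(y), len(blocks[0][0])
    return [[y[i][j] * blocks[i][j][a][b] for j in range(u) for b in range(N)]
            for i in range(u) for a in range(N)]


# Hypothetical instance with u = 2 and N = 2; every block has full rank.
blocks = [[[[1, 2], [3, 5]], [[2, 1], [1, 1]]],
          [[[1, 0], [1, 1]], [[3, 1], [4, 2]]]]
y = [[2, 3], [5, 7]]
u, N, t = 2, 2, 3

# Homogeneity of degree uN: scaling all y by t scales det(B) by t^(uN).
scaled = [[t * v for v in row] for row in y]
assert det(assemble(scaled, blocks)) == t ** (u * N) * det(assemble(y, blocks))

# With off-diagonal y set to zero, only the diagonal monomial survives.
diag = [[2, 0], [0, 7]]
expected = (2 ** N) * det(blocks[0][0]) * (7 ** N) * det(blocks[1][1])
assert det(assemble(diag, blocks)) == expected
```

Both assertions pass silently; the first holds for any matrix of order $uN$, while the second depends on the assumed block structure.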

Proof of Theorem 1: By (11), the parity-check matrix of the new $(n,k)$ code is

[TABLE]

with the $j$ -th block column being

[TABLE]

Then the new code is MDS if and only if any $r\times r$ sub-block matrix of $A$ is nonsingular.
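The MDS criterion just stated can be tested by brute force on small instances. The sketch below checks every choice of $r$ block columns of a parity-check matrix over a prime field $\mathbf{F}_{p}$. The function names, the restriction to prime fields, and the toy Vandermonde-type matrices with $N=1$ are assumptions made for illustration; they are not the matrices constructed in the paper.

```python
from itertools import combinations


def det_mod_p(m, p):
    """Determinant of a square integer matrix over the prime field F_p."""
    a = [[x % p for x in row] for row in m]
    n = len(a)
    result = 1
    for c in range(n):
        pivot = next((i for i in range(c, n) if a[i][c]), None)
        if pivot is None:
            return 0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result
        result = result * a[c][c] % p
        inv = pow(a[c][c], -1, p)  # modular inverse, Python 3.8+
        for i in range(c + 1, n):
            f = a[i][c] * inv % p
            for j in range(c, n):
                a[i][j] = (a[i][j] - f * a[c][j]) % p
    return result % p


def is_mds(parity_check, n, N, p):
    """Check that every r x r sub-block matrix (r block columns) is nonsingular.

    parity_check is an rN x nN matrix; block column j occupies columns
    j*N, ..., (j+1)*N - 1.
    """
    r = len(parity_check) // N
    for J in combinations(range(n), r):
        cols = [j * N + t for j in J for t in range(N)]
        sub = [[row[c] for c in cols] for row in parity_check]
        if det_mod_p(sub, p) == 0:
            return False
    return True


# Toy examples over F_7 with n = 6, r = 2, N = 1.
good = [[1, 1, 1, 1, 1, 1], [1, 2, 3, 4, 5, 6]]  # distinct evaluation points
bad = [[1, 1, 1, 1, 1, 1], [1, 2, 3, 4, 5, 1]]   # columns 0 and 5 coincide
print(is_mds(good, 6, 1, 7), is_mds(bad, 6, 1, 7))  # prints True False
```

The check enumerates all ${n\choose r}$ subsets, so it is only practical for small parameters.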

For any $J=\{j_{0},j_{1},\cdots,j_{r-1}\}\subset[0,n)$ , let $P_{J}$ be the $r\times r$ sub-block matrix of $A$ formed by the $r$ block columns indicated by $J$ , i.e.,

[TABLE]

which is nonsingular if $\det(P_{J})$ is nonzero. Define $P=\prod\limits_{J\subset[0,n),|J|=r}P_{J}$; then $\det(P)=\prod\limits_{J\subset[0,n),|J|=r}\det(P_{J})$. Thus, it suffices to prove that there is an assignment of the variables $x_{i,j}$, $i\in[0,r)$, $j\in[0,n)$, under which $\det(P)$ does not evaluate to zero.

By Proposition 1, $\det(P_{J})$ is a homogeneous polynomial of degree $rN$ which includes the term

[TABLE]

Then, $\det(P)$ is a homogeneous polynomial of degree $rN{n\choose r}$, where each indeterminate $x_{i,j}$ has degree at most $N{n-1\choose r-1}$, since the block column $j$ appears in exactly ${n-1\choose r-1}$ of the sets $J$. Therefore, by Lemma 2, if $q>N{n-1\choose r-1}+1$, then there are $x_{0,0},\ldots,x_{0,n-1},\ldots,x_{r-1,0},\ldots,x_{r-1,n-1}\in\mathbf{F}_{q}\backslash\{0\}$ for which $\det(P)$ does not evaluate to zero. This finishes the proof. $\square$
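The degree bookkeeping in this last step is easy to reproduce. The sketch below computes the total degree $rN{n\choose r}$ of $\det(P)$, the per-variable degree bound $N{n-1\choose r-1}$, and the resulting threshold that $q$ must exceed. The function name `field_size_bound` and the sample parameters are hypothetical.

```python
from math import comb


def field_size_bound(n, r, N):
    """Return (total degree, per-variable degree bound, threshold on q).

    The sufficient condition used in the proof is q > threshold,
    where threshold = N * C(n-1, r-1) + 1.
    """
    total_degree = r * N * comb(n, r)
    per_variable_degree = N * comb(n - 1, r - 1)
    threshold = per_variable_degree + 1
    return total_degree, per_variable_degree, threshold


# Hypothetical parameters: n = 6 nodes, r = 2 parity nodes, N = 4.
print(field_size_bound(6, 2, 4))  # prints (120, 20, 21)
```

For these sample values the condition reads $q>21$. This is only the sufficient field size guaranteed by the existence argument.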

Appendix B Proof of Theorem 10

The new storage code $\mathcal{C}^{\prime}_{4}$ has the optimal repair bandwidth if and only if (4) and (5) hold.

(i) Firstly, by (54), (55), and (59), we determine the necessary and sufficient conditions for (4) according to the following two cases.

Case 1: For any $i^{\prime}\in[0,rm)$ , let $u=\lfloor\frac{i^{\prime}}{m}\rfloor$ , then we have

[TABLE]

which is of full rank if and only if (112) holds,

i.e., $\lambda_{i^{\prime},0},\lambda_{i^{\prime},1},\cdots,\lambda_{i^{\prime},r-1}$ are pairwise distinct.

Case 2: For $i^{\prime}\in[rm,(r+1)m)$ ,

[TABLE]

which holds if and only if $\lambda_{i^{\prime},0},\lambda_{i^{\prime},1},\cdots,\lambda_{i^{\prime},r-1}$ are pairwise distinct.

(ii) Secondly, by (9), (54), (58), and (59), we establish the necessary and sufficient conditions for (5) according to the following four cases.

Case 1: For $t\in[0,r)$ and $i^{\prime},j^{\prime}\in[0,rm)$ with $i^{\prime}\neq j^{\prime}$, let $u=\lfloor\frac{i^{\prime}}{m}\rfloor$ and $v=\lfloor\frac{j^{\prime}}{m}\rfloor$. If $j^{\prime}\not\equiv i^{\prime} \pmod{m}$, then we have

[TABLE]

Otherwise, $u\neq v$ , thus we have

[TABLE]

Case 2: For $t\in[0,r)$, $i^{\prime}\in[rm,(r+1)m)$ and $j^{\prime}\in[0,rm)$, let $u=\lfloor\frac{j^{\prime}}{m}\rfloor$. If $j^{\prime}\not\equiv i^{\prime} \pmod{m}$, we have

[TABLE]

Otherwise,

[TABLE]

Case 3: For $t\in[0,r)$ , $i^{\prime}\in[0,rm)$ and $j^{\prime}\in[rm,(r+1)m)$ , we easily have

[TABLE]

Case 4: For $i^{\prime},j^{\prime}\in[rm,(r+1)m)$ and $i^{\prime}\neq j^{\prime}$ , we have

[TABLE]

This finishes the proof after combining (i) and (ii).

Acknowledgment

The authors would like to thank the Associate Editor Dr. Parastoo Sadeghi and the three anonymous reviewers for their valuable suggestions and comments, which have greatly improved the presentation and quality of this paper. Jie Li would like to thank Prof. Alexander Barg and Prof. Itzhak Tamo for helpful discussions during his visit at the University of Maryland, College Park.

Bibliography

[1] A.G. Dimakis, P. Godfrey, Y. Wu, M. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” IEEE Trans. Inform. Theory, vol. 56, no. 9, pp. 4539-4551, Sep. 2010.
[2] K.V. Rashmi, N.B. Shah, and P.V. Kumar, “Optimal exact-regenerating codes for distributed storage at the MSR and MBR points via a product-matrix construction,” IEEE Trans. Inform. Theory, vol. 57, no. 8, pp. 5227-5239, Aug. 2011.
[3] I. Tamo, Z. Wang, and J. Bruck, “Zigzag codes: MDS array codes with optimal rebuilding,” IEEE Trans. Inform. Theory, vol. 59, no. 3, pp. 1597-1616, Mar. 2013.
[4] D.S. Papailiopoulos, A.G. Dimakis, and V.R. Cadambe, “Repair optimal erasure codes through Hadamard designs,” IEEE Trans. Inform. Theory, vol. 59, no. 5, pp. 3021-3037, May 2013.
[5] J. Li, X. Tang, and C. Tian, “A generic transformation to enable optimal repair in MDS codes for distributed storage systems,” IEEE Trans. Inform. Theory, vol. 64, no. 9, pp. 6257-6267, Sep. 2018.
[6] X. Tang, B. Yang, J. Li, and H.D.L. Hollmann, “A new repair strategy for the Hadamard minimum storage regenerating codes for distributed storage systems,” IEEE Trans. Inform. Theory, vol. 61, no. 10, pp. 5271-5279, Oct. 2015.
[7] J. Li and X. Tang, “Optimal exact repair strategy for the parity nodes of the $(k+2,k)$ Zigzag code,” IEEE Trans. Inform. Theory, vol. 62, no. 9, pp. 4848-4856, Sep. 2016.
[8] J. Li, X. Tang, and C. Tian, “A generic transformation for optimal repair bandwidth and rebuilding access in MDS codes,” in Proc. IEEE Int. Symp. Inform. Theory, Aachen, Germany, Jun. 2017, pp. 1623-1627.
