This paper introduces a new linear coding scheme for distributed storage that trades off repair bandwidth against storage overhead (a trade-off the authors conjecture to be optimal), applies to general system parameters, and features a helper-independent repair process.
Contribution
It proposes a unified, linear coding construction for exact-repair regenerating codes covering the entire trade-off, with a small field size and a helper-independent repair mechanism.
Findings
1. Achieves the entire trade-off, including the MBR and MSR points
2. Field size is only Θ(n)
3. Sub-packetization level is at most (d−k+1)^k, independent of the number of parity nodes
Table I: Comparison between code constructions proposed for the MSR point.
Cascade Codes For Distributed Storage Systems
Mehran Elyasi and Soheil Mohajer
M. Elyasi and S. Mohajer are with the Department of Electrical and Computer Engineering, University of Minnesota, Twin Cities, MN 55455, USA (email: {melyasi, soheil}@umn.edu). This work was supported in part by the National Science Foundation under Grant CCF-1617884. This paper was presented in part at the IEEE International Symposium on Information Theory (ISIT), 2018 [1].
Abstract
A novel coding scheme for exact-repair regenerating codes is presented in this paper. The codes proposed in this work can trade between the repair bandwidth of nodes (the number of symbols downloaded from each surviving node in a repair process) and the required storage overhead of the system. These codes work for general system parameters (n,k,d), which are the total number of nodes, the number of nodes that suffice for data recovery, and the number of helper nodes in a repair process, respectively. The proposed construction offers a unified scheme to develop exact-repair regenerating codes for the entire trade-off, including the MBR and MSR points. We conjecture that the new storage-vs.-bandwidth trade-off achieved by the proposed codes is optimum. Some other key features of this code include: the construction is linear; the required field size is only Θ(n); and the code parameters, in particular the sub-packetization level, which is at most (d−k+1)^k, are independent of the number of parity nodes. Moreover, the proposed repair mechanism is helper-independent; that is, the data sent from each helper depends only on the identities of the helper and the failed node, and not on the identities of the other helper nodes participating in the repair process.
I Introduction
The dynamic, large, and disparate volume of data garnered from social media, Internet-driven technologies, financial records, and clinical research has given rise to an increasing demand for reliable and scalable storage technologies.
Distributed storage systems are widely used in modern data centers, e.g., the Google File System [2], the Facebook Distributed File System [3], and Microsoft Azure [4], as well as in peer-to-peer storage settings, e.g., DHash++ [5], OceanStore [6], and Total Recall [7].
In distributed storage systems, individual storage nodes are unreliable due to various hardware and software failures. Hence, redundancy is introduced to improve the system's reliability in the presence of node failures. The simplest form of redundancy is replication of the data across multiple storage nodes. Even though it is the most common form of redundancy, replication is very inefficient in terms of the reliability gained per unit of extra storage used to hold the redundancy. In this context, coding techniques have provably achieved orders of magnitude more reliability than replication for the same redundancy.
Besides the reliability offered by storing redundant data, a storage system must repair failed nodes in order to be durable. The repair process consists of downloading (part of) the content of a number of surviving nodes to reconstruct the missing content of the failed nodes. Conventional erasure codes suffer from high repair bandwidth, i.e., the total amount of data that must be downloaded to repair each failed node. Regenerating codes are a class of erasure codes that have gained popularity in this context due to their low repair bandwidth, while providing the same level of fault tolerance as erasure codes.
I-A Problem Formulation and System Model
In an (n,k,d) regenerating code [8], a file consisting of F data symbols, each from a finite field $\mathbb{F}_q$, is encoded into n pieces, and each piece is stored in one storage node of capacity α symbols.
The stored data in the nodes should maintain two main properties:
1. Data Recovery: By accessing any set of k nodes, the data collector must be able to recover the original stored file.
2. Node Repair: In the event of a node failure, the content of the failed node can be regenerated by connecting to any subset H of d surviving nodes (∣H∣=d) and downloading β symbols from each of these d nodes. The set H is called the set of helper nodes.
In [8] it is shown that there is a trade-off between the per-node storage capacity α and the per-node repair-bandwidth β in a storage system that can guarantee the above main properties. While it is desired to minimize both α and β, one can be reduced only at the cost of increasing the other.
There are two types of node repair: (i) functional repair, where a failed node is replaced by a new node such that the resulting system continues to satisfy the data recovery and node repair properties; and (ii) exact repair, under which the replacement node stores precisely the same content as the failed node. Hence, exact repair is a more demanding criterion, and it is expected to require more repair bandwidth than functional repair for a given storage size. However, from a practical standpoint, exact repair is preferred, since it avoids the extra overhead of updating the metadata (the relationship between the contents) and the system configuration.
I-B An Overview of the Related Works
The regenerating codes were introduced in the seminal work of Dimakis et al. [8], wherein (n,k,d) distributed storage systems were studied using information flow graphs. Moreover, using the cut-set bound, it was shown that
the per-node storage capacity α, the per-node repair bandwidth β, and the file size F should satisfy
$$F \le \sum_{i=0}^{k-1} \min\{\alpha, (d-i)\beta\} \tag{1}$$
for a storage system that maintains the data recovery and node repair properties. This bound implies a trade-off between α and β for a given F. This trade-off was shown to be achievable under functional repair using random codes introduced in [9, 10].
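To make the cut-set bound (1) concrete, the Python sketch below evaluates its right-hand side and computes the two extreme operating points for a given file size F. The MSR pair ($\alpha = F/k$, $\beta = F/(k(d-k+1))$) and the MBR pair ($\beta = 2F/(k(2d-k+1))$, $\alpha = d\beta$) are the standard closed forms associated with [8]; they are not derived in this section, and the function names are hypothetical. Exact rational arithmetic via `fractions.Fraction` avoids rounding when checking equality.

```python
from fractions import Fraction

def cut_set_rhs(k: int, d: int, alpha, beta):
    """Right-hand side of the cut-set bound: sum_{i=0}^{k-1} min(alpha, (d - i) * beta)."""
    return sum(min(alpha, (d - i) * beta) for i in range(k))

def msr_point(k: int, d: int, F: int):
    """Minimum-storage pair: alpha = F/k, beta = alpha / (d - k + 1)."""
    alpha = Fraction(F, k)
    return alpha, alpha / (d - k + 1)

def mbr_point(k: int, d: int, F: int):
    """Minimum-bandwidth pair: beta = 2F / (k(2d - k + 1)), alpha = d * beta."""
    beta = Fraction(2 * F, k * (2 * d - k + 1))
    return d * beta, beta

# Example: (n, k, d) = (5, 3, 4) with a file of F = 12 symbols.
k, d, F = 3, 4, 12
for name, (alpha, beta) in [("MSR", msr_point(k, d, F)), ("MBR", mbr_point(k, d, F))]:
    # Both extreme points meet the bound with equality.
    assert cut_set_rhs(k, d, alpha, beta) == F
    print(name, alpha, beta)
# prints:
# MSR 4 2
# MBR 16/3 4/3
```

Lowering β below the MBR value while keeping α fixed makes the right-hand side drop below F, which is the trade-off described above.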
An important follow-up question was whether the same trade-off is achievable under the exact-repair criterion. First, [11] showed that exact-repair regenerating codes can be constructed for the extreme points of the trade-off, namely, the minimum bandwidth regeneration (MBR) point, referring to the minimum β satisfying (1), and the minimum storage regeneration (MSR) point¹, referring to the minimum α for which (1) can be satisfied for a given F. Later, [12] showed that some of the interior points of the trade-off (those between the two extreme points) are not achievable under the exact-repair criterion. As the proof of [12] did not rule out the possibility of approaching the non-achievable trade-off points with an arbitrarily small gap, the next question was whether there is a non-vanishing gap between the trade-offs of exact-repair and functional-repair codes. This question was first answered in [13], where Tian used Yeung's computer-aided approach for information-theoretic inequalities [14] to completely characterize the trade-off for an (n,k,d)=(4,3,3) system. Note that (4,3,3) is the smallest system parameter for which there is a non-vanishing gap between the functional-repair and exact-repair trade-offs.
¹ The first MSR code construction in [11] only holds for some range of system parameters.
Thereafter, the attention of the data storage community shifted to characterizing the optimum storage-bandwidth trade-off for exact-repair regenerating codes. A trade-off characterization is two-fold: (i) designing code constructions that satisfy the data recovery and exact node repair properties, to achieve pairs (α,β), and (ii) proving information-theoretic lower bounds on the achievable pairs (α,β). The focus of this paper is on the code construction part; hence, we provide a brief review of the existing code constructions in the literature. To this end, we divide the existing codes into three main categories based on the achievable trade-offs, as follows.
1. The MBR point: This point was fully solved for general (n,k,d) in [11], where it was shown that the functional-repair trade-off is also achievable under the exact-repair criterion.
2. The MSR point: The trade-off offered by (1) at the MSR point is achievable by exact-repair regenerating codes, and most of the existing code constructions are dedicated to this point. In [15, 16], it was shown that exact-repair MSR codes (in both the low-rate regime k/n≤1/2 and the high-rate regime k/n>1/2) are achievable in the asymptotic sense, that is, when the file size grows unboundedly. However, the proof was existential, and no explicit code construction was provided.
One of the earliest works in this area was [17], where a computer search was carried out to find an (n,k,d)=(5,3,4) MSR code. In [11], a code construction for parameters satisfying n−1≥d≥2k−2 was presented. For codes with n−1>d>2k−2, the construction includes two steps: first, a code is developed with d′=2k′−2, and then it is converted to a code with the desired (k,d) parameters. Later, in [18], the code construction was unified for all parameters n−1≥d≥2k−2.
An explicit code construction for parameters d<2k−2 and d≤n−1 was an open problem, and several papers were published to improve the state of the art. The code constructions in [19, 20, 21, 22, 23] were limited to the repair of systematic nodes only. Another category of code constructions was dedicated to codes with a limited number of parity nodes. In particular, explicit MDS storage codes with only two parities (n=k+2) were constructed based on Hadamard matrices in [24] and permutation matrices in [25, 26]. Both constructions offer an optimum repair bandwidth for the repair of any single (systematic or parity) node failure.
Sub-packetization level, referring to the unnormalized value of α in terms of the number of symbols, is a practically important parameter of any code construction. While it is preferred to minimize the sub-packetization level, it cannot be lower than a certain lower bound provided in [27].
A class of MSR codes was introduced in [28] that requires only polynomial sub-packetization in k, but a very large field size. Moreover, the construction of [28] was not fully explicit, and it was limited to parameters satisfying n=d+1. The latter restriction was later relaxed in [29], where the same result was shown for an arbitrary number of nodes n. However, the code construction in [29] still requires a large field size.
Several MSR code constructions for arbitrary parameters (n,k,d) were recently proposed [30, 31, 32, 33]. They are all optimum in terms of the storage vs. repair-bandwidth trade-off, and all achieve the optimum sub-packetization, i.e., they match the bound introduced in [27]. The codes proposed in [34] offer dynamic repair, where the number of helpers can be varied flexibly between k and n−1.
Our proposed codes in this work cover the entire trade-off, including the MSR point. For the MSR codes resulting from the proposed construction, the code parameters, and especially the sub-packetization level, do not depend on the total number of nodes n. Another advantage of this construction is its flexibility for system expansion by adding new parity nodes. In contrast to other existing codes in the literature, where the entire code needs to be redesigned in order to add new parity nodes, the proposed construction allows new parity nodes to be added to the system without changing the content of the other nodes. A comparison between the MSR code obtained from this construction and the existing codes, in terms of required field size and sub-packetization level, is presented in Table I.
3. Interior points: Constructions for the interior points (trade-off points other than MBR and MSR) were restricted to specific system parameters. In [35], a code construction for (n=d+1,k=d,d) was presented. The trade-off achieved by this construction was shown to be optimum under the assumption of code linearity [36, 37, 38]. However, it was not clear whether the same trade-off is achievable for n>d+1. Most of the follow-up efforts to increase the number of parity nodes compromised the system capacity in order to construct codes for larger values of n, and hence their trade-offs diverged from the lower bound of [36, 37, 38] as n increases. The first n-independent achievable trade-off for interior points was provided in [39], where it was shown that the first corner point on the trade-off next to the MSR point can be achieved for any (n,k=d,d) system. However, the proof of [39] was only an existence proof: a random ensemble of codes was introduced, and it was shown that for any n and a large enough field size, there exists at least one code in the ensemble that satisfies both the data recovery and node repair properties.
In [40], the above-mentioned restriction on n was lifted: an explicit code construction for the entire trade-off of an (n,k=d,d) storage system was introduced. The proposed determinant codes are optimum subject to the linearity of the code and achieve the lower bound in [36, 37, 38], regardless of the total number of nodes. However, the repair process of the determinant codes in [40] requires heavy computation, and, more importantly, the repair data sent from a helper node to a failed node depends on the identity of all the helpers participating in the repair process. Later, in [41], this issue was resolved by introducing a new repair mechanism for determinant codes.
The next set of works focused on breaking the last constraint, i.e., k=d. An explicit code construction was introduced in [35] for an (n,k≤d,d) system. The resulting trade-off was improved by the code construction of [42].
In [43], a class of improved layered codes was introduced. However, it turned out that the trade-off achieved by [43] is optimum only for the corner point next to the MBR point, and only for an (n=d+1,k,d) system. These limitations were slightly relaxed in [44], where the constraint n=d+1 was lifted. However, the construction in [44] was dedicated to the trade-off point next to the MBR point, implying a low repair bandwidth. Later, in [45], this construction was extended to the entire trade-off, but only for an (n,k=d−1,d) system. In this paper, we propose a code construction for general (n,k,d) parameters and the entire trade-off. We conjecture that the resulting trade-off is optimum.
I-C Notation and Terminologies
We use lowercase letters to denote scalars (e.g., integers k and d) or data file symbols (e.g., v and w, which take values from a finite field). We use bold capital letters to denote matrices. For positive integers a and b, we denote the set {a,a+1,…,b−1,b} by [a:b] and the set {1,2,…,b} by [b]. Also, for a>b we have [a:b]=∅. Calligraphic letters (e.g., I and J) are used to denote sets of integers. Moreover, x∈I indicates that the scalar x belongs to the set I. The largest entry of a set I is denoted by max I, and the maximum of an empty set is defined as max∅=−∞, for consistency. In this work, ∣I∣ denotes the size of the set I. For two sets I and J with ∣I∣=∣J∣, we write I≺J to indicate the lexicographical order between I and J, e.g., {1,2,3}≺{1,2,4}≺{1,2,5}≺{1,3,4}≺{1,3,5}.
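The set notation above maps directly onto Python's standard library. The sketch below is illustrative only, and the helper names are hypothetical: `itertools.combinations` applied to a sorted input yields tuples in lexicographic order, which reproduces the ordering I≺J used in the text, and `max(..., default=...)` encodes the convention max∅=−∞.

```python
from itertools import combinations

def interval(a: int, b: int) -> list:
    """[a:b] = {a, a+1, ..., b}; empty when a > b."""
    return list(range(a, b + 1))

def lex_subsets(d: int, m: int) -> list:
    """All m-element subsets of [d] = [1:d], in lexicographic order.

    combinations() of a sorted input emits tuples in lexicographic order,
    matching the order used in the text.
    """
    return list(combinations(interval(1, d), m))

def set_max(s):
    """Maximum of a set, with the convention max(empty set) = -infinity."""
    return max(s, default=float("-inf"))

print(lex_subsets(5, 3)[:5])
# [(1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 4), (1, 3, 5)]
assert interval(4, 3) == [] and set_max(set()) == float("-inf")
```

The first five subsets printed are exactly the chain in the example above.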
For an integer x and a set I, we define
[TABLE]
For a matrix $\mathbf{P}$, we may label its rows and columns by integers (e.g., $i$ and $j$) or by sets (e.g., $\mathcal{I}$ and $\mathcal{J}$), which will be specified accordingly. Then $\mathbf{P}_{i,j}$ (or $\mathbf{P}_{\mathcal{I},\mathcal{J}}$) refers to the entry of $\mathbf{P}$ at the row indexed by $i$ (or $\mathcal{I}$) and the column labeled by $j$ (or $\mathcal{J}$). Moreover, we denote the $i$-th row of $\mathbf{P}$ by $\mathbf{P}_{i,:}$, and we use $\mathbf{P}_{:,j}$ to refer to the $j$-th column of $\mathbf{P}$. Lastly, we may use $\mathbf{P}[\mathcal{X},\mathcal{Y}]$ to refer to the submatrix of $\mathbf{P}$ obtained from a family of rows $\mathcal{X}$ and a family of columns $\mathcal{Y}$ (with family elements being integers or sets). Accordingly, $\mathbf{P}[\mathcal{X},:]$ is the submatrix of $\mathbf{P}$ formed by stacking all rows with labels belonging to $\mathcal{X}$.
Note that $k$ and $d$ (with $k\le d$) are the main system parameters throughout the paper. For a matrix $\mathbf{P}$ with $d$ rows, we define $\overline{\mathbf{P}}=\mathbf{P}[\{1,2,\dots,k\},:]$ and $\underline{\mathbf{P}}=\mathbf{P}[\{k+1,\dots,d\},:]$ to be the sub-matrices of $\mathbf{P}$ obtained by stacking the top $k$ and the bottom $(d-k)$ rows of $\mathbf{P}$, respectively.
Throughout the analysis, we frequently need to concatenate several matrices, that is, merging a number of matrices with the same number of rows side-by-side, to form a fat matrix with the same number of rows.
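A minimal sketch of the submatrix, top/bottom split, and concatenation operations, assuming matrices are stored as Python lists of rows and that rows and columns carry 1-based integer labels. The text also allows set-valued labels, which this sketch does not model, and the helper names are hypothetical.

```python
def submatrix(P, rows, cols):
    """P[X, Y]: keep rows labeled in X and columns labeled in Y (1-based labels, order preserved)."""
    return [[P[i - 1][j - 1] for j in cols] for i in rows]

def top_bottom(P, k):
    """Split a d-row matrix into its top k rows and its bottom d - k rows."""
    return P[:k], P[k:]

def hconcat(*mats):
    """Place matrices with equal row counts side by side to form a wider matrix."""
    n_rows = len(mats[0])
    if any(len(M) != n_rows for M in mats):
        raise ValueError("all matrices must have the same number of rows")
    return [[x for M in mats for x in M[r]] for r in range(n_rows)]

# d = 4 rows, k = 3.
P = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
top, bottom = top_bottom(P, 3)
assert bottom == [[10, 11, 12]]
assert submatrix(P, [1, 3], [2, 3]) == [[2, 3], [8, 9]]
assert hconcat(top, top) == [[1, 2, 3, 1, 2, 3], [4, 5, 6, 4, 5, 6], [7, 8, 9, 7, 8, 9]]
```

Concatenation keeps the row count and adds up the column counts, which is the "fat matrix" described above.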
Finally, the binomial coefficient is defined as $\binom{\ell}{m}=\frac{\ell!}{m!\,(\ell-m)!}$, and for the sake of consistency we set it to zero for $m<0$ and $m>\ell$.
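In code, this zero convention needs care: Python's `math.comb(n, k)` returns 0 when k > n but raises `ValueError` for negative arguments. A small wrapper (the name `binom` is hypothetical) covers both cases:

```python
from math import comb

def binom(l: int, m: int) -> int:
    """Binomial coefficient C(l, m), defined as 0 when m < 0 or m > l."""
    if m < 0 or m > l:
        return 0
    return comb(l, m)

assert binom(5, 2) == 10
assert binom(3, -1) == 0   # math.comb(3, -1) would raise ValueError
assert binom(3, 4) == 0
```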
I-D Paper Organization
The rest of this paper is organized as follows. We first present the main result of this work, namely the trade-off achieved by the proposed codes, in Section II. Appendix A includes the proof that the parameters of this code achieve the MBR point, the MSR point, and another point on the cut-set bound. Then, in Section III, we review the (signed) determinant codes [40, 41], their construction, and their main properties. The proof of node repairability for signed determinant codes is given in Appendix B for the sake of completeness. The core idea of our code construction, which is the multiplication of a fixed encoder matrix by a cascade message matrix, is presented in Section IV. Appendix C discusses the conversion of the proposed code with an arbitrary encoder matrix into a (semi-)systematic code. Section V explains the details of cascading message matrices and includes a running example to facilitate understanding of the notation and the concept of injection. The main properties of exact node repair and data recovery are proved in Sections VI and VII, respectively.
The parameters of the proposed code construction are evaluated in Section VIII. An explicit evaluation of the parameters requires solving an implicit recursive equation, which is performed using the Z-transform, as discussed in Appendix D.
Finally, the paper is concluded in Section IX, with a comparison between the proposed code and the product-matrix code, a discussion on our conjecture regarding the optimality of the proposed codes, and a number of related open questions for future works.
II Main Result
The main contribution of this paper is a novel construction for exact-repair regenerating codes, with arbitrary parameters (n,k,d). The following theorem characterizes the achievable storage vs. repair-bandwidth trade-off of the proposed code construction.
Theorem 1.
For a distributed storage system with parameters (n,k,d) satisfying k≤d<n, the triple (α,β,F) defined as
[TABLE]
can be achieved by the cascade codes proposed in this paper for μ∈{1,2,…,k}.
The trade-off achieved by this construction is shown in Fig. 1, and is compared against that of other existing codes.
Note that for given system parameters (n,k,d), a total of k triples (α,β,F) can be obtained from (3) by varying the parameter μ, which we refer to as the mode of the code. Therefore, while (n,k,d) are the system parameters, we append μ to this notation, writing (n,k,d;μ), to refer to the specific point on the trade-off corresponding to μ.
The following corollary shows that the MBR and MSR points are subsumed as special cases of the proposed trade-off. The proofs of the following corollaries are straightforward algebraic manipulations and are given in Appendix A.
Corollary 1.
The cascade code with mode μ=1 is an MBR code with parameters
[TABLE]
Similarly, the cascade code with mode μ=k is an MSR code with parameters
[TABLE]
Corollary 2.
The proposed code at mode μ=k−1 achieves the cut-set bound, and hence it is optimum.
Fig. 2 compares the scalability of the code construction of this paper with that of an existing construction. Here, scalability refers to the property that the number of nodes in a distributed storage system can be increased (for a sufficiently large field size) without compromising the system performance or its overall capacity.
The rest of this paper is dedicated to a comprehensive proof of Theorem 1. After reviewing the (signed) determinant codes [40, 41] in Section III, we introduce the cascade codes and present their construction in Sections IV and V. Then, we prove that the proposed codes maintain the exact-repair property (Proposition 4) and data recovery property (Proposition 5), in Sections VI and VII, respectively. Finally, in Section VIII we show that the parameters of the proposed code match those claimed in Theorem 1.
III A Review of (n,k=d,d;m) Signed Determinant Codes
The code construction presented in this paper uses signed determinant codes as its main building blocks. The family of determinant codes, first introduced in [40, 41], is a class of codes that achieve the optimum (linear) storage-bandwidth trade-off of regenerating codes [36, 37, 38] when the number of nodes participating in data recovery equals the number of helpers contributing to a repair process, i.e., k=d. They are linear codes, constructed by multiplying an encoder matrix by a message matrix.
The modification here (which converts a determinant code into a signed determinant code) is an arbitrary assignment of (+/−) signs to the rows of the message matrix, where the sign of a row affects all the entries in that row. This modification is applied through a signature vector, defined in Definition 2. As we will see in this section, the modification preserves all properties of determinant codes, while it is helpful for our next step, which is the construction of (n,k,d;μ) cascade codes.
In the following, we review the construction of signed determinant codes and their properties. First, note that the mode of a d=k signed determinant code is defined in the same way as the mode of the k<d cascade codes in Section II. That is, the family of signed determinant codes for an (n,k=d,d) system consists of k distinct codes, enumerated by a parameter m∈{1,2,…,k}, which is called the mode of the code.²
² We use different letters, m and μ, for the modes of determinant codes and cascade codes because the construction of the (n,k,d;μ) cascade codes in this paper uses several (n,k=d,d;m) signed determinant codes with different values of m.
For any mode m∈[k], the parameters of the determinant code corresponding to the m-th corner point on the trade-off are given by
$$\alpha_m=\binom{d}{m},\qquad \beta_m=\binom{d-1}{m-1},\qquad F_m=m\binom{d+1}{m+1}.$$
Here, m=1 corresponds to an MBR code, while m=k yields the parameters of an MSR code. We also define the operator $\mathrm{mode}(\mathbf{D})$, which returns the parameter m (the mode) of the determinant code $\mathbf{D}$.
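The sketch below tabulates $(\alpha_m,\beta_m,F_m)$ for a small $d$, using the expressions stated in this section ($\alpha_m=\binom{d}{m}$ columns of the message matrix, $F_m=m\binom{d+1}{m+1}$ file symbols, and $\beta_m=\binom{d-1}{m-1}$ from Proposition 2), and compares $F_m$ with the right-hand side of the cut-set bound (1) for $k=d$. With the zero convention for binomials, $m=0$ reproduces the trivial code $(\alpha,\beta,F)=(1,0,0)$ mentioned later in this section. The function names are hypothetical.

```python
from math import comb

def binom(l: int, m: int) -> int:
    """C(l, m) with the convention C(l, m) = 0 for m < 0 or m > l."""
    return comb(l, m) if 0 <= m <= l else 0

def determinant_params(d: int, m: int):
    """(alpha_m, beta_m, F_m) = (C(d, m), C(d-1, m-1), m * C(d+1, m+1))."""
    return binom(d, m), binom(d - 1, m - 1), m * binom(d + 1, m + 1)

def cut_set_rhs(k: int, d: int, alpha: int, beta: int) -> int:
    """Right-hand side of the cut-set bound (1)."""
    return sum(min(alpha, (d - i) * beta) for i in range(k))

d = 4
for m in range(d + 1):
    alpha, beta, F = determinant_params(d, m)
    print(m, alpha, beta, F, cut_set_rhs(d, d, alpha, beta))
# 0 1 0 0 0
# 1 4 1 10 10
# 2 6 3 20 21
# 3 4 3 15 15
# 4 1 1 4 4
```

In this d=4 example, the modes m=1 (MBR), m=3, and m=4 (MSR) meet the cut-set bound with equality, while m=2 stores F_m=20 symbols against a cut-set value of 21; this is an observation about this one example only.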
III-A Code Construction of (n,k=d,d;m) codes
A signed determinant code with parameters (n,k=d,d;m) is represented by a matrix $\mathbf{C}_{n\times\alpha_m}$ whose $i$-th row contains the coded content of the $i$-th node. In general, $\mathbf{C}_{n\times\alpha_m}$ is obtained by multiplying an encoder matrix $\boldsymbol{\Psi}_{n\times d}$, whose entries are from a finite field $\mathbb{F}_q$, by a message matrix $\mathbf{D}_{d\times\alpha_m}$. The encoder matrix $\boldsymbol{\Psi}$ is chosen such that any $d$ of its rows are linearly independent. Examples of such matrices include Vandermonde [46] and Cauchy matrices [47]. The message matrix $\mathbf{D}$ has $d$ rows and $\alpha_m=\binom{d}{m}$ columns, and its entries are filled with symbols from the file to be stored in the storage system. To do this, we first split $F_m=m\binom{d+1}{m+1}$ raw file symbols into two groups, namely, $\mathcal{V}$ and $\mathcal{W}$, and label them as
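$$\mathcal{V}=\left\{v_{x,\mathcal{X}} : \mathcal{X}\subseteq[d],\ |\mathcal{X}|=m,\ x\in\mathcal{X}\right\},\qquad \mathcal{W}=\left\{w_{x,\mathcal{Y}} : \mathcal{Y}\subseteq[d],\ |\mathcal{Y}|=m+1,\ x\in\mathcal{Y}\setminus\{\max\mathcal{Y}\}\right\}. \tag{4}$$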
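Note that the symbols of $\mathcal{V}$ are indexed by a set $\mathcal{X}\subseteq[d]$ of size $m$ and an element $x\in\mathcal{X}$, implying $|\mathcal{V}|=m\binom{d}{m}$. Similarly, the symbols in $\mathcal{W}$ are indexed by a pair $(x,\mathcal{Y})$, where $|\mathcal{Y}|=m+1$ and $x$ can be any element of $\mathcal{Y}$ except the largest one. Hence, there are $|\mathcal{W}|=m\binom{d}{m+1}$ symbols in the set $\mathcal{W}$. Note that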
$$|\mathcal{V}|+|\mathcal{W}| = m\binom{d}{m}+m\binom{d}{m+1} = m\binom{d+1}{m+1} = F_m.$$
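The counting argument can be checked by brute force. The sketch below enumerates the index pairs of the v-symbols and of the non-parity w-symbols exactly as described above, and confirms that together they account for $m\binom{d+1}{m+1}$ symbols for small $d$. The helper names are hypothetical.

```python
from itertools import combinations
from math import comb

def v_indices(d: int, m: int):
    """Pairs (x, X) with X a subset of [d], |X| = m, x in X."""
    return [(x, X) for X in combinations(range(1, d + 1), m) for x in X]

def w_indices(d: int, m: int):
    """Pairs (x, Y) with Y a subset of [d], |Y| = m + 1, x in Y other than max Y."""
    # combinations() yields sorted tuples, so Y[:-1] drops exactly max Y.
    return [(x, Y) for Y in combinations(range(1, d + 1), m + 1) for x in Y[:-1]]

for d in range(1, 8):
    for m in range(1, d + 1):
        nv, nw = len(v_indices(d, m)), len(w_indices(d, m))
        assert nv == m * comb(d, m) and nw == m * comb(d, m + 1)
        assert nv + nw == m * comb(d + 1, m + 1)

print(len(v_indices(6, 4)), len(w_indices(6, 4)))  # 60 24
```

The identity used in the last step is Pascal's rule, $\binom{d}{m}+\binom{d}{m+1}=\binom{d+1}{m+1}$.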
In the definition of W symbols in (4), the symbol corresponding to x=maxY was excluded. In the following we define this symbol as the parity symbol.
Definition 1.
Let $\mathcal{Y}\subseteq[d]$ with $|\mathcal{Y}|=m+1$. We define the parity symbol $w_{\max\mathcal{Y},\mathcal{Y}}$ such that the parity equation
[TABLE]
is satisfied.³ For such a set $\mathcal{Y}$, we refer to the symbols in $\{w_{y,\mathcal{Y}} : y\in\mathcal{Y}\}$ as the w-group of $\mathcal{Y}$.
³ This parity equation is the same as the one in the original determinant codes introduced in [40, 41].
Definition 2.
For a signed determinant code, a vector $\sigma_{\mathbf{D}}$, called the signature vector, is given as an input to the code construction.⁴ This vector has length $d$ and integer entries, and its entry at position $x\in[d]$ is denoted by $\sigma_{\mathbf{D}}(x)$. Using this signature vector, the sign $(-1)^{\sigma_{\mathbf{D}}(x)}$ is assigned to each integer $x\in[d]$.
⁴ In the original determinant codes introduced in [40, 41], the signature was fixed to $\sigma_{\mathbf{D}}(x)=0$ for all $x\in[d]$.
Remark 1.
Any choice of the signature vector yields a valid exact-repair regenerating code satisfying the data recovery and exact repair properties. In this paper, whenever we generate an instance of a signed determinant code, we first specify its signature vector. ⋄
To fill the message matrix $\mathbf{D}$, we label its rows by the integers in $[d]$ and its columns by the subsets $\mathcal{I}\subseteq[d]$ with $|\mathcal{I}|=m$, in lexicographical order. Then the entry at row $x$ and column $\mathcal{I}$ is given by
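$$\mathbf{D}_{x,\mathcal{I}}=\begin{cases}(-1)^{\sigma_{\mathbf{D}}(x)}\,v_{x,\mathcal{I}} & \text{if } x\in\mathcal{I},\\ (-1)^{\sigma_{\mathbf{D}}(x)}\,w_{x,\mathcal{I}\cup\{x\}} & \text{if } x\notin\mathcal{I}.\end{cases} \tag{8}$$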
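As an illustration of this filling rule, the sketch below builds a symbolic (d;m) message matrix in which each entry is recorded as a (sign, kind, row index, set index) tuple rather than a field element. It follows the entry rule for $\mathbf{D}$ (see also Remark 2): row $x$, column $\mathcal{I}$ holds $(-1)^{\sigma_{\mathbf{D}}(x)}v_{x,\mathcal{I}}$ when $x\in\mathcal{I}$ and $(-1)^{\sigma_{\mathbf{D}}(x)}w_{x,\mathcal{I}\cup\{x\}}$ otherwise. The tuple representation and the function name are assumptions made for readability.

```python
from itertools import combinations

def message_matrix(d: int, m: int, sigma):
    """Symbolic (d; m) signed message matrix.

    Columns are the m-subsets of [d] in lexicographic order; rows are x = 1..d.
    Each entry is (sign, kind, x, S): sign * v_{x,S} if kind == "v",
    sign * w_{x,S} if kind == "w", with sign = (-1) ** sigma[x - 1].
    """
    cols = list(combinations(range(1, d + 1), m))
    D = []
    for x in range(1, d + 1):
        sign = (-1) ** sigma[x - 1]
        row = []
        for I in cols:
            if x in I:
                row.append((sign, "v", x, I))
            else:
                row.append((sign, "w", x, tuple(sorted(I + (x,)))))
        D.append(row)
    return cols, D

cols, D = message_matrix(4, 2, sigma=(0, 0, 0, 1))
print(len(D), len(cols))  # 4 6
print(D[3][0])            # (-1, 'w', 4, (1, 2, 4))

# Every w-symbol w_{y,Y} appears exactly once: at row y, column Y minus {y}.
w_entries = [(e[2], e[3]) for row in D for e in row if e[1] == "w"]
assert len(w_entries) == len(set(w_entries))
```

Flipping an entry of `sigma` changes the sign of every entry in that row and nothing else, which is the only difference between signed and original determinant codes.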
Once the message matrix is formed, the coded content of node $i$ is given by $\boldsymbol{\Psi}_{i,:}\cdot\mathbf{D}$.
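The encoding step can be sketched numerically over a small prime field. The sketch assumes a Vandermonde encoder with evaluation points 1,…,n in GF(p) for a hypothetical prime p=13 (so n must be smaller than p), checks by brute force that every set of d rows has full rank, and then forms the node contents $\boldsymbol{\Psi}\cdot\mathbf{D}$ for an arbitrary numeric message matrix. Because a square Vandermonde matrix with distinct evaluation points is invertible, this particular encoder only needs n distinct field elements.

```python
from itertools import combinations

P = 13  # hypothetical prime field size; requires n < P for distinct evaluation points

def vandermonde(n: int, d: int, p: int = P):
    """n x d Vandermonde matrix over GF(p) with evaluation points 1..n."""
    return [[pow(a, j, p) for j in range(d)] for a in range(1, n + 1)]

def rank_mod_p(M, p: int = P) -> int:
    """Rank of M over GF(p) via Gauss-Jordan elimination (p must be prime)."""
    M = [[v % p for v in row] for row in M]
    rank = 0
    for c in range(len(M[0])):
        pivot = next((r for r in range(rank, len(M)) if M[r][c]), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        inv = pow(M[rank][c], p - 2, p)  # inverse by Fermat's little theorem
        M[rank] = [v * inv % p for v in M[rank]]
        for r in range(len(M)):
            if r != rank and M[r][c]:
                f = M[r][c]
                M[r] = [(a - f * b) % p for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank

def encode(Psi, D, p: int = P):
    """Node contents C = Psi * D over GF(p); row i of C is stored on node i."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*D)]
            for row in Psi]

n, d = 6, 3
Psi = vandermonde(n, d)
# Any d rows of Psi are linearly independent (the property required of the encoder).
assert all(rank_mod_p([Psi[i] for i in S]) == d for S in combinations(range(n), d))

D = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # arbitrary d x alpha message matrix over GF(13)
C = encode(Psi, D)
print(C[0])  # [12, 2, 5]: node 1 has Psi row [1, 1, 1]
```

Appending another Vandermonde row with a new evaluation point adds a node without changing `C` for the existing nodes, in line with Remark 3's observation that the message matrix does not depend on n.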
For the sake of completeness, we define an (n,k=d,d;m=0) determinant code at mode m=0 to be a trivial code with (α=1,β=0,F=0), whose message matrix is a d×1 all-zero matrix.
In [41, Section IV], an illustrative example of a determinant code with parameters (n,k,d;m)=(8,4,4;2) is given. The signature of this code is (0,0,0,0); however, each row of the message matrix can be multiplied by an arbitrary sign to change the signature vector of the code. This example also explains the main idea of the determinant code construction, as well as the data recovery and node repair properties of the code.
Additionally, some instances of message matrices of (n,d,d;m) determinant codes are provided in this paper. For instance, the matrix in (14) of Example 1 is an instance of the message matrix of a determinant code for d=6, m=4 with an all-zero signature vector. In that example, there is a separator line between rows 4 and 5, and some symbols are highlighted and marked by frames, which will be explained later. If these details are ignored, the underlying matrix is an (n,d,d;m)=(n,6,6;4) message matrix. Also, the two matrices given in (20) and (22) in Example 6 are message matrices for an (n,k,d)=(n,6,6) determinant code of modes m=2 and m=1, respectively. The signature vectors of these codes are $\sigma_{\mathbf{T}_2}=(2,2,2,2,2,2)$ and $\sigma_{\mathbf{T}_5}=(2,2,2,2,2,3)$.
Remark 2.
From the definition of the message matrix in (8), the entries $\mathbf{D}_{x,\mathcal{I}}$ with $x>\max\mathcal{I}$ are parity symbols, and the entries with $x\le\max\mathcal{I}$ are raw data file symbols. This is because if $x>\max\mathcal{I}$, then $x\notin\mathcal{I}$, and thus $\mathbf{D}_{x,\mathcal{I}}=(-1)^{\sigma_{\mathbf{D}}(x)}w_{x,\mathcal{I}\cup\{x\}}$. Moreover, since $x$ is the largest element of $\mathcal{Y}=\mathcal{I}\cup\{x\}$, the symbol $w_{x,\mathcal{I}\cup\{x\}}=w_{\max\mathcal{Y},\mathcal{Y}}$ is a parity symbol as defined in (5). Similarly, one can argue that for $x\le\max\mathcal{I}$, the symbol at position $(x,\mathcal{I})$ of $\mathbf{D}$ is either a v-symbol or a non-parity w-symbol.
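Remark 2 implies that the parity entries of $\mathbf{D}$ sit exactly at the positions $(x,\mathcal{I})$ with $x>\max\mathcal{I}$, and that each parity symbol $w_{\max\mathcal{Y},\mathcal{Y}}$ occupies exactly one such position via $\mathcal{Y}=\mathcal{I}\cup\{x\}$. The brute-force sketch below (hypothetical function name) confirms this bijection for small $d$, so the number of parity entries is $\binom{d}{m+1}$, one per w-group.

```python
from itertools import combinations
from math import comb

def parity_positions(d: int, m: int):
    """Positions (x, I) of a (d; m) message matrix with x > max I (m >= 1)."""
    return [(x, I) for I in combinations(range(1, d + 1), m)
            for x in range(1, d + 1) if x > max(I)]

for d in range(2, 8):
    for m in range(1, d + 1):
        pos = parity_positions(d, m)
        groups = {tuple(sorted(I + (x,))) for x, I in pos}
        # x > max I forces x not in I, and Y = I with x added has max Y = x.
        assert all(x not in I and max(I + (x,)) == x for x, I in pos)
        # Distinct positions give distinct w-groups Y, one per (m+1)-subset of [d].
        assert len(groups) == len(pos) == comb(d, m + 1)

print(len(parity_positions(4, 2)))  # 4
```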
Remark 3.
The construction of signed determinant codes decouples the parameters of the code. The encoder matrix depends only on n and d and remains the same for all modes. On the other hand, the message matrix $\mathbf{D}$ is fully determined by the parameters (d,m) and does not depend on n, the total number of nodes in the system. Thus, we refer to the code defined above as a (d;m) signed determinant code and to the matrix $\mathbf{D}$ as a (d;m) message matrix.
Next, we review data recovery and exact repair properties for signed determinant codes.
III-B Data Recovery of (n,k=d,d;m) Determinant Codes
Proposition 1.
In a (d;m) signed determinant code, all the data symbols can be recovered from the content of any k=d nodes.
The proof of this proposition is similar to that of Proposition 1 in [41], and hence omitted for the sake of brevity.
III-C Node Repair of (n,k=d,d;m) Determinant Codes
Consider the repair process of a failed node $f\in[n]$ from an arbitrary set $\mathcal{H}$ of $|\mathcal{H}|=d$ helper nodes. The repair-encoder matrix for a $(d;m)$ signed determinant code is defined below.
**Definition 3.**
For a $(d;m)$ signed determinant code with signature $\sigma_{\mathbf{D}}$ and a failed node $f\in[n]$, the *repair-encoder matrix* $\Xi^{f,(m)}$ is defined as a $\binom{d}{m}\times\binom{d}{m-1}$ matrix whose rows are labeled by $m$-element subsets of $[d]$ and whose columns are labeled by $(m-1)$-element subsets of $[d]$. The element in row $I$ and column $J$ of this matrix is given by
[TABLE]
where $\psi_{f,y}$ is the entry of the encoder matrix $\Psi$ at row $f$ and column $y$, and the $\sigma_{\mathbf{D}}(y)$ are the same signature values used in (8). Note that these repair-encoder matrices are fully determined by the encoder matrix and do not depend on the node contents.
In order to repair node $f$, each helper node $h\in\mathcal{H}$ multiplies its content $\Psi_{h,:}\cdot\mathbf{D}$ by the repair-encoder matrix of node $f$ to obtain $\Psi_{h,:}\cdot\mathbf{D}\cdot\Xi^{f,(m)}$, and sends the result to node $f$. The repair bandwidth required by this scheme is given in the following proposition.
**Proposition 2.**
The matrix $\Xi^{f,(m)}$ defined in (11) has rank $\beta_m=\binom{d-1}{m-1}$. Therefore, even though the vector $\Psi_{h,:}\cdot\mathbf{D}\cdot\Xi^{f,(m)}$ has $\binom{d}{m-1}$ entries, it can be fully delivered to the failed node by communicating only $\beta_m=\binom{d-1}{m-1}$ symbols of $\mathbb{F}_q$ from node $h$ to node $f$.
We refer to [41, Proposition 3] for the proof of Proposition 2.
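Proposition 2 can be sanity-checked numerically in a special case. The Python sketch below makes assumptions that are ours, not the paper's: it works in characteristic 2, where by Remark 4 all signs vanish, and it sets every $\psi_{f,y}=1$; it also assumes that the entry of $\Xi^{f,(m)}$ at $(I,J)$ is non-zero exactly when $J=I\setminus\{y\}$ for some $y\in I$. Under these assumptions, $\Xi^{f,(m)}$ reduces to a 0/1 incidence pattern whose rank over $GF(2)$ should equal $\beta_m=\binom{d-1}{m-1}$. The helper names `gf2_rank` and `xi_support` are hypothetical.

```python
from itertools import combinations
from math import comb


def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as int bitmasks."""
    basis = {}  # leading-bit position -> basis row with that leading bit
    for r in rows:
        while r:
            lead = r.bit_length() - 1
            if lead not in basis:
                basis[lead] = r
                break
            r ^= basis[lead]  # cancel the leading bit and keep reducing
    return len(basis)


def xi_support(d, m):
    """0/1 pattern of the repair-encoder matrix (rows: m-subsets, cols: (m-1)-subsets)."""
    col_index = {J: c for c, J in enumerate(combinations(range(1, d + 1), m - 1))}
    rows = []
    for I in combinations(range(1, d + 1), m):
        mask = 0
        for y in I:
            J = tuple(v for v in I if v != y)  # J = I \ {y}, still sorted
            mask |= 1 << col_index[J]
        rows.append(mask)
    return rows


# Each line prints m, the GF(2) rank, and C(5, m - 1); the last two agree.
for m in range(1, 7):
    print(m, gf2_rank(xi_support(6, m)), comb(5, m - 1))
```

For $d=6$, $m=4$ the matrix has $\binom{6}{3}=20$ columns but rank 10, so under these assumptions a helper sends 10 symbols instead of 20.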
Upon receiving the $d$ vectors $\{\Psi_{h,:}\cdot\mathbf{D}\cdot\Xi^{f,(m)}:h\in\mathcal{H}\}$ and stacking them into a matrix, the failed node obtains
$$\Psi[\mathcal{H},:]\cdot\mathbf{D}\cdot\Xi^{f,(m)},$$
where $\Psi[\mathcal{H},:]$ is the $d\times d$ full-rank sub-matrix of $\Psi$ obtained by stacking the rows indexed by $h\in\mathcal{H}$. Multiplying from the left by $\Psi[\mathcal{H},:]^{-1}$, the failed node retrieves
$$R_f(\mathbf{D})=\mathbf{D}\cdot\Xi^{f,(m)}, \tag{12}$$
which is called the *repair space* of node $f$. One instance of the repair space for a $(d;m)=(6;4)$ code is presented in (29) in Example 9. Recall that the content of node $f$ is the row vector $\Psi_{f,:}\cdot\mathbf{D}$, whose entries are labeled in the same way as the columns of $\mathbf{D}$, i.e., by $m$-element subsets of $[d]$. All the coded symbols of node $f$ can be recovered from its repair space, as described in the following proposition.
**Proposition 3.**
The coded symbol at index $I$ of node $f$ can be recovered from $R_f(\mathbf{D})$ defined in (12) using
[TABLE]
This proposition is very similar to Proposition 2 in [41]. However, due to the modification introduced by the signature vector here, we present the proof of the current proposition in Appendix B for the sake of completeness.
**Remark 4.**
Note that a signed determinant code can be defined over any Galois field. In particular, for a code designed over $GF(2^s)$, which has characteristic 2, we have $-1=+1$, and hence all the positive and negative signs disappear. Specifically, the signs in (8) can be removed, and the parity equation in (5) reduces to $\sum_{y\in I}w_{y,I}=0$. Also, the non-zero entries of the repair-encoder matrix in (11) become $\psi_{f,x}$, and the repair equation in (13) simplifies to $[\Psi_{f,:}\cdot\mathbf{D}]_I=\sum_{x\in I}[R_f(\mathbf{D})]_{x,I\setminus\{x\}}$.
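The characteristic-2 repair equation of Remark 4 can be exercised end to end on a toy example. The sketch below is a hypothetical illustration over $GF(2)$ (symbols are single bits, and the row $\psi_{f,:}$ is random): it builds a message matrix using the structure described in Remark 2 with all signs dropped, forms the repair space $R_f(\mathbf{D})=\mathbf{D}\cdot\Xi^{f,(m)}$ assuming that $\Xi^{f,(m)}$ has entry $\psi_{f,y}$ at $(J\cup\{y\},J)$ and zero elsewhere, and then checks that $[\Psi_{f,:}\cdot\mathbf{D}]_I=\sum_{x\in I}[R_f(\mathbf{D})]_{x,I\setminus\{x\}}$ for every column $I$. Function names are hypothetical.

```python
import random
from itertools import combinations


def message_matrix_gf2(d, m, rng):
    """Random (d; m) message matrix over GF(2), as a dict {(x, I): bit}.

    Assumes D[x, I] = v_{x,I} if x in I, and D[x, I] = w_{x, I + {x}} otherwise,
    where every w-group Y satisfies the char-2 parity sum_{y in Y} w_{y,Y} = 0.
    """
    w = {}
    for Y in combinations(range(1, d + 1), m + 1):
        parity = 0
        for y in Y[:-1]:
            w[y, Y] = rng.getrandbits(1)
            parity ^= w[y, Y]
        w[Y[-1], Y] = parity  # parity symbol w_{max Y, Y}
    D = {}
    for I in combinations(range(1, d + 1), m):
        for x in range(1, d + 1):
            D[x, I] = rng.getrandbits(1) if x in I else w[x, tuple(sorted(I + (x,)))]
    return D


def repair_matches(d, m, seed=0):
    """Check [psi_f D]_I == sum_{x in I} R_f(D)[x, I - {x}] for every column I."""
    rng = random.Random(seed)
    D = message_matrix_gf2(d, m, rng)
    psi = [0] + [rng.getrandbits(1) for _ in range(d)]  # hypothetical row psi_{f,1..d}
    R = {}  # repair space R = D * Xi (assumed support of Xi described above)
    for x in range(1, d + 1):
        for J in combinations(range(1, d + 1), m - 1):
            R[x, J] = 0
            for y in range(1, d + 1):
                if y not in J:
                    R[x, J] ^= D[x, tuple(sorted(J + (y,)))] & psi[y]
    for I in combinations(range(1, d + 1), m):
        stored = 0
        for x in range(1, d + 1):
            stored ^= psi[x] & D[x, I]  # coded symbol of node f at index I
        rebuilt = 0
        for x in I:
            rebuilt ^= R[x, tuple(v for v in I if v != x)]
        if stored != rebuilt:
            return False
    return True


print(all(repair_matches(6, m, seed) for m in range(1, 7) for seed in range(20)))  # True
```

The check succeeds only because every $w$-group satisfies the parity equation: the cross terms $\psi_{f,y}\sum_{x\in I}w_{x,I\cup\{y\}}$ with $y\notin I$ collapse to $\psi_{f,y}\mathbf{D}_{y,I}$.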
IV (n,k,d;μ) Cascade Code Construction
In this section, we describe the construction of cascade codes. For a fixed set of system parameters $(n,k,d;\mu)$, the code parameters $(\alpha,\beta,F)$ of this construction are given in (3). Similar to the construction of $(n,k=d,d;m)$ determinant codes, an $(n,k,d;\mu)$ cascade code is represented by a matrix $C_{n\times\alpha}$ whose $i$-th row contains the coded content of the $i$-th node. This matrix $C_{n\times\alpha}$ is obtained by multiplying an *encoder matrix* $\Psi_{n\times d}$ by a *super message matrix* $M_{d\times\alpha}$. We first explain the construction of the encoder matrix and then give an overview of the cascade structure of the super message matrix. The details of the $(n,k,d)$ super message matrix are discussed in Section V. A summary of the symbols and notation used throughout this construction is given in Table II for reference.
IV-A The Encoder Matrix
**Definition 4.**
The encoder matrix $\Psi$ for a code with parameters $(n,k,d)$ is defined as an $n\times d$ matrix
[TABLE]
such that
(E1) any $k$ rows of $\Gamma$ are linearly independent; and
(E2) any $d$ rows of $\Psi$ are linearly independent.
Note that Vandermonde matrices satisfy both properties. Similar to determinant codes, the super message matrix of a cascade code is multiplied (from the left) by an encoder matrix to generate the coded content of the nodes. Note also that condition (E1) is an additional requirement compared with the encoder matrix of signed determinant codes with $k=d$.
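Properties (E1) and (E2) are easy to verify by brute force for small parameters. The sketch below is illustrative and assumes that $\Gamma$ denotes the first $k$ columns of $\Psi$ (consistent with the semi-systematic form described in Definition 5); it also assumes a prime field $GF(p)$ with $p>n$ and evaluation points $1,\ldots,n$. All function names are hypothetical.

```python
from itertools import combinations


def rank_mod_p(rows, p):
    """Rank of a matrix over the prime field GF(p) via Gauss-Jordan elimination."""
    M = [[v % p for v in row] for row in rows]
    ncols = len(M[0]) if M else 0
    rank = 0
    for c in range(ncols):
        pivot = next((r for r in range(rank, len(M)) if M[r][c]), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        inv = pow(M[rank][c], p - 2, p)  # Fermat inverse, valid since p is prime
        M[rank] = [(v * inv) % p for v in M[rank]]
        for r in range(len(M)):
            if r != rank and M[r][c]:
                f = M[r][c]
                M[r] = [(a - f * b) % p for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank


def vandermonde_encoder(n, d, p):
    """n x d Vandermonde matrix over GF(p) with distinct points 1..n (needs p > n)."""
    return [[pow(a, j, p) for j in range(d)] for a in range(1, n + 1)]


def satisfies_e1_e2(Psi, k, d, p):
    """Brute-force check of (E1) on Gamma = first k columns and (E2) on Psi."""
    n = len(Psi)
    gamma = [row[:k] for row in Psi]
    e1 = all(rank_mod_p([gamma[i] for i in S], p) == k for S in combinations(range(n), k))
    e2 = all(rank_mod_p([Psi[i] for i in S], p) == d for S in combinations(range(n), d))
    return e1, e2


print(satisfies_e1_e2(vandermonde_encoder(7, 6, 11), 4, 6, 11))  # (True, True)
```

Duplicating a row (i.e., reusing an evaluation point) makes both checks fail, which is why distinct points, and hence a field with more than $n$ elements, are needed.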
In the following, we define a specific class of codes, called semi-systematic codes, and their corresponding encoder matrix.
**Definition 5.**
We call a signed determinant (or cascade) code *semi-systematic* if the contents of the first $k$ nodes are identical to the first $k$ rows of the message matrix (or super message matrix), i.e., if $C_{i,:}=\mathbf{D}_{i,:}$ (or $C_{i,:}=M_{i,:}$) for every $i\in[k]$. The encoder matrix $\Psi$ of a semi-systematic code consists of a $k\times k$ identity matrix in its upper-left corner and a $k\times(d-k)$ zero matrix in its upper-right corner. We also refer to such an encoder matrix as a *semi-systematic encoder*. (Footnote 5: Note that the symbols of the message matrix are not necessarily raw symbols from the file, due to the parity equation in (5) as well as injections. Hence, we call these codes semi-systematic to distinguish them from the standard notion of systematic codes.)
In general, any matrix satisfying conditions (E1) and (E2) can be converted into a semi-systematic encoder matrix, as discussed in Appendix C. The code construction of this paper does not require a semi-systematic encoder, and we prove the properties of the code for the general encoder matrices defined in (4). However, to demonstrate the main idea of the construction, we use semi-systematic encoders.
IV-B An Overview: Cascading Message Matrices of Determinant Codes
The general structure of the super message matrix $M$ of an $(n,k,d;\mu)$ cascade code is presented in Fig. 3. This matrix is obtained by concatenating the message matrices of multiple $(n,d,d;m)$ signed determinant codes with different modes, ranging from $m=0$ to $m=\mu$. The number of message matrices of mode $m$ is denoted by $t_m$. This super message matrix is then multiplied (from the left) by an encoder matrix $\Psi$ to generate the node contents. Therefore, the codewords (the contents of the nodes) are also collections of the codewords of the signed determinant codes used as building blocks. We use the following definition to refer to each of these message matrices.
**Definition 6.**
We refer to the determinant code message matrices that are concatenated to form a super message matrix as *code segments*. Similarly, the codewords of a cascade code comprise multiple *codeword segments*, each corresponding to the product of the encoder matrix and one code segment.
The construction starts with the message matrix of an $(n,d,d)$ code of mode $m=\mu$, called the *root* of the cascade code. There is only one message matrix (code segment) of this mode ($t_\mu=1$), and all other message matrices have modes less than $\mu$. These message matrices need to be modified so that the code obtained by multiplying the encoder matrix by the final super message matrix provides data recovery from any $k$ nodes and also enables exact repair from any $d$ helper nodes.
To demonstrate the main idea, assume the encoder matrix $\Psi_{n\times d}$ is semi-systematic, so that the content of node $i\in[k]$ is the same as the $i$-th row of $M$. Now, consider data recovery from the first $k$ nodes, and let $\mathbf{P}$ be a code segment of $M$ with mode $m_1$. Without any modification, some symbols in rows $[k+1:d]$ of $\mathbf{P}$ cannot be recovered, because data recovery uses only the first $k<d$ nodes, and these symbols do not appear (coded or uncoded) in the content of the top $k$ nodes. We refer to such symbols as *missing symbols*.
To overcome this data recovery challenge, one can leverage the parity symbols located in the top $k$ rows of a message matrix $\mathbf{Q}$ associated with another signed determinant code of mode $m_2<m_1$. Recall from Remark 2 that a symbol $w_{\max Y,Y}$ in $\mathbf{Q}$ with $|Y|=m_2+1$ is a parity symbol located at position $(\max Y,Y\setminus\{\max Y\})$ of $\mathbf{Q}$. If $Y\subseteq[k]$, all the symbols of the $w$-group $Y$ (see Definition 1), including the parity symbol, are located in the upper $k$ rows of $\mathbf{Q}$, and hence appear in the content of the first $k$ nodes used for data recovery. This makes the symbol $w_{\max Y,Y}$ redundant for data recovery, since it can also be obtained from the other symbols of the $w$-group $Y$, i.e., $\{w_{i,Y}:i\in Y,\,i<\max Y\}$, using the parity equation in (5). Therefore, by adding a (possibly signed) copy of a missing symbol $z$ from the bottom rows $[k+1:d]$ of the message matrix $\mathbf{P}$ to the symbol $w_{\max Y,Y}$ of $\mathbf{Q}$, we can provide a backup for this missing symbol. With this, the entry at position $(\max Y,Y\setminus\{\max Y\})$ of $\mathbf{Q}$ becomes $w_{\max Y,Y}\pm z$. Since the parity part $w_{\max Y,Y}$ can be independently recovered from the parity equation, it can be removed from the entry to obtain a copy of the missing symbol $z$. Based on this mechanism for providing backups for missing symbols, we formally define the following terms, which are used in the rest of this paper.
**Definition 7.**
We define the following terms, which will be used in the construction of cascade codes:
• *Symbol injection* refers to the process of providing backup copies for missing symbols, which simply adds a signed version of them to the parity symbols of (some of the) other message matrices.
• *Parent matrix/code (of an injection)* refers to the message matrix of the signed determinant code whose symbols in the lower part need protection and are hence injected.
• *Child matrix/code (of an injection)* refers to the message matrix into which symbols are injected.
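The backup mechanism described before Definition 7 can be illustrated with a toy example. The sketch assumes characteristic 2, where, as noted in Remark 4, the parity equation (5) says the $w$-group sums to zero and all signs disappear. Symbols are modeled as bytes, i.e., elements of $GF(2^8)$, whose addition is bitwise XOR. The group $Y=\{1,2,3\}\subseteq[k]$ and all byte values are hypothetical.

```python
from functools import reduce
from operator import xor

# Hypothetical w-group Y = {1, 2, 3} of a child matrix Q, entirely in the top k rows.
w = {1: 0x3A, 2: 0xC5}           # non-parity symbols w_{1,Y}, w_{2,Y}
w[3] = reduce(xor, w.values())   # parity w_{max Y, Y}: the whole group XORs to 0

z = 0x7E                         # a missing symbol from the lower rows of parent P

# Symbol injection: the parity position of Q now stores w_{3,Y} + z (XOR in GF(2^8)).
stored_entry = w[3] ^ z

# Data recovery from the top k nodes: w_{1,Y} and w_{2,Y} are available, so the
# parity part is recomputed from the parity equation and stripped off the entry.
parity_part = w[1] ^ w[2]
recovered_z = stored_entry ^ parity_part
print(hex(recovered_z))          # 0x7e
```

The injected entry is useless on its own, but together with the other members of its $w$-group it yields the missing symbol exactly.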
For the injection of a specific missing symbol of a parent matrix $\mathbf{P}$, we have to carefully choose the child code and the parity symbol of the child code into which the injection occurs. The main challenge is to introduce these backup copies without breaking the repair process. Note that, by Proposition 3, each of the code segments used in the super message matrix $M$ is repairable before injection (modification). However, after modification, the segments are no longer repairable by the standard repair process of $(n,d,d;m)$ codes. Therefore, a careful design of the injections and a revised repair process are needed to simultaneously guarantee data recovery and node repairability.
In order to protect the missing symbols of the root, some child matrices need to be introduced. The missing symbols of each newly introduced child code must in turn be backed up by injection into other determinant codes of lower modes. This leads to a cascade of determinant codes. The construction process continues until it reaches signed determinant codes that do not require any further injection.
Although the idea of injection is motivated by a semi-systematic encoder and data recovery from the top $k$ nodes, we will prove that, with no further modification, the same super message matrix (after injection) yields a cascade code that provides data recovery from any choice of $k$ nodes.
The details of the construction of the cascade super message matrix are discussed in the following section.
V Super message matrix of (n,k,d) codes
In this section, the details of the cascade structure of the super message matrix of an $(n,k,d)$ code are given in several steps. To clarify the details and ideas of the construction, we present a running example for an $(n,k=4,d=6;\mu=4)$ code with parameters $(\alpha,\beta,F)=(81,27,324)$, as given by (3). The construction of this code instance is also broken into several steps, following the general code construction. Note that a code with parameters $(\alpha,\beta,F)=(81,27,324)$ is indeed an MSR code, since $F=k\alpha$ and $\beta=\alpha/(d-k+1)$, for which several code constructions are known (e.g., [34, 30, 33, 31, 32, 29]). Nevertheless, the (unnormalized) code parameters of the existing codes, such as the sub-packetization level, grow with the number of nodes $n$ in the system. In contrast, in the code construction of this paper, the sub-packetization level is independent of $n$. Moreover, cascade codes allow the system to be expanded by adding new parity nodes without changing the contents of already existing nodes. (Footnote 6: The only limitation on adding new nodes to the system is the field size $q$, which has to satisfy $q>n$.)
In a nutshell, the code construction is a recursive algorithm that creates a rooted tree similar to that of Fig. 4. In this tree, nodes represent message matrices of different determinant codes (code segments) that are concatenated to form the super message matrix. The choice of the root of this tree is discussed in Subsection V-A. In Subsection V-B, we explain which symbols of a signed determinant code (a parent matrix $\mathbf{P}$ associated with a node in the tree) need to be protected by injection into another code (the child matrix $\mathbf{Q}$, corresponding to another node in the tree that lies under $\mathbf{P}$). (Footnote 7: We may use the terms child/parent matrix and child/parent node interchangeably when discussing the hierarchical tree.) Next, in Subsection V-C, all child nodes (matrices) of a parent node (matrix) are identified in Remark 5, and then the mode, signature vector, and modifications of each child's message matrix are explained. Finally, Subsection V-D completes the construction of the super message matrix. To conclude this section, we present Algorithm 1, which summarizes all steps of the code construction.
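The recursion can be sketched as a walk over the tree that accumulates per-segment parameters. This Python sketch is not Algorithm 1; it is a bookkeeping aid built on several assumptions of ours: $1\leq\mu\leq k$; child pairs follow our reading of Remark 5 ($B\subseteq[k+1:d]$, $|B|<\mathrm{mode}(\mathbf{P})$, $x\in[k+1:d]$, $x\leq\max B$); child modes follow $\mathrm{mode}(\mathbf{Q})=\mathrm{mode}(\mathbf{P})-|B|-1$; and a mode-$j$ segment contributes $\binom{d}{j}$ columns, $\binom{d-1}{j-1}$ repair symbols per helper (Proposition 2), and $j\binom{d+1}{j+1}$ data symbols ($j\binom{d}{j}$ $v$-symbols plus $j\binom{d}{j+1}$ non-parity $w$-symbols), minus the nulled $G_2$ symbols. The function name `cascade_parameters` is hypothetical.

```python
from itertools import combinations
from math import comb


def cascade_parameters(k, d, mu):
    """Walk the cascade tree from a mode-mu root and total (segments, alpha, beta, F).

    Sketch under the assumptions stated in the text (1 <= mu <= k).
    """
    lower = range(k + 1, d + 1)
    segments = alpha = beta = F = 0
    stack = [mu]                                   # modes of segments still to visit
    while stack:
        j = stack.pop()
        segments += 1
        alpha += comb(d, j)
        beta += comb(d - 1, j - 1) if j >= 1 else 0
        # G2: lower-row entries with A empty (J inside [k+1:d]) that are not parity
        nulled = sum(1 for J in combinations(lower, j) for x in lower
                     if J and x <= max(J))
        F += j * comb(d + 1, j + 1) - nulled
        for size in range(1, j):                   # child pairs (x, B), |B| = size
            for B in combinations(lower, size):
                for x in lower:
                    if x <= max(B):
                        stack.append(j - size - 1)  # child mode
    return segments, alpha, beta, F


print(cascade_parameters(4, 6, 4))  # (15, 81, 27, 324)
```

For the running example it returns 15 segments (matching $\mathbf{T}_0,\ldots,\mathbf{T}_{14}$) and $(\alpha,\beta,F)=(81,27,324)$. With $\mu=1$ and $(k,d)=(4,6)$ it returns $(\alpha,\beta,F)=(6,1,18)$, the MBR point for these parameters.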
V-A Root of the Cascade Code
In order to construct the message matrix of an $(n,k,d;\mu)$ code, we first generate a message matrix for an $(n,k'=d,d;m=\mu)$ determinant code of mode $m=\mu$, and we choose the all-zero vector (i.e., $\sigma_{\mathbf{T}_0}(x)=0$ for every $x\in[d]$) as the signature vector of this code.
**Example 1.**
Our goal is to generate an $(n,k=4,d=6;\mu=4)$ cascade code with parameters $(\alpha,\beta,F)=(81,27,324)$. To this end, we start with a $(d=6;m_0=4)$ determinant code and denote its message matrix by $\mathbf{T}_0$. We choose the signature vector $\sigma_{\mathbf{T}_0}=(0,0,0,0,0,0)$. The size of this message matrix is $d\times\alpha_{m_0}=d\times\binom{d}{m_0}=6\times\binom{6}{4}=6\times 15$. The message matrix $\mathbf{T}_0$ of this determinant code is given in (14). We will define matrices $\mathbf{T}_1,\mathbf{T}_2,\ldots,\mathbf{T}_{14}$ later to complete the code construction. We distinguish the entries of matrix $\mathbf{T}_i$ by the superscript $\langle i\rangle$.
Note that the horizontal line in the matrix separates the top $k=4$ rows from the bottom $d-k=6-4=2$ rows. We refer to the top $4\times 15$ sub-matrix as the upper part of $\mathbf{T}_0$ and to the bottom $2\times 15$ sub-matrix as its lower part. In the representation of this matrix, some symbols are surrounded by frames with different background colors, which will be explained in the next subsections.
[TABLE]
∎
V-B Grouping of Symbols
Next, we group the symbols in the lower $(d-k)$ rows of a determinant code, since symbols in different groups are treated differently. To elaborate, we need the following definition.
**Definition 8.**
Consider an $(n,d,d;j)$ determinant code of mode $j$ with message matrix $\mathbf{P}$. Recall from (8) that $\mathbf{P}_{x,J}$ denotes the entry in row $x$ and column $J$, where $|J|=\mathrm{mode}(\mathbf{P})=j$. For an entry $\mathbf{P}_{x,J}$ in the lower $(d-k)$ rows of $\mathbf{P}$, we have $x\in[k+1:d]$, and the set $J$ can be partitioned into two disjoint sets $A=J\cap[k]$ and $B=J\cap[k+1:d]$, where $|A|+|B|=|J|=\mathrm{mode}(\mathbf{P})$. The entries in the lower $(d-k)$ rows are classified into three disjoint groups as follows:
• $G_1(\mathbf{P})$: entries $\mathbf{P}_{x,A\cup B}$ with $x\leq\max B$ and $A\neq\emptyset$;
• $G_2(\mathbf{P})$: entries $\mathbf{P}_{x,A\cup B}$ with $x\leq\max B$ and $A=\emptyset$;
• $G_3(\mathbf{P})$: entries $\mathbf{P}_{x,A\cup B}$ with $x>\max B$.
We treat symbols in the above-mentioned three groups as follows:
• Symbols in $G_1(\mathbf{P})$ are raw data symbols that need to be protected by injection into redundant parity symbols located in the top $k$ rows of child message matrices with lower modes.
• Symbols in $G_2(\mathbf{P})$ are set to zero (nulled). This reduces $F_j$, the number of raw data symbols in the message matrix, by $N_j=|G_2(\mathbf{P})|$.
• Symbols in $G_3(\mathbf{P})$ are parity symbols. Indeed, for $\mathbf{P}_{x,A\cup B}\in G_3(\mathbf{P})$, we have $x>\max B$, and since $x>k\geq\max A$, also $x>\max(A\cup B)=\max J$; by Remark 2, this is a parity symbol. Such a symbol can be recovered using the parity equation in (5). More precisely, we have
[TABLE]
which can be evaluated once the symbols in the summation are recovered. Therefore, there is no need to provide backups for the symbols in $G_3(\mathbf{P})$.
In summary, a symbol $\mathbf{P}_{x,A\cup B}$ located in the bottom rows $[k+1:d]$ of $\mathbf{P}$ is injected into another determinant code if and only if it is not a parity symbol and $A\neq\emptyset$.
**Example 2.**
According to Definition 8, the symbols in $\mathbf{T}_0$ can be grouped as follows:
[TABLE]
The symbols in $G_1(\mathbf{T}_0)$ are marked with boxes in (14) to indicate that they need to be injected into other code segments with lower modes. Since $G_2(\mathbf{T}_0)=\emptyset$, no symbol is set to zero. Symbols in $G_3(\mathbf{T}_0)$ can be recovered from the parity equations and are not injected; these are the symbols below the horizontal line in (14) without a surrounding frame.
∎
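The grouping of Definition 8 can be sketched as follows. The sketch assumes the three groups are characterized as $G_3$: $x>\max B$ (parity), $G_2$: $x\leq\max B$ and $A=\emptyset$ (nulled), and $G_1$: $x\leq\max B$ and $A\neq\emptyset$ (injected), which is our reading of the treatment described above, with $\max\emptyset=-\infty$. The function name `group_lower_entries` is hypothetical.

```python
from itertools import combinations
from collections import Counter


def group_lower_entries(k, d, j):
    """Assign each lower-row entry P[x, J] (x in k+1..d, |J| = j) to G1, G2 or G3.

    A = J & [k], B = J & [k+1:d]; the max of an empty set is taken as -infinity.
    """
    groups = {}
    for J in combinations(range(1, d + 1), j):
        A = [y for y in J if y <= k]
        B = [y for y in J if y > k]
        max_b = max(B) if B else float("-inf")
        for x in range(k + 1, d + 1):
            if x > max_b:
                groups[x, J] = "G3"  # parity symbol, recomputed from (5)
            elif not A:
                groups[x, J] = "G2"  # nulled
            else:
                groups[x, J] = "G1"  # injected into child codes
    return groups


print(Counter(group_lower_entries(4, 6, 4).values()))
```

For $\mathbf{T}_0$ ($k=4$, $d=6$, $j=4$) this gives 24 entries in $G_1$, 6 in $G_3$, and none in $G_2$, in line with Example 2.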
The details of the injection process are given in the following subsection.
V-C The Injection Process
In the previous part, we showed that only the symbols in $G_1(\mathbf{P})$ need to be protected by injection into the message matrices of child codes of $\mathbf{P}$. Recall that *injection* refers to adding a signed copy of a missing symbol of the parent matrix to a parity (redundant) symbol of the child matrix. It is important to note that a single symbol in $G_1(\mathbf{P})$ might be injected into several child matrices. One of these injections is called *primary*, and the rest are called *secondary*. In a primary injection, a symbol is injected into one of the parity symbols located in the upper $k$ rows of a child matrix, while secondary injections are performed into parity symbols located in the lower $(d-k)$ rows of a child matrix. The goal of primary injections is to preserve the data recovery property, while secondary injections are performed to preserve the node repair property. In this subsection, we first introduce the child nodes of a parent node and then explain the details of primary and secondary injections into a given child node.
To fully cover the details of the injection process, this subsection consists of four parts: 1) child matrices of a given parent matrix, 2) the mode of a child matrix, 3) the signature vector of a child matrix, and 4) the modified child matrix after injection. These parts are explained below.
Child matrices of a given parent matrix:
For a given parent matrix $\mathbf{P}$, the group $G_1(\mathbf{P})$ is further partitioned into several subgroups according to the pair $(x,B)$. Recall that for a symbol $\mathbf{P}_{x,J}$, we defined $A=J\cap[k]$ and $B=J\cap[k+1:d]$. All the symbols of the form $\mathbf{P}_{x,A\cup B}$ in $G_1(\mathbf{P})$ with the same pair $(x,B)$ are primarily injected into parity symbols in the top $k$ rows of the same child determinant code. We refer to $(x,B)$ as the *injection pair*, and we denote by $\mathbf{Q}^{\mathbf{P}}_{x,B}$ (or simply $\mathbf{Q}$, whenever the parent matrix and the injection pair are clear from the context) the message matrix of the child code of $\mathbf{P}$ into which these symbols are injected. The following remark gives a more precise description of the child nodes of a parent matrix.
**Remark 5.**
A parent code segment with message matrix $\mathbf{P}$ has a child node $\mathbf{Q}^{\mathbf{P}}_{x,B}$ for every pair $(x,B)$ satisfying
(i) $B\subseteq[k+1:d]$;
(ii) $|B|<\mathrm{mode}(\mathbf{P})$;
(iii) $x\in[k+1:d]$; and
(iv) $x\leq\max B$.
The above conditions come from the description of $G_1(\mathbf{P})$ in Definition 8. In particular, condition (ii) is equivalent to $B$ being a proper subset of $J$, and thus $|B|<|A\cup B|=\mathrm{mode}(\mathbf{P})$ guarantees that $A\neq\emptyset$ for the injected symbol $\mathbf{P}_{x,A\cup B}\in G_1(\mathbf{P})$. Condition (iii) ensures that the symbol $\mathbf{P}_{x,A\cup B}$ lies in the lower $(d-k)$ rows of $\mathbf{P}$, and (iv) rules out the parity symbols.
Note that we defined $\max\emptyset=-\infty$, and therefore conditions (iii) and (iv) rule out the choice $B=\emptyset$.
In Section VIII, the number of child matrices for a given parent matrix is evaluated based on Remark 5.
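Enumerating the child nodes is a small combinatorial loop. The Python sketch below reads the conditions of Remark 5 as (i) $B\subseteq[k+1:d]$, (ii) $|B|<\mathrm{mode}(\mathbf{P})$, (iii) $x\in[k+1:d]$, and (iv) $x\leq\max B$; this reading is our interpretation of the discussion of $G_1(\mathbf{P})$ above, and the function name `child_pairs` is hypothetical. The child mode is computed as $\mathrm{mode}(\mathbf{P})-|B|-1$.

```python
from itertools import combinations


def child_pairs(k, d, mode_p):
    """Injection pairs (x, B) of a parent of mode `mode_p` under conditions (i)-(iv).

    (i) B is a subset of [k+1:d]; (ii) |B| < mode_p;
    (iii) x in [k+1:d];            (iv) x <= max B (which forces B to be non-empty).
    """
    lower = range(k + 1, d + 1)
    pairs = []
    for size in range(1, mode_p):              # (ii), with B non-empty
        for B in combinations(lower, size):    # (i)
            for x in lower:                    # (iii)
                if x <= max(B):                # (iv)
                    pairs.append((x, B))
    return pairs


for x, B in child_pairs(4, 6, 4):
    print(x, B, "child mode:", 4 - len(B) - 1)
```

For $\mathbf{T}_0$ this yields five pairs, $(5,\{5\})$, $(5,\{6\})$, $(6,\{6\})$, $(5,\{5,6\})$ and $(6,\{5,6\})$, with child modes 2, 2, 2, 1, 1, consistent with the five child matrices $\mathbf{T}_1,\ldots,\mathbf{T}_5$ of Example 3.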
**Example 3.**
In Table III, all the injection pairs $(x,B)$ satisfying the conditions of Remark 5 for $\mathbf{T}_0$ of Example 1 are enumerated. Note that each injection pair in the table corresponds to one child matrix.
For ease of notation, we use $\mathbf{T}_1=\mathbf{Q}^{\mathbf{T}_0}_{5,\{5\}}$ to refer to the child matrix of $\mathbf{T}_0$ associated with the injection pair $(5,\{5\})$. Similarly, the child matrices $\mathbf{T}_2,\mathbf{T}_3,\mathbf{T}_4,\mathbf{T}_5$ are defined in Table III. For a given $(x,B)$, the corresponding child matrix primarily hosts symbols of the form $[\mathbf{T}_0]_{x,A\cup B}\in G_1(\mathbf{T}_0)$. Recall from Definition 8 that $A\subseteq[4]$ and $|A|+|B|=\mathrm{mode}(\mathbf{T}_0)=4$.
Note that all symbols of $G_1(\mathbf{T}_0)$ are marked with a box in (14). Moreover, the symbols that will be injected into $\mathbf{T}_2$ and $\mathbf{T}_5$ are further distinguished by different background colors (green and orange for $\mathbf{T}_2$ and $\mathbf{T}_5$, respectively).
∎
Mode of a child matrix:
Recall that the child matrix $\mathbf{Q}^{\mathbf{P}}_{x,B}$ is introduced for the primary injection of symbols $\mathbf{P}_{x,A\cup B}\in G_1(\mathbf{P})$. In the primary injection, a (signed copy of a) missing symbol $\mathbf{P}_{x,A\cup B}$ is injected into the parity symbol $\pm w_{\max A,A}$ located in row $i=\max A$ and column $I=A\setminus\{\max A\}$ of the message matrix $\mathbf{Q}^{\mathbf{P}}_{x,B}$. Since $A\subseteq[k]$, we have $1\leq i\leq k$, which means the symbol $\pm w_{\max A,A}$ is located in one of the top $k$ rows of $\mathbf{Q}^{\mathbf{P}}_{x,B}$. Moreover, since the columns of $\mathbf{Q}$ are labeled by sets of size $\mathrm{mode}(\mathbf{Q})$ (see equation (4)) and the column label $A\setminus\{\max A\}$ has size $|A|-1$, the relation between the modes of the child matrix and the parent matrix for a fixed injection pair $(x,B)$ is given by
$$\mathrm{mode}(\mathbf{Q}^{\mathbf{P}}_{x,B})=|A|-1=\mathrm{mode}(\mathbf{P})-|B|-1, \tag{15}$$
where the second equality holds because $(A,B)$ is a partition of $J$ into disjoint sets (see Definition 8).
**Example 4.**
Following up on Example 3, the mode of each child matrix defined in Table III can be evaluated according to (15):
[TABLE]
∎
Signature vector of a child matrix:
In order to fully characterize a child matrix, it only remains to define its signature vector. For a parent matrix $\mathbf{P}$ with signature vector $\sigma_{\mathbf{P}}(\cdot)$, the signature vector of $\mathbf{Q}^{\mathbf{P}}_{x,B}$, the child matrix associated with the injection pair $(x,B)$, is given by
[TABLE]
This signature vector is chosen to enable the exact repair property, as will become clearer when we discuss the repair process in Section VI.
**Example 5.**
Continuing Example 3, and using the fact that $\sigma_{\mathbf{T}_0}(i)=0$ for $i\in\{1,\ldots,6\}$, the signature vectors of the child matrices $\mathbf{T}_2$ and $\mathbf{T}_5$ of Table III can be obtained from
[TABLE]
and
[TABLE]
∎
Modified child matrix after the injection:
Let $\mathbf{Q}^{\mathbf{P}}_{x,B}$ be the child matrix of $\mathbf{P}$ associated with the injection pair $(x,B)$, into which some symbols of $\mathbf{P}$ will be injected. We use $\mathbf{Q}^{\mathbf{P}}_{x,B}$ and $\widetilde{\mathbf{Q}}^{\mathbf{P}}_{x,B}$ to refer to this matrix before and after injection (modification), respectively.
This modification is modeled by adding an *injection matrix* $\Delta^{\mathbf{P}}_{x,B}$ to the original matrix $\mathbf{Q}^{\mathbf{P}}_{x,B}$, i.e.,
$$\widetilde{\mathbf{Q}}^{\mathbf{P}}_{x,B}=\mathbf{Q}^{\mathbf{P}}_{x,B}+\Delta^{\mathbf{P}}_{x,B}. \tag{17}$$
Here, $\Delta^{\mathbf{P}}_{x,B}$ is a matrix with the same size and row/column labeling as $\mathbf{Q}$, consisting of signed versions of the entries of $\mathbf{P}$ that are to be injected into $\mathbf{Q}$.
The entry at position $(i,I)$ is given in (18). (Footnote 8: For a cascade code constructed over a Galois field $GF(2^s)$ of characteristic 2 (i.e., $-1=+1$), the injection equation in (18) reduces to $[\Delta^{\mathbf{P}}_{x,B}]_{i,I}=\mathbf{P}_{x,I\cup\{i\}\cup B}\,\mathbb{1}\{i>\max I,\ i\notin B,\ I\cap B=\emptyset\}$.)
[TABLE]
Here, the coordinates of injection satisfy i∈[d] and I⊆[d] with ∣I∣=mode(\textpdfrenderTextRenderingMode=FillStroke,LineWidth=.3pt,FillColor=white,Q)=mode(\textpdfrenderTextRenderingMode=FillStroke,LineWidth=.3pt,FillColor=white,P)−∣B∣−1. Note that the entries of Δx,B\textpdfrenderTextRenderingMode=FillStroke,LineWidth=.3pt,FillColor=white,P are non-zero only for certain positions, which correspond to the parity entries of \textpdfrenderTextRenderingMode=FillStroke,LineWidth=.3pt,FillColor=white,Q into which an injection is performed.
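The index rule behind (18) can be sketched in a few lines of Python. This is a minimal sketch of the characteristic-2 form in footnote 8 only, so all signs are dropped, and the helper name `injection_positions` is hypothetical. It enumerates the non-zero positions $(i,I)$ of $\Delta^{\mathbf{P}}_{x,B}$ and, for each one, the column label $I\cup\{i\}\cup B$ of the parent symbol $\mathbf{P}_{x,\cdot}$ placed there. Taking $\max\emptyset=0$ for mode-zero children is our assumption.

```python
from itertools import combinations

def injection_positions(d, parent_mode, B):
    """Non-zero pattern of Delta^P_{x,B} (signs ignored, characteristic-2 form).

    Returns a dict mapping each child position (i, I) to the column label
    I | {i} | B of the parent symbol P_{x, .} injected there. Rows and
    labels use 1-based indices in [d]; |I| = parent_mode - |B| - 1.
    """
    B = frozenset(B)
    child_mode = parent_mode - len(B) - 1
    positions = {}
    for I in combinations(range(1, d + 1), child_mode):
        if B & set(I):
            continue  # requires I and B to be disjoint
        max_I = max(I, default=0)  # assumption: max of the empty set is 0
        for i in range(max_I + 1, d + 1):  # requires i > max I
            if i in B:
                continue  # requires i not in B
            positions[(i, I)] = tuple(sorted(set(I) | {i} | B))
    return positions
```

For the child $\mathbf{T}_2$ of Example 6 (parent mode 4, $B=\{6\}$), `injection_positions(6, 4, {6})` maps position `(4, (1, 2))` to the label `(1, 2, 4, 6)`, consistent with $[\Delta^{\mathbf{T}_0}_{6,\{6\}}]_{4,\{1,2\}}=v^{\langle 0\rangle}_{6,\{1,2,4,6\}}$.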
The following remark further elaborates on the above injection matrix.
Remark 6.
The following facts hold for the injection matrix given in (18):
1. The conditions $i>\max I$, $i\notin B$, and $I\cap B=\emptyset$ guarantee that the sets $\{i\}$, $I$, and $B$ are mutually disjoint, and hence the size of $I\cup\{i\}\cup B$ (the column label of the injected symbol) equals $\mathrm{mode}(\mathbf{P})$. In other words, we can verify that $\mathrm{mode}(\mathbf{P})=|I\cup\{i\}\cup B|=|I|+|\{i\}|+|B|=\mathrm{mode}(\mathbf{Q})+1+|B|$.
2. The condition $i>\max I$ additionally guarantees that the entry at position $(i,I)$ of the matrix $\mathbf{Q}$ is a parity symbol of the child matrix (see Remark 2). Therefore, an injection only occurs into a parity symbol, and the non-zero entries of $\Delta^{\mathbf{P}}_{x,B}$ correspond to parity entries of $\mathbf{Q}^{\mathbf{P}}_{x,B}$.
3. The symbol $\mathbf{P}_{x,A\cup B}\in\mathcal{G}_1(\mathbf{P})$ is primarily injected into position $(\max A, A\setminus\{\max A\})$ of the child matrix $\mathbf{Q}^{\mathbf{P}}_{x,B}$. This can be seen from (18) as
[TABLE]
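Items 1 and 3 of Remark 6 can be checked by brute force over all positions. The sketch below uses a hypothetical helper `check_remark6` and ignores signs, as in the characteristic-2 form of (18). For every injected symbol it asserts that the label has size $\mathrm{mode}(\mathbf{P})$, that each parent symbol lands in exactly one position of the child, and that this position is $(\max A, A\setminus\{\max A\})$, where $A$ is the label with $B$ removed. Item 2 relies on Remark 2 and is not checked here.

```python
from itertools import combinations

def check_remark6(d, parent_mode, B):
    """Brute-force check of Remark 6 (items 1 and 3) for the injection pattern of (18).

    Returns a dict: parent column label (frozenset) -> host position (i, I) in the child.
    """
    B = frozenset(B)
    child_mode = parent_mode - len(B) - 1
    host_of = {}
    for I in combinations(range(1, d + 1), child_mode):
        for i in range(1, d + 1):
            if i > max(I, default=0) and i not in B and not (B & set(I)):
                label = frozenset(I) | {i} | B
                # Item 1: {i}, I, B are mutually disjoint, so |label| = mode(P).
                assert len(label) == parent_mode
                # Each parent symbol is hosted at a single position of this child.
                assert label not in host_of
                host_of[label] = (i, I)
    for label, (i, I) in host_of.items():
        A = label - B
        # Item 3: the host position is (max A, A minus {max A}).
        assert (i, I) == (max(A), tuple(sorted(A - {max(A)})))
    return host_of
```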
Example 6.
Let us continue with our running construction from Example 1. For an $(n,k=4,d=6;\mu=4)$ code, no injection takes place into $\mathbf{T}_0$, since it is the root node; hence $\widetilde{\mathbf{T}}_0=\mathbf{T}_0$, i.e., $\widetilde{\mathbf{T}}_0$ is a purely signed determinant message matrix of mode $m_0=4$.
Consider the child matrix $\mathbf{T}_2=\mathbf{T}^{\mathbf{T}_0}_{6,\{6\}}$, whose mode $m_2=2$ and signature vector $\sigma_{\mathbf{T}_2}$ were evaluated in Examples 4 and 5, respectively. This message matrix has $\binom{d}{m_2}=\binom{6}{2}=15$ columns, labeled by the subsets of $\{1,2,3,4,5,6\}$ of size $m_2=2$. The entries of this message matrix before injection are derived from (8) and given in (20).
Again, the horizontal line separates the top $k=4$ rows of $\mathbf{T}_2$ from its lower $d-k=2$ rows. The symbols in the lower part of $\mathbf{T}_2$ are further partitioned into $\mathcal{G}_1(\mathbf{T}_2)$, $\mathcal{G}_2(\mathbf{T}_2)$, and $\mathcal{G}_3(\mathbf{T}_2)$. The two symbols $v^{\langle 2\rangle}_{5,\{5,6\}}$ and $v^{\langle 2\rangle}_{6,\{5,6\}}$ belong to $\mathcal{G}_2(\mathbf{T}_2)$ and hence are set to zero (see Definition 8). Symbols in the lower part without a surrounding solid frame belong to $\mathcal{G}_3(\mathbf{T}_2)$ and therefore do not need any protection. Finally, the remaining symbols, which form $\mathcal{G}_1(\mathbf{T}_2)$, are marked by a box; they need to be injected into child matrices of $\mathbf{T}_2$, as discussed further in Example 7.
Moreover, some of the symbols in $\mathbf{T}_2$ (in both its upper and lower parts) are highlighted by a background color without a solid frame. These are the symbols of $\mathbf{T}_2$ into which injections from $\mathbf{T}_0$ occur. An injection into such a highlighted entry is primary if the host symbol lies in the upper part of $\mathbf{T}_2$ (above the horizontal line), and secondary if it lies in the lower part (below the horizontal line).
We denote this message matrix after injection by $\widetilde{\mathbf{T}}_2$ and the injection matrix by $\Delta^{\mathbf{T}_0}_{6,\{6\}}$. Therefore, $\widetilde{\mathbf{T}}_2=\mathbf{T}_2+\Delta^{\mathbf{T}_0}_{6,\{6\}}$. The matrix $\Delta^{\mathbf{T}_0}_{6,\{6\}}$ can be found from (18); its complete form is given in (21).
[TABLE]
The child matrix $\mathbf{T}_2$ hosts the symbols $\{v^{\langle 0\rangle}_{6,\{1,2,3,6\}}, v^{\langle 0\rangle}_{6,\{1,2,4,6\}}, v^{\langle 0\rangle}_{6,\{1,3,4,6\}}, v^{\langle 0\rangle}_{6,\{2,3,4,6\}}\}$ as primary injections, since they are added to parities located in the top $k=4$ rows of $\mathbf{T}_2$. For instance, the symbol injected at position $(4,\{1,2\})$ of $\mathbf{T}_2$ is $[\Delta^{\mathbf{T}_0}_{6,\{6\}}]_{4,\{1,2\}}=v^{\langle 0\rangle}_{6,\{1,2,4,6\}}$. Moreover, the symbols
[TABLE]
are secondarily injected into $\mathbf{T}_2$, as they are added to entries in the lower $d-k=2$ rows; in fact, all of them land in row 5, since the condition $i\notin B=\{6\}$ in (18) excludes row 6. Every such symbol must also be primarily injected, regardless of its secondary injection. We next show that these symbols are indeed primarily injected into $\mathbf{T}_5$.
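The primary/secondary split for $\mathbf{T}_2$ can be reproduced mechanically. The sketch below, with a hypothetical helper `classify_injections` and signs ignored, enumerates the injection positions of $\Delta^{\mathbf{T}_0}_{6,\{6\}}$ and sorts them by host row: rows $i\le k$ are primary and rows $i>k$ are secondary, as described above. For $k=4$, $d=6$ it should return the four primary labels listed above and six secondary ones, all in row 5.

```python
from itertools import combinations

def classify_injections(d, k, parent_mode, B):
    """Split the injection positions of Delta^P_{x,B} into primary (host row <= k)
    and secondary (host row > k). Each item is ((i, I), parent column label)."""
    B = set(B)
    child_mode = parent_mode - len(B) - 1
    primary, secondary = [], []
    for I in combinations(range(1, d + 1), child_mode):
        if B & set(I):
            continue  # I and B must be disjoint
        for i in range(max(I, default=0) + 1, d + 1):  # i > max I
            if i in B:
                continue  # i must not lie in B
            item = ((i, I), tuple(sorted(set(I) | {i} | B)))
            (primary if i <= k else secondary).append(item)
    return primary, secondary

# T2 is the child of T0 (mode 4) for the injection pair (6, {6}), with k = 4, d = 6.
primary, secondary = classify_injections(d=6, k=4, parent_mode=4, B={6})
```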
Recall from Table III that $\mathbf{T}_5$ is the child matrix of $\mathbf{T}_0$ associated with the injection pair $(6,\{5,6\})$. Its mode and signature vector were evaluated in Examples 4 and 5 as $m_5=1$ and $\sigma_{\mathbf{T}_5}=(2,2,2,2,2,3)$. Therefore, the corresponding message matrix has $\binom{d}{m_5}=\binom{6}{1}=6$ columns. Note that the entries in the sixth row of $\mathbf{T}_5$, given in (22), carry a negative sign, since $(-1)^{\sigma_{\mathbf{T}_5}(6)}=(-1)^3=-1$.
[TABLE]
Similarly, the parity symbols of $\mathbf{T}_5$ hosting symbols from $\mathbf{T}_0$ are marked by a background color without a solid surrounding frame. We denote by $\widetilde{\mathbf{T}}_5$ the matrix $\mathbf{T}_5$ after injection, and by $\Delta^{\mathbf{T}_0}_{6,\{5,6\}}$ the injection matrix. Thus, $\widetilde{\mathbf{T}}_5=\mathbf{T}_5+\Delta^{\mathbf{T}_0}_{6,\{5,6\}}$. The matrix $\Delta^{\mathbf{T}_0}_{6,\{5,6\}}$ is also evaluated from (18) and given in (23).
[TABLE]
Note that every entry of $\mathbf{T}_0$ of the form $[\mathbf{T}_0]_{6,A\cup\{5,6\}}$ is injected into the symbol $w_{\max A,A}$ of $\mathbf{T}_5$, where $A$ can be any of $\{1,2\}$, $\{1,3\}$, $\{1,4\}$, $\{2,3\}$, $\{2,4\}$, or $\{3,4\}$. Also, unlike $\mathbf{T}_2$, which hosts both primary and secondary injections, all the injections into $\mathbf{T}_5$ are primary, since there is no injection into its lower $d-k=2$ rows.
∎
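For $\mathbf{T}_5$ the bookkeeping is short enough to spell out directly. The sketch below (plain Python; variable names are hypothetical) maps each 2-subset $A\subseteq\{1,2,3,4\}$ to the host position $(\max A, A\setminus\{\max A\})$ from Remark 6, item 3, confirms that every host row is at most $k=4$ (so all injections are primary), and computes the row signs $(-1)^{\sigma_{\mathbf{T}_5}(i)}$ from $\sigma_{\mathbf{T}_5}=(2,2,2,2,2,3)$.

```python
from itertools import combinations

k = 4  # number of top rows; B = {5, 6} for T5, so A must avoid rows 5 and 6

# Parent symbol [T0]_{6, A | {5,6}} -> host position (max A, A minus {max A}) in T5.
hosts = {A: (max(A), tuple(a for a in A if a != max(A)))
         for A in combinations(range(1, k + 1), 2)}

# Every host row is <= k, so all injections into T5 are primary.
all_primary = all(row <= k for row, _ in hosts.values())

# Row signs of T5 from its signature vector: (-1) ** sigma(i) for rows 1..6.
sigma_T5 = (2, 2, 2, 2, 2, 3)
row_sign = {i: (-1) ** s for i, s in enumerate(sigma_T5, start=1)}
```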
V-D Cascading Structure of the Super Message Matrix
Consider a pair of parent and child matrices $(\mathbf{P},\mathbf{Q})$, where some of the symbols of $\mathbf{P}$ are injected into parity symbols of $\mathbf{Q}$. The symbols in the lower $d-k$ rows of $\mathbf{Q}$ must also be protected. Recall that the symbols in $\mathbf{Q}$ can be partitioned into $\mathcal{G}_1(\mathbf{Q})$, $\mathcal{G}_2(\mathbf{Q})$, and $\mathcal{G}_3(\mathbf{Q})$, where the symbols in $\mathcal{G}_1(\mathbf{Q})$ need to be injected into another signed determinant code. To this end, we introduce child matrices for $\mathbf{Q}$, into which these symbols will be injected. Note that a child matrix of $\mathbf{Q}$ is a grandchild matrix of $\mathbf{P}$.
Example 7.
Recall the child matrices $\mathbf{T}_2$ and $\mathbf{T}_5$ discussed in Example 6. According to Definition 8, the symbols in the lower part of $\mathbf{T}_2$ can be partitioned into
[TABLE]
The symbols in $\mathcal{G}_2(\mathbf{T}_2)$ are set to zero, and those in $\mathcal{G}_3(\mathbf{T}_2)$ are parity symbols that do not need protection. However, the symbols in $\mathcal{G}_1(\mathbf{T}_2)$ need to be injected into the message matrices of other signed determinant codes. Therefore, $\mathbf{T}_2$ has its own child nodes. There are three child matrices, namely $\mathbf{T}_9$, $\mathbf{T}_{10}$, and $\mathbf{T}_{11}$, associated with the injection pairs $(5,\{5\})$, $(6,\{6\})$, and $(5,\{6\})$, respectively. Their modes are $\mathrm{mode}(\mathbf{T}_9)=\mathrm{mode}(\mathbf{T}_{10})=\mathrm{mode}(\mathbf{T}_{11})=2-1-1=0$. In particular, the message matrix and injection matrix of $\mathbf{T}_{11}$, corresponding to the injection pair $(5,\{6\})$, are given in (25). Recall that the message matrix of a determinant code of mode zero is an all-zero column vector, as explained in the definition of the message matrix in (8).
[TABLE]
On the other hand, the symbols in the lower part of $\mathbf{T}_5$ can be partitioned into
[TABLE]
The symbols in $\mathcal{G}_2(\mathbf{T}_5)$ are marked in (22) and set to zero. Recall that only the symbols in the first group need to be injected; since $\mathcal{G}_1(\mathbf{T}_5)=\emptyset$, no further child matrix is needed for $\mathbf{T}_5$.
∎
Remark 7.
As mentioned before, a parent matrix $\mathbf{P}$ may need multiple child matrices to protect all symbols in $\mathcal{G}_1(\mathbf{P})$. However, it is worth emphasizing that each child matrix $\mathbf{Q}$ has only one parent matrix whose symbols are injected into it. This leads to an injection hierarchy that can be represented as a tree (see Fig. 4). The final super-message matrix is obtained by concatenating all modified (post-injection) message matrices. ⋄
Note that the process of injection will eventually terminate since the mode of a child code is strictly less than that of its parent code, and modes are limited to non-negative integers.
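The termination argument only uses the mode rule (15), $\mathrm{mode}(\text{child})=\mathrm{mode}(\text{parent})-|B|-1$. The sketch below applies it along the part of the injection tree that Examples 6 and 7 spell out ($\mathbf{T}_0\to\mathbf{T}_2,\mathbf{T}_5$ and $\mathbf{T}_2\to\mathbf{T}_9,\mathbf{T}_{10},\mathbf{T}_{11}$); the remaining branches are omitted rather than guessed, and the dictionary layout is only an illustrative choice.

```python
def child_mode(parent_mode, B):
    """Mode of the child matrix for injection pair (x, B), following (15)."""
    return parent_mode - len(B) - 1

# Edges stated in Examples 6-7: child -> (parent, injection pair (x, B)).
# Parents are listed before their children, so one pass suffices.
edges = {
    "T2": ("T0", (6, {6})),
    "T5": ("T0", (6, {5, 6})),
    "T9": ("T2", (5, {5})),
    "T10": ("T2", (6, {6})),
    "T11": ("T2", (5, {6})),
}

modes = {"T0": 4}  # root segment of mode mu = 4
for child, (parent, (_x, B)) in edges.items():
    modes[child] = child_mode(modes[parent], B)
    # The mode drops by |B| + 1 >= 1 on every edge, so the recursion must stop.
    assert 0 <= modes[child] < modes[parent]
```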
Remark 8.
Consider a $w$-symbol in a child matrix $\mathbf{Q}$ into which a symbol from the parent matrix $\mathbf{P}$ is injected. By Remark 6, this symbol either lies in the upper part of $\mathbf{Q}$ or, if it lies in the lower part of $\mathbf{Q}$, belongs to $\mathcal{G}_3(\mathbf{Q})$. Therefore, the host symbol in $\mathbf{Q}$ does not need to be injected into a child matrix of $\mathbf{Q}$ (i.e., a grandchild matrix of $\mathbf{P}$). The implication is that a $w$-symbol in a child matrix hosting a symbol from the parent matrix is never injected into a grandchild matrix, and hence injected symbols are never carried over multiple hops of injection. Consequently, the specification of injections from a parent matrix $\mathbf{P}$ into a child matrix $\mathbf{Q}$ does not depend on whether $\mathbf{P}$ has already been modified by an earlier injection, that is,
[TABLE]
⋄
Example 8.
Continuing with our running example, we now complete the construction of the super-message matrix for a cascade code with parameters $(n,k=4,d=6;\mu=4)$. Recall that we started (in Example 1) with a root code segment, namely the determinant code $\mathbf{T}_0$ of mode $m_0=\mu=4$. This root message matrix has five child matrices, introduced in Example 3. The child matrices $\mathbf{T}_1$, $\mathbf{T}_2$, and $\mathbf{T}_3$ have their own child matrices; for instance, the child matrices of $\mathbf{T}_2$ are discussed in Example 7.
There are a total of 15 code segments, namely $\mathbf{T}_0,\mathbf{T}_1,\dots,\mathbf{T}_{14}$, required to perform all injections. Each code segment $\mathbf{T}_i$ is modified by injection to obtain $\widetilde{\mathbf{T}}_i$, for $i=0,1,\dots,14$. Fig. 4 depicts the tree structure of the injection hierarchy, with one node for each code segment. The labeled arrows in the figure indicate the injection pair from the parent to the child matrix.
The mode of each child matrix is evaluated using (15), from the mode of the parent matrix and the size of $B$ in the injection pair $(x,B)$. Following this rule, the modes of the code segments are
$$m_0=4,\qquad m_1=m_2=m_3=2,\qquad m_4=m_5=1,\qquad m_6=m_7=\cdots=m_{14}=0,$$
where $m_i=\mathrm{mode}(\widetilde{\mathbf{T}}_i)$.
The signature vector of each child node can also be obtained from the signature vector of its parent matrix according to (16).
Finally, the super-message matrix $M$ of the cascade code is formed by concatenating the message matrices $\widetilde{\mathbf{T}}_0,\widetilde{\mathbf{T}}_1,\dots,\widetilde{\mathbf{T}}_{14}$, as shown in Fig. 5.
Note that this matrix has $d=6$ rows and $\alpha=\sum_{i=0}^{14}\binom{6}{m_i}=1\times\binom{6}{4}+3\times\binom{6}{2}+2\times\binom{6}{1}+9\times\binom{6}{0}=15+45+12+9=81$ columns. ∎
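The column count $\alpha$ can be double-checked numerically. This sketch simply takes the segment modes of Example 8 and sums $\binom{d}{m_i}$ using Python's standard `math.comb`.

```python
from math import comb

d = 6
# Modes of T0..T14 in Example 8: one segment of mode 4, three of mode 2,
# two of mode 1, and nine of mode 0.
segment_modes = [4] + [2] * 3 + [1] * 2 + [0] * 9

# A segment of mode m contributes C(d, m) columns to the super-message matrix.
alpha = sum(comb(d, m) for m in segment_modes)
print(len(segment_modes), alpha)  # 15 81
```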
The code construction presented in this section may look sophisticated at first glance. However, it has a simple recursive nature, which facilitates its implementation. All stages of the super-message matrix construction are summarized in Algorithm 1.
VI The Exact Repair Property
In the following, we discuss the exact repair property by introducing the repair data sent in order to repair a failed node f∈[n] from a set of helper nodes H⊆[n]∖{f} with ∣H∣=d.
The repair process is performed in a recursive manner from top-to-bottom, i.e., from segments of the codeword of node f with the highest mode to those with the lowest mode.
The repair data sent from a helper node h∈H to the failed node f is simply formed by the concatenation of the repair data for each code segment. The repair data for each code segment can be obtained by treating each segment as an ordinary signed determinant code. More precisely, helper node h sends
[TABLE]
where the union is taken over all message matrices $\widetilde{\mathbf{Q}}$ appearing in the super-message matrix $M$, the product $\Psi_{h,:}\cdot\widetilde{\mathbf{Q}}$ is the codeword segment of node $h$ corresponding to code segment $\widetilde{\mathbf{Q}}$, and $\Xi^{f,(\mathrm{mode}(\widetilde{\mathbf{Q}}))}$ is the repair-encoder matrix for node $f$ for a code of mode $\mathrm{mode}(\widetilde{\mathbf{Q}})$, as defined in (11). In other words, for each codeword segment, the helper node multiplies the segment by the repair-encoder matrix of the proper mode and sends the collection of all such products to the failed node. Recall from Proposition 2 that the rank of $\Xi^{f,(m)}$ for a code segment of mode $m=\mathrm{mode}(\widetilde{\mathbf{Q}})$ is only $\beta(m)=\binom{d-1}{m-1}$. The total repair bandwidth of the code is obtained by summing the repair bandwidths of all the code segments, and is evaluated in (85) in Section VIII.
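As a sketch of the summation just described (not a substitute for the closed form in (85)), the per-segment bandwidth $\beta(m)=\binom{d-1}{m-1}$ from Proposition 2 can be added up over the segment modes of Example 8. Treating $\binom{d-1}{-1}$ as $0$ for mode-zero segments is our assumption; Python's `math.comb` raises `ValueError` for negative arguments, so that case is handled explicitly.

```python
from math import comb

def beta(d, m):
    """Rank of the repair-encoder matrix for a mode-m segment: C(d-1, m-1).
    Assumption: a mode-0 segment contributes 0 (C(d-1, -1) taken as 0)."""
    return comb(d - 1, m - 1) if m >= 1 else 0

d = 6
segment_modes = [4] + [2] * 3 + [1] * 2 + [0] * 9  # modes of T0..T14 (Example 8)
per_segment = [beta(d, m) for m in segment_modes]
total_beta = sum(per_segment)
print(total_beta)  # 27
```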
Upon receiving all the repair data $\bigcup_{\widetilde{\mathbf{Q}}}\{\Psi_{h,:}\cdot\widetilde{\mathbf{Q}}\cdot\Xi^{f,(\mathrm{mode}(\widetilde{\mathbf{Q}}))}:h\in H\}$, the failed node stacks the pieces corresponding to each code segment $\widetilde{\mathbf{Q}}$ to obtain
[TABLE]
from which the repair space of code segment $\widetilde{\mathbf{Q}}$ can be retrieved as
[TABLE]
Recall that Condition (E2) guarantees that the matrix $\Psi[H,:]$ is invertible. Also note that the above repair space is defined for the code segment after injection, whereas the one defined in (12) is for the raw determinant code (before injection). The relationship between the two repair spaces is given by
[TABLE]
where $\Delta$ is the injection matrix from the parent matrix into $\mathbf{Q}$. Thus, the repair cannot be performed directly as in (13). However, given the repair spaces of all the code segments, the content of node $f$ can be reconstructed according to the following proposition. Again, it is worth emphasizing that the codeword segment corresponding to $\widetilde{\mathbf{Q}}$ is represented by a row vector whose entries are labeled in the same way as the columns of $\widetilde{\mathbf{Q}}$, i.e., by subsets of $[d]$ of size $m=\mathrm{mode}(\widetilde{\mathbf{Q}})$.
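The retrieval of the repair space, and its relation to the raw segment, is plain linear algebra, which the sketch below checks over a small prime field. Everything here is hypothetical: GF(13) stands in for the code's field, $\Psi[H,:]$ is a Vandermonde matrix (invertible, as Condition (E2) requires), and $\Xi$ and $\Delta$ are random stand-ins for the structured matrices in (11) and (18). Each helper also sends its full product rather than only $\beta(m)$ independent combinations. The check confirms that inverting $\Psi[H,:]$ on the stacked data returns $\widetilde{\mathbf{Q}}\cdot\Xi$, and that this equals the raw repair space plus $\Delta\cdot\Xi$.

```python
import random

P = 13  # small prime; GF(13) is an illustrative stand-in for the code's field

def matmul(A, B):
    """Matrix product over GF(P); matrices are lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) % P for col in zip(*B)] for row in A]

def matadd(A, B):
    return [[(a + b) % P for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def inverse(M):
    """Gauss-Jordan inverse over GF(P); assumes M is invertible."""
    n = len(M)
    aug = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        piv = next(r for r in range(c, n) if aug[r][c] % P)
        aug[c], aug[piv] = aug[piv], aug[c]
        inv = pow(aug[c][c], P - 2, P)  # Fermat's little theorem
        aug[c] = [v * inv % P for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c]:
                f = aug[r][c]
                aug[r] = [(v - f * w) % P for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def rand_matrix(rows, cols, density=1.0):
    return [[random.randrange(P) if random.random() < density else 0
             for _ in range(cols)] for _ in range(rows)]

random.seed(1)
d, n_cols, n_rep = 6, 15, 6  # e.g. a mode-2 segment: C(6,2) columns, C(6,1) repair columns
Q = rand_matrix(d, n_cols)                    # raw segment (stand-in)
Delta = rand_matrix(d, n_cols, density=0.2)   # sparse stand-in for the injection matrix
Q_mod = matadd(Q, Delta)                      # segment after injection
Xi = rand_matrix(n_cols, n_rep)               # stand-in for the repair-encoder matrix
Psi_H = [[pow(a, e, P) for e in range(d)] for a in range(1, d + 1)]  # Vandermonde

# Helper h sends Psi_h . Q_mod . Xi; the failed node stacks the d received rows.
received = [matmul([psi_h], matmul(Q_mod, Xi))[0] for psi_h in Psi_H]
R_f = matmul(inverse(Psi_H), received)
```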
Proposition 4.
In an $(n,k,d)$ cascade code introduced in Sections IV and V, for every failed node $f\in[n]$ and every set of helpers $H\subseteq[n]\setminus\{f\}$ with $|H|=d$, the content of node $f$ can be exactly regenerated from the received repair spaces.
More precisely, the symbol at position $I$ in the codeword segment corresponding to a code segment $\widetilde{\mathbf{Q}}$ is repaired through
[TABLE]
where $\widetilde{\mathbf{P}}$ is the parent matrix of $\widetilde{\mathbf{Q}}$, and $(x,B)$ is the corresponding injection pair.
Footnote 9: For a code over the Galois field $\mathrm{GF}(2^s)$, which has characteristic 2, the repair equation reduces to $[\Psi_{f,:}\cdot\widetilde{\mathbf{Q}}^{\mathbf{P}}_{x,B}]_I=\sum_{i\in I}[\mathbf{R}^f(\widetilde{\mathbf{Q}})]_{i,I\setminus\{i\}}+[\mathbf{R}^f(\widetilde{\mathbf{P}})]_{x,I\cup B}\,\mathbb{1}\{I\cap B=\emptyset\}$.
Note that for positions $I$ that overlap with $B$ (the second injection parameter), the repair identity above reduces to (13), and the repair is performed as in an ordinary signed determinant code. This is because, by (18), no symbol is injected into any position in column $I$ of the message matrix $\widetilde{\mathbf{Q}}$, and hence the symbol $[\Psi_{f,:}\cdot\widetilde{\mathbf{Q}}^{\mathbf{P}}_{x,B}]_I$ is not affected by the injection process. For a position $I$ that is disjoint from $B$, however, the coded symbol at position $I$ of the codeword stored at node $f$ suffers interference from the injected symbols. This interference can be canceled using the repair space of the parent matrix, as indicated in (27).
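The index bookkeeping of (27) can be sketched for the characteristic-2 form in footnote 9, where all signs disappear. The hypothetical helper below lists which entries of the two repair spaces are combined to regenerate position $I$: the entries $[\mathbf{R}^f(\widetilde{\mathbf{Q}})]_{i,I\setminus\{i\}}$ for $i\in I$, plus the parent correction $[\mathbf{R}^f(\widetilde{\mathbf{P}})]_{x,I\cup B}$ only when $I\cap B=\emptyset$. With $I=\{2,3\}$ and $(x,B)=(6,\{6\})$ it returns the correction entry $(6,\{2,3,6\})$ that Example 9 uses.

```python
def repair_terms(I, x, B):
    """Entries combined in the characteristic-2 form of (27) for position I.

    Returns (own, correction): own lists ("Q", i, I minus {i}) for i in I
    (entries of the child's repair space); correction is ("P", x, I | B)
    (an entry of the parent's repair space) if I and B are disjoint, else None.
    """
    I, B = set(I), set(B)
    own = [("Q", i, tuple(sorted(I - {i}))) for i in sorted(I)]
    correction = ("P", x, tuple(sorted(I | B))) if not (I & B) else None
    return own, correction

# Example 9: position I = {2, 3} of T2, whose injection pair is (6, {6}).
own, correction = repair_terms({2, 3}, 6, {6})
print(own, correction)  # [('Q', 2, (3,)), ('Q', 3, (2,))] ('P', 6, (2, 3, 6))
```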
In the following, first the repair property is explained for the (n,k=4,d=6) code from our running example, and then the formal proof of Proposition 4 is provided.
Example 9.
In the $(n,k=4,d=6)$ example of the previous section, assume that node $f$ has failed and its codeword needs to be repaired using the repair data received from a set of helper nodes $H$ with $|H|=d=6$. The cascade code, as indicated in Fig. 4, has 15 segments, which are repaired sequentially, from $\widetilde{\mathbf{T}}_0$ to $\widetilde{\mathbf{T}}_{14}$. Each helper node $h\in H$ computes and sends $\bigcup_{i=0}^{14}\{\Psi_{h,:}\cdot\widetilde{\mathbf{T}}_i\cdot\Xi^{f,(m_i)}\}$ to node $f$, where $m_i$ is the mode of code segment $\widetilde{\mathbf{T}}_i$, e.g., $m_0=4$.
The process starts from the first codeword segment (corresponding to the code segment with mode μ=m0=4, located at the root of the hierarchical tree) of the failed node f.
Note that no symbol is injected into $\mathbf{T}_0$; hence $\widetilde{\mathbf{T}}_0=\mathbf{T}_0$, and the repair of the first $\alpha(4)=15$ symbols of node $f$ is identical to that of a signed determinant code, as described in Proposition 3. For the sake of demonstration, we focus on the repair of the symbol at position $I=\{1,2,3,6\}$ of the codeword segment of the failed node corresponding to $\widetilde{\mathbf{T}}_0$, i.e.,
[TABLE]
Following (13), and since $\mathbf{R}^f(\widetilde{\mathbf{T}}_0)=\mathbf{R}^f(\mathbf{T}_0)$, the repair of this ordinary signed determinant code at $I=\{1,2,3,6\}$ gives
[TABLE]
Note that the matrix $\mathbf{R}^f(\mathbf{T}_0)$ has $\binom{6}{3}=20$ columns and cannot be fully written here. However, the submatrix of $\mathbf{R}^f(\mathbf{T}_0)$ consisting of the columns with labels in $Y=\{\{1,2,3\},\{1,2,6\},\{1,3,6\},\{2,3,6\}\}$ is given in (29).
As can be seen, the $v$-symbols are repaired directly, while the $w$-symbols are repaired using the parity equation in (5). The other symbols of the code segment corresponding to $\widetilde{\mathbf{T}}_0$ can be repaired in a similar manner. Once the segment corresponding to $\widetilde{\mathbf{T}}_0$ is repaired, we proceed with the codeword segments corresponding to the child matrices in the injection hierarchy.
Let us focus on the repair of the segment corresponding to the child matrix $\widetilde{\mathbf{T}}_2$, and in particular on the symbol at position $I=\{2,3\}$. The missing symbol at this position (which should be regenerated by the repair process) is given by
[TABLE]
Note that the symbols $v^{\langle 0\rangle}_{6,\{2,3,4,6\}}$ and $v^{\langle 0\rangle}_{6,\{2,3,5,6\}}$ are injected from $\mathbf{T}_0$ into $\mathbf{T}_2$.
Upon receiving the repair symbols, node $f$ recovers $\mathbf{R}^f(\widetilde{\mathbf{T}}_2)=\widetilde{\mathbf{T}}_2\cdot\Xi^{f,(2)}$, where $\Xi^{f,(2)}$ is defined in (11).
Evaluating (27) for I={2,3}, we first obtain
That means the simple repair strategy of (signed) determinant codes given in (13) cannot be directly applied to repair this symbol. However, note that all three terms in the difference given in (32) are symbols of the code segment $\mathbf{T}_0$, and recall that $\mathbf{T}_0$ is the parent matrix of $\mathbf{T}_2$. The plan is to compute the right-hand side of (32) from the repair space of the parent matrix $\widetilde{\mathbf{T}}_0$ and subtract it from (31), to recover the missing symbol in (30). Recall that the injection pair for $\mathbf{T}_2$ is $(x,B)=(6,\{6\})$. Interestingly, the entry at position $(x,I\cup B)=(6,\{2,3,6\})$ of $\mathbf{R}^f(\mathbf{T}_0)$ is given by (see (29))
[TABLE]
which is exactly identical to the difference in (32). This term $-[\mathbf{R}^f(\mathbf{T}_0)]_{6,\{2,3,6\}}$ is precisely the second term in (27) for the case where $I$ and $B$ are disjoint (here $I\cap B=\{2,3\}\cap\{6\}=\emptyset$). Therefore, we have
[TABLE]
This provides an exact recovery for the failed symbol in (30). A similar procedure can be used to repair all the other symbols and codeword segments of the failed node f.
Proof of Proposition 4.
Let $f$ be a failed node whose content needs to be repaired using the repair data received from the helper nodes in $H$ with $|H|=d$. The content of node $f$ is reconstructed segment by segment.
Consider a code segment $\widetilde{\mathbf{Q}}=\widetilde{\mathbf{Q}}^{\mathbf{P}}_{x,B}$, a determinant code with $\mathrm{mode}(\widetilde{\mathbf{Q}})=m$ and injection pair $(x,B)$, into which symbols from its parent code segment $\mathbf{P}$, with $\mathrm{mode}(\mathbf{P})=j>m$, are injected. Recall that the corresponding code segment matrix can be written as
[TABLE]
where the first term is a signed determinant code and the second term indicates the contribution of injection. For a given position $\mathcal{I}$ within this codeword segment, i.e., $\mathcal{I}\subseteq[d]$ and $|\mathcal{I}|=m$, the corresponding symbol of the failed node is given by
[TABLE]
As is clear from (27), the repair of segment $\mathbf{Q}$ of the codeword of node $f$ is performed similarly to that of determinant codes, using the repair space $R^f(\mathbf{Q})=\mathbf{Q}\cdot\Xi^{f,(m)}$, together with an additional correction from the repair space of the parent matrix, $R^f(\mathbf{P})=\mathbf{P}\cdot\Xi^{f,(j)}$. This latter correction accounts for the deviation of the code segment from a standard determinant code, which is caused by the injection of symbols from the parent matrix $\mathbb{P}$ into $\mathbb{Q}$.
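The repair space $R^f(\mathbf{Q})=\mathbf{Q}\cdot\Xi^{f,(m)}$ is a product with a sparse matrix whose non-zero pattern is used in the proof below: entry $(\mathcal{L},\mathcal{J})$ can be non-zero only if $\mathcal{L}=\mathcal{J}\cup\{y\}$ for some $y\notin\mathcal{J}$. As a minimal sketch over a prime field GF(p), the Python code below builds such a matrix and multiplies by it. Definition (11) is not reproduced in this section, so the entry value $(-1)^{\mathrm{ind}_{\mathcal{L}}(y)}\psi_{f,y}$ and the names `xi_matrix`, `psi_f`, and `matmul_mod` are assumptions for illustration only.

```python
from itertools import combinations


def ind(L, y):
    """1-based position of y in the sorted set L (the ind_L(y) of the text)."""
    return sorted(L).index(y) + 1


def xi_matrix(psi_f, m, p):
    """Hypothetical sketch of Xi^{f,(m)} over GF(p).

    Assumed entry rule (a stand-in for definition (11)):
        Xi[L, J] = (-1)^{ind_L(y)} * psi_f[y]  if L = J | {y} with y not in J,
        Xi[L, J] = 0                           otherwise.
    psi_f maps each index y in 1..d to the encoder entry psi_{f,y} of node f.
    Rows are the m-subsets L of [d]; columns are the (m-1)-subsets J of [d].
    """
    d = len(psi_f)
    rows = [frozenset(c) for c in combinations(range(1, d + 1), m)]
    cols = [frozenset(c) for c in combinations(range(1, d + 1), m - 1)]
    row_of = {L: r for r, L in enumerate(rows)}
    xi = [[0] * len(cols) for _ in rows]
    for c, J in enumerate(cols):
        for y in range(1, d + 1):
            if y in J:
                continue  # only rows L = J | {y} with a new index y are non-zero
            L = J | {y}
            sign = -1 if ind(L, y) % 2 else 1
            xi[row_of[L]][c] = (sign * psi_f[y]) % p
    return rows, cols, xi


def matmul_mod(A, B, p):
    """Plain matrix product A.B over GF(p), e.g. the repair space Q.Xi."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)]
            for row in A]
```

For example, with $d=4$, $m=2$ and a toy vector `psi_f = {1: 1, 2: 2, 3: 3, 4: 4}` over GF(7), each column of the resulting $6\times 4$ matrix has exactly $d-m+1=3$ non-zero entries, matching the sparsity pattern described in the text.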
We start with the first term on the right-hand side of (27), which is
[TABLE]
where (35) holds due to the linearity of the operations, and in (36) we used Proposition 3 for the repair process of $\mathbb{Q}_{x,\mathcal{B}}^{\mathbb{P}}$, which is a signed determinant code with parameters $(d,m)$ and signature vector $\sigma_{\mathbb{Q}}$. Therefore, from (34) and (36), we conclude that proving the claimed identity in (27) is equivalent to showing
[TABLE]
where
[TABLE]
Note that all the data symbols appearing in (37) belong to the parent matrix $\mathbb{P}$. We distinguish the following two cases to prove (37).
Case I ($\mathcal{I}\cap\mathcal{B}=\emptyset$):
Starting from $\mathrm{Term}_1$, we have
[TABLE]
where (39) follows from the definition of the injection symbols in (18), which implies that a non-zero injection occurs at position $(y,\mathcal{I})$ only if $y>\max\mathcal{I}$ and $y\notin\mathcal{B}$; (40) holds since $[\max\mathcal{I}+1:d]\cap\mathcal{I}=\emptyset$; and in (41) we plugged in the entries of $\Delta_{x,\mathcal{B}}^{\mathbb{P}}$ from (18).
Next, $\mathrm{Term}_2$ can be expanded as
[TABLE]
Note that in (42) we used the definition of matrix $\Xi^{f,(m)}$ given in (11), which implies that the entry in position $(\mathcal{L},\mathcal{I}\setminus\{i\})$ is non-zero only if $\mathcal{L}=(\mathcal{I}\setminus\{i\})\cup\{y\}$ for some $y\notin\mathcal{I}\setminus\{i\}$. Moreover, (43) follows from the definition of injected entries in (18), which implies that the entry of $\Delta_{x,\mathcal{B}}^{\mathbb{P}}$ at position $(i,(\mathcal{I}\setminus\{i\})\cup\{y\})$ is non-zero only if all the following conditions hold:
[TABLE]
These conditions, together with the fact that the outer summation is taken over $i\in\mathcal{I}$, imply $i=\max\mathcal{I}$. Moreover, the inner summation over $y\in[d]$, $y\notin\mathcal{I}\setminus\{i\}$, reduces to a summation over the $y$'s satisfying $y<i=\max\mathcal{I}$ and $y\notin\mathcal{I}\cup\mathcal{B}$, or simply $y\in[\max\mathcal{I}]\setminus(\mathcal{I}\cup\mathcal{B})$, as indicated in (43). Finally, in (45) the matrix entries are replaced by their definitions from (11) and (18).
Next, we simplify the overall sign in (45). First, for every $y<\max\mathcal{I}$ we have
[TABLE]
where (46) is due to the definition of the child matrix's signature in (16); the equality in (47) holds since $\mathcal{I}$, $\mathcal{B}$, and $\{y\}$ are disjoint sets; and in (48) we used the fact that $y<\max\mathcal{I}$. Similarly, we can write
[TABLE]
where in (49) we used the definition of the child matrix's signature vector in (16); the equality in (50) follows from the fact that $y<\max\mathcal{I}$; and (51) holds since $\mathcal{I}$, $\{y\}$, and $\mathcal{B}$ are disjoint sets.
Lastly, since $\mathrm{mode}(\mathbf{P})=j$, we have
[TABLE]
where in (54) we used the definition of matrix $\Xi^{f,(j)}$ in (11), which implies that the entry in position $(\mathcal{L},\mathcal{I}\cup\mathcal{B})$ is non-zero only if $\mathcal{L}=\mathcal{I}\cup\{y\}\cup\mathcal{B}$ for some $y\notin\mathcal{I}\cup\mathcal{B}$; and in (55) we plugged in the entry at position $(\mathcal{I}\cup\{y\}\cup\mathcal{B},\mathcal{I}\cup\mathcal{B})$ of $\Xi^{f,(j)}$ using (11). Moreover, (56) holds since no injection occurs at position $(x,\mathcal{I}\cup\{y\}\cup\mathcal{B})$ of matrix $\mathbb{P}$. To see this, note that $(x,\mathcal{B})$ is a valid injection pair from the parent $\mathbf{P}$ to the child $\mathbf{Q}$, so by Condition (iv) of Remark 5 it satisfies $x\leq\max\mathcal{B}$. On the other hand, if $\mathbf{P}$ hosted an injected symbol from its own parent at position $(x,\mathcal{I}\cup\{y\}\cup\mathcal{B})$, then by (18) the symbol $\mathbb{P}_{x,\mathcal{I}\cup\{y\}\cup\mathcal{B}}$ would have to be a parity symbol, and the relation $x>\max(\mathcal{I}\cup\{y\}\cup\mathcal{B})$ would hold. This would yield $x>\max(\mathcal{I}\cup\{y\}\cup\mathcal{B})\geq\max\mathcal{B}$, which contradicts $x\leq\max\mathcal{B}$. Therefore, $\mathbb{P}_{x,\mathcal{I}\cup\{y\}\cup\mathcal{B}}$ does not host any injected symbol.
Putting (41), (53), and (56) together, we get
Case II ($\mathcal{I}\cap\mathcal{B}\neq\emptyset$):
Similar to Case I, we can expand $\mathrm{Term}_1$ to get the summation in (39). Each term in (39) contains an entry of $\Delta_{x,\mathcal{B}}^{\mathbb{P}}$ at position $(y,\mathcal{I})$, which is zero for $\mathcal{I}\cap\mathcal{B}\neq\emptyset$. Therefore, $\mathrm{Term}_1=0$.
Similarly, $\mathrm{Term}_2$ can be expanded to the summation given in (42). However, based on (18), the entry of $\Delta_{x,\mathcal{B}}^{\mathbb{P}}$ at position $(i,(\mathcal{I}\setminus\{i\})\cup\{y\})$ is non-zero only if $((\mathcal{I}\setminus\{i\})\cup\{y\})\cap\mathcal{B}=\emptyset$ and $i\notin\mathcal{B}$. This implies
[TABLE]
This contradiction implies that all the terms in (42) are zero, and thus $\mathrm{Term}_2=0$. Finally, $\mathrm{Term}_3$ is zero whenever $\mathcal{I}\cap\mathcal{B}\neq\emptyset$, as defined in (38). Therefore, the identity (37) clearly holds.
This completes the proof of Proposition 4.
∎
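The proof above repeatedly relies on the necessary conditions from (18) for a child position to host an injection, and on the position of the injected parent symbol, $(x,\mathcal{I}\cup\{y\}\cup\mathcal{B})$. The following Python sketch encodes only those position rules. The function name is hypothetical, and signs as well as parent symbols that are fixed to zero are deliberately not modeled, so this is a simplification rather than the paper's full definition (18).

```python
def injection_source(y, I, x, B):
    """Hypothetical helper for the position part of the injection rule (18).

    A child Q = Q^P_{x,B} may receive, at position (y, I), the parent symbol at
    position (x, I | {y} | B) (up to a sign), but only when the necessary
    conditions quoted in the text hold: y > max(I), y not in B, and I, B disjoint.
    Returns that parent position, or None when no injection can occur.
    Not modeled: the sign factor and parent symbols that are set to zero.
    """
    I, B = frozenset(I), frozenset(B)
    if I & B or y in B:
        return None
    if I and y <= max(I):  # for I = {} the max-condition is vacuous
        return None
    return x, I | {y} | B
```

For instance, with the injection pair $(x,\mathcal{B})=(6,\{6\})$ of $\mathbb{T}_2$, child position $(5,\{2,3\})$ maps to parent position $(6,\{2,3,5,6\})$, which appears consistent with the symbol $v^{\langle 0\rangle}_{6,\{2,3,5,6\}}$ listed in Example 10, while row $6\in\mathcal{B}$ receives no injection.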
VII The Data Recovery Property
In this section, the data recovery property of the proposed code is discussed. This property guarantees the availability of the storage system in spite of up to n−k failures, meaning that the original data file can be recovered from any subset of k nodes among n nodes. The following proposition is a formal statement for this property.
**Proposition 5.**
Consider an arbitrary subset K⊆[n] with ∣K∣=k. In a distributed storage system with an (n,k,d;μ) cascade code and parameters defined in (3), all of the F(k,d;μ) information symbols can be recovered from the coded data stored in the nodes indexed by i∈K.
We provide an algorithmic proof for the above proposition. The general description of the data recovery algorithm is discussed in Subsection VII-A. Then, the full data recovery process is presented in Algorithm 2. Next, in Subsection VII-B the formal proof of Proposition 5 is presented which also explains how each part of the data recovery algorithm functions. Finally, in Subsection VII-C the details of data recovery are explained for the running example of this paper.
VII-A The Data Recovery Algorithm
Consider an arbitrary subset of nodes K⊆[n] with ∣K∣=k. Let Ψ[K,:] denote the k×d sub-matrix formed by collecting the corresponding k rows of Ψ. The collection of coded symbols stored in these k nodes can be written as Ψ[K,:]⋅M. The goal is to recover the original super message matrix M from the observed data Ψ[K,:]⋅M. Unlike the (n,k=d,d) signed determinant codes, here the encoder matrix Ψ[K,:] is not square, and hence is not invertible. So, one cannot simply multiply the stacked data by the inverse of the encoder matrix to retrieve M from Ψ[K,:]⋅M. Nevertheless, using the properties of the encoder matrix in Definition 4, the matrix Ψ[K,:] can be decomposed into
$$\Psi[\mathcal{K},:]=\big[\,\Gamma[\mathcal{K},:]\;\;\Upsilon[\mathcal{K},:]\,\big], \tag{58}$$
where Γ[K,:] is a k×k invertible matrix.
Recall that the super message matrix $M$ is formed by the concatenation of several (after-injection) message matrices. Therefore, the matrix $\Psi[\mathcal{K},:]\cdot M$ consists of several codeword segments, each of the form $\Psi[\mathcal{K},:]\cdot\mathbf{S}_{y,\mathcal{Y}}^{\mathbb{P}}$, where $\mathbf{S}=\mathbf{S}_{y,\mathcal{Y}}^{\mathbb{P}}$ is a segment of $M$ operating at mode $m=\mathrm{mode}(\mathbf{S})$, $\mathbb{P}$ is the message matrix of the parent code of $\mathbb{S}$, and $(y,\mathcal{Y})$ is the injection pair used to generate $\mathbf{S}$. The matrix $\mathbf{S}$ is a $d\times\alpha_m=d\times\binom{d}{m}$ matrix and can be partitioned into $\overline{\mathbf{S}}$ and $\underline{\mathbf{S}}$, corresponding to the top $k$ and bottom $d-k$ rows of $\mathbf{S}$, respectively. Therefore, after multiplying $\Psi[\mathcal{K},:]\cdot M$ by $\Gamma[\mathcal{K},:]^{-1}$, for the codeword segment corresponding to $\mathbf{S}$ we get
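$$\Gamma[\mathcal{K},:]^{-1}\cdot\Psi[\mathcal{K},:]\cdot\mathbf{S}=\overline{\mathbf{S}}+\Gamma[\mathcal{K},:]^{-1}\cdot\Upsilon[\mathcal{K},:]\cdot\underline{\mathbf{S}}. \tag{59}$$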
In this section, we explain how to first recover the bottom matrix $\underline{\mathbf{S}}$ separately. We then compute $\Gamma[\mathcal{K},:]^{-1}\cdot\Upsilon[\mathcal{K},:]\cdot\underline{\mathbf{S}}$ and use it to obtain a copy of $\overline{\mathbf{S}}$ from (59). Recall that the goal of the symbol injection introduced in this paper is to enable data recovery. Therefore, some symbols of the bottom matrix $\underline{\mathbf{S}}$ may be retrieved from the child matrices of $\mathbf{S}$. This imposes an order on data recovery: we start by recovering the data symbols of the segments at the lowest level of the hierarchical tree (with the smallest mode) and proceed to segments with higher modes, until we reach the code segment at the root of the hierarchical tree. Hence, data recovery is a bottom-to-top process.
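Identity (59) reduces the recovery of the top part of one column to a small linear solve once its bottom entries are known. The Python sketch below illustrates this over a prime field GF(p). The field, the helper names, and the toy encoder rows are assumptions; the sketch also assumes that $\Gamma[\mathcal{K},:]$ consists of the first $k$ columns of $\Psi[\mathcal{K},:]$ and $\Upsilon[\mathcal{K},:]$ of the last $d-k$ columns. Rather than forming $\Gamma[\mathcal{K},:]^{-1}$ explicitly, it solves $\Gamma[\mathcal{K},:]\,\overline{\mathbf{S}}_{:,\mathcal{I}}=\Psi[\mathcal{K},:]\,\mathbf{S}_{:,\mathcal{I}}-\Upsilon[\mathcal{K},:]\,\underline{\mathbf{S}}_{:,\mathcal{I}}$, which is equivalent to (59).

```python
def solve_mod(A, b, p):
    """Solve A.x = b over GF(p) (p prime) by Gauss-Jordan elimination.
    A must be square and invertible; otherwise StopIteration is raised."""
    n = len(A)
    M = [[v % p for v in A[r]] + [b[r] % p] for r in range(n)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c])  # first non-zero pivot
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], -1, p)  # modular inverse (Python 3.8+)
        M[c] = [v * inv % p for v in M[c]]
        for r in range(n):
            if r != c and M[r][c]:
                f = M[r][c]
                M[r] = [(vr - f * vc) % p for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def recover_top(psi_K, observed, bottom, p):
    """Recover the top k entries of one column of S, following (58)-(59).

    psi_K:    the k x d encoder rows of the nodes in K.
    observed: the k observed symbols Psi[K,:].S[:, I].
    bottom:   the already decoded d-k bottom entries of S[:, I].
    With Psi[K,:] = [Gamma | Upsilon]: Gamma.top = observed - Upsilon.bottom.
    """
    k = len(psi_K)
    gamma = [row[:k] for row in psi_K]
    upsilon = [row[k:] for row in psi_K]
    rhs = [(o - sum(u * s for u, s in zip(urow, bottom))) % p
           for o, urow in zip(observed, upsilon)]
    return solve_mod(gamma, rhs, p)
```

With toy values $p=11$, $k=2$, $d=3$, encoder rows `[[1, 2, 3], [4, 5, 6]]`, and a column whose top part is `[7, 9]` and bottom part is `[5]`, the observed symbols are `[7, 4]`, and `recover_top` returns the top part `[7, 9]`.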
**Example 10.**
In the $(n,k=4,d=6;\mu=4)$ example of Section V, consider data recovery from an arbitrary subset of $k=4$ nodes, say $\mathcal{K}=\{1,3,6,7\}$. In this example, the data collector observes the encoded matrix $\Psi[\mathcal{K},:]\cdot M=\big[\Psi[\mathcal{K},:]\cdot\mathbf{T}_0\;\;\Psi[\mathcal{K},:]\cdot\mathbf{T}_1\;\cdots\;\Psi[\mathcal{K},:]\cdot\mathbf{T}_{14}\big]$. The goal of data recovery is to extract a complete copy of
[TABLE]
We recover the segments starting from $\mathbf{T}_{14}$ and finish the decoding at $\mathbf{T}_0$. For instance, in the data recovery of the coded segment $\Psi[\mathcal{K},:]\cdot\mathbf{T}_2$, for column $\mathcal{I}=\{2,3\}$ the data observed by the data collector is given by (see (20) and (21))
[TABLE]
In the above equation, $[\mathbf{T}_2]_{:,\{2,3\}}$ cannot be extracted from $\Psi[\mathcal{K},:]\cdot[\mathbf{T}_2]_{:,\{2,3\}}$, since $\Psi[\mathcal{K},:]$ is not square and hence not invertible. However, we will explain how the symbols $\{w^{\langle 2\rangle}_{5,\{2,3,5\}},\,v^{\langle 0\rangle}_{6,\{2,3,5,6\}},\,w^{\langle 2\rangle}_{6,\{2,3,6\}}\}$ can be recovered from other pieces of the collected data; the top rows of $[\mathbf{T}_2]_{:,\{2,3\}}$ can then be computed using (59), that is,
[TABLE]
∎
The major steps of data recovery are described below and the full process is presented in Algorithm 2.
(St.1)
The segments are decoded starting from those with the lowest mode and moving to segments with higher modes, until the algorithm reaches the root of the tree (Loops 1 and 2 in Algorithm 2).
(St.2)
Within each mode and each code segment $\mathbf{S}_{y,\mathcal{Y}}^{\mathbb{P}}$, the columns are decoded in reverse-lexicographic order (DecodeSegment procedure in Algorithm 2).
(St.3)
For each column of $\mathbf{S}_{y,\mathcal{Y}}^{\mathbb{P}}$ labeled by $\mathcal{I}$, the entries located in the lower $d-k$ rows are decoded first (DecodeColumnBottom procedure in Algorithm 2). To do so, recall that a given symbol $[\mathbf{S}_{y,\mathcal{Y}}^{\mathbb{P}}]_{x,\mathcal{I}}$ consists of an original part and an injected part, i.e.,
[TABLE]
The decoding strategy adopted for each symbol $\mathbb{S}_{x,\mathcal{I}}=[\mathbb{S}_{y,\mathcal{Y}}^{\mathbb{P}}]_{x,\mathcal{I}}$ depends on the group to which the symbol belongs (see Definition 8):
(St.3.1)
If $\mathbb{S}_{x,\mathcal{I}}\in\mathcal{G}_1(\mathbb{S})$, then $\mathbf{S}_{x,\mathcal{I}}=\mathbb{S}_{x,\mathcal{I}}$ and $[\Delta_{y,\mathcal{Y}}^{\mathbb{P}}]_{x,\mathcal{I}}=0$. This symbol is decoded using the child matrices of $\mathbf{S}$, which have lower modes and hence are decoded before $\mathbf{S}$.
(St.3.2)
If $\mathbb{S}_{x,\mathcal{I}}\in\mathcal{G}_2(\mathbb{S})$, then $\mathbb{S}_{x,\mathcal{I}}=0$. Moreover, no symbol from $\mathbb{P}$ is injected into this entry, i.e., $[\Delta_{y,\mathcal{Y}}^{\mathbb{P}}]_{x,\mathcal{I}}=0$. Therefore, $\mathbf{S}_{x,\mathcal{I}}=\mathbb{S}_{x,\mathcal{I}}=0$.
(St.3.3)
If $\mathbb{S}_{x,\mathcal{I}}\in\mathcal{G}_3(\mathbb{S})$, then the original part $\mathbb{S}_{x,\mathcal{I}}$ and the (potentially) injected part $[\Delta_{y,\mathcal{Y}}^{\mathbb{P}}]_{x,\mathcal{I}}$ are decoded separately. To this end, the symbol $\mathbb{S}_{x,\mathcal{I}}$ is recovered using the parity equation in (5). We will show that the other symbols participating in this parity equation appear in columns $\mathbf{S}_{:,\mathcal{J}}$ with $\mathcal{J}\succ\mathcal{I}$ (according to the lexicographic order defined in Subsection I-C). Therefore, by Step (St.2), all such symbols are decoded before column $\mathbf{S}_{:,\mathcal{I}}$. Next, note that the injection of the symbol $[\Delta_{y,\mathcal{Y}}^{\mathbb{P}}]_{x,\mathcal{I}}$ (a possible injection from $\mathbb{P}$) is a secondary one. Hence, such a symbol is also primarily injected into a sibling code segment of $\mathbb{S}$, say $\mathbb{Q}$ (another child code of $\mathbb{P}$), with $\mathrm{mode}(\mathbf{Q})<\mathrm{mode}(\mathbf{S})$. Thus, the injected symbol is already available once the message matrix $\mathbf{Q}$ is decoded.
(St.4)
Once the lower part of column $\mathcal{I}$ of code segment $\mathbf{S}$ (i.e., $\underline{\mathbf{S}}_{:,\mathcal{I}}$) is decoded, its upper part $\overline{\mathbf{S}}_{:,\mathcal{I}}$ is recovered using the identity in (59) (Line 2 of Algorithm 2).
(St.5)
Once code segment $\mathbf{S}_{y,\mathcal{Y}}^{\mathbb{P}}=\mathbb{S}_{y,\mathcal{Y}}^{\mathbb{P}}+\Delta_{y,\mathcal{Y}}^{\mathbb{P}}$ is decoded, both matrices $\mathbb{S}_{y,\mathcal{Y}}^{\mathbb{P}}$ and $\Delta_{y,\mathcal{Y}}^{\mathbb{P}}$ are extracted (GetDeltaS procedure of Algorithm 2). The matrix $\Delta_{y,\mathcal{Y}}^{\mathbb{P}}$ is later used in the recovery of the parent matrix $\mathbb{P}$.
In Algorithm 2, it is assumed that the structure of the code, including the hierarchical tree that specifies the parent-child relationships between code segments and all the injection pairs, is known to the data collector. Although the procedure may look complicated at first glance, it applies the same recursive routine to every code segment.
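The visiting order of steps (St.1) and (St.2) can be sketched in a few lines of Python. This is only a scheduling skeleton under stated assumptions: segments are given as hypothetical `(name, mode)` pairs rather than the paper's tree structure, segments of equal mode are kept in input order, and the lexicographic order of Subsection I-C is assumed to coincide with the order in which `itertools.combinations` emits sorted index tuples. The per-column decoding of (St.3) to (St.5) is not modeled.

```python
from itertools import combinations


def decoding_schedule(segments, d):
    """Yield (segment, column) pairs in the order of steps (St.1)-(St.2).

    segments: list of (name, mode) pairs, a hypothetical stand-in for the
              hierarchical tree (a child always has a lower mode than its parent).
    Segments are visited by increasing mode; ties keep their input order.
    Within a segment of mode m, the columns are the m-subsets of [d],
    visited in reverse lexicographic order.
    """
    for name, m in sorted(segments, key=lambda seg: seg[1]):
        columns = list(combinations(range(1, d + 1), m))  # lexicographic order
        for col in reversed(columns):
            yield name, col
```

For example, `list(decoding_schedule([("T0", 2), ("T1", 0), ("T2", 1)], 3))` starts with `("T1", ())`, then visits the columns `(3,)`, `(2,)`, `(1,)` of `T2`, and ends with column `(1, 2)` of `T0`.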
In the following subsection, we first present the formal proof of data recovery, which also provides a detailed description of the steps above. We then apply the data recovery algorithm to our running example in Subsection VII-C.
VII-B The Proof of Data Recovery
This section is dedicated to the proof of Proposition 5.
Before diving into the formal proof, we present two lemmas and their proofs, which play an important role in the proof of the proposition.
**Lemma 1 (Initial Recovery Step).**
In any $(n,k,d;\mu)$ cascade code, we have $\underline{\mathbf{S}}=\mathbf{0}_{(d-k)\times 1}$ for every code segment $\mathbf{S}$ with $\mathrm{mode}(\mathbf{S})=0$.
Consider a segment $\mathbf{S}$ with $\mathrm{mode}(\mathbf{S})=0$, which is introduced by a parent segment $\mathbb{P}$ via an injection pair $(y,\mathcal{Y})$, i.e., $\mathbf{S}_{y,\mathcal{Y}}^{\mathbb{P}}=\mathbb{S}_{y,\mathcal{Y}}^{\mathbb{P}}+\Delta_{y,\mathcal{Y}}^{\mathbb{P}}$. Recall from Remark 5 that $y\in[k+1:d]$ and $\mathcal{Y}\subseteq[k+1:d]$. Also note that the only column of $\mathbb{S}$ is indexed by $\emptyset$ and is an all-zero vector, i.e., $\mathbb{S}_{y,\mathcal{Y}}^{\mathbb{P}}=\mathbf{0}_{d\times 1}$. According to (18), the symbol of the parent matrix $\mathbb{P}$ to be injected into position $(i,\emptyset)$ of $\mathbb{S}$ is $\mathbb{P}_{y,\emptyset\cup\{i\}\cup\mathcal{Y}}=\mathbb{P}_{y,\{i\}\cup\mathcal{Y}}$ (up to a sign). Now, if $[\mathbb{S}_{y,\mathcal{Y}}^{\mathbb{P}}]_{i,\emptyset}$ belongs to the lower part of $\mathbf{S}$, we have $i\in[k+1:d]$.
This, together with $\mathcal{Y}\subseteq[k+1:d]$, implies that $(\{i\}\cup\mathcal{Y})\cap[k]=\emptyset$, and thus $\mathbb{P}_{y,\{i\}\cup\mathcal{Y}}\in\mathcal{G}_2(\mathbb{P})$ (see Definition 8). Hence, this symbol is set to zero and is not injected. Thus, no injection occurs at position $(i,\emptyset)$ with $i\in[k+1:d]$, and we have $\mathbf{S}_{i,\emptyset}=\mathbb{S}_{i,\emptyset}=0$. This implies the claim of the lemma.
∎
**Lemma 2 (Separability of the Injected and Original Symbols).**
If a code segment $\mathbf{S}=\mathbf{S}_{y,\mathcal{Y}}^{\mathbb{P}}=\mathbb{S}_{y,\mathcal{Y}}^{\mathbb{P}}+\Delta_{y,\mathcal{Y}}^{\mathbb{P}}$ is decoded (i.e., all entries of $\mathbf{S}$ are recovered), then the symbols of the original message matrix $\mathbb{S}_{y,\mathcal{Y}}^{\mathbb{P}}$ and the injected symbols in $\Delta_{y,\mathcal{Y}}^{\mathbb{P}}$ can be uniquely extracted.
First note that if no injection occurs into position $(i,\mathcal{I})$, then $[\mathbb{S}_{y,\mathcal{Y}}^{\mathbb{P}}]_{i,\mathcal{I}}=[\mathbf{S}_{y,\mathcal{Y}}^{\mathbb{P}}]_{i,\mathcal{I}}$ and $[\Delta_{y,\mathcal{Y}}^{\mathbb{P}}]_{i,\mathcal{I}}=0$, and the claim clearly holds.
Next, consider a position (i,I) with an injection.
Recall from (18) and Remark 6 that an injection takes place into a position $(i,\mathcal{I})$ of $\mathbb{S}$ only if the conditions $i>\max\mathcal{I}$, $i\notin\mathcal{Y}$, and $\mathcal{I}\cap\mathcal{Y}=\emptyset$ are all satisfied. Let $\mathrm{mode}(\mathbf{S})=m$. We start by obtaining a copy of $[\mathbb{S}_{y,\mathcal{Y}}^{\mathbb{P}}]_{i,\mathcal{I}}$ as follows.
[TABLE]
where in (60) we used the parity equation in (5) to rewrite $w_{i,\mathcal{I}\cup\{i\}}$ in terms of the other symbols of the $w$-group $\mathcal{I}\cup\{i\}$; in (61) we used the fact that $i$ is the maximum of the $(m+1)$-element set $\mathcal{I}\cup\{i\}$ to conclude that $\mathrm{ind}_{\mathcal{I}\cup\{i\}}(i)=m+1$ and $\mathrm{ind}_{\mathcal{I}\cup\{i\}}(t)=\mathrm{ind}_{\mathcal{I}}(t)$; and the last equality in (62) follows from the fact that $t\leq\max\mathcal{I}<i=\max((\mathcal{I}\cup\{i\})\setminus\{t\})$, and hence, according to (18), no injection takes place into $[\mathbb{S}_{y,\mathcal{Y}}^{\mathbb{P}}]_{t,(\mathcal{I}\cup\{i\})\setminus\{t\}}$. This implies that $[\mathbb{S}_{y,\mathcal{Y}}^{\mathbb{P}}]_{i,\mathcal{I}}$ can be retrieved as a linear combination of some symbols in $\mathbf{S}_{y,\mathcal{Y}}^{\mathbb{P}}$.
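Step (60) solves a single parity equation for its largest-index symbol. The Python sketch below shows that arithmetic over a prime field GF(p). Equation (5) is not reproduced in this section, so its form $\sum_{t\in\mathcal{J}}(-1)^{\mathrm{ind}_{\mathcal{J}}(t)}w_{t,\mathcal{J}}=0$ is an assumption here, and the helper names are hypothetical.

```python
def ind(J, t):
    """1-based position of t in the sorted set J (the ind_J(t) of the text)."""
    return sorted(J).index(t) + 1


def sign(J, t):
    """(-1)^{ind_J(t)} as +1 or -1."""
    return -1 if ind(J, t) % 2 else 1


def recover_from_parity(J, known, target, p):
    """Return w_{target,J} over GF(p), assuming the parity equation
        sum_{t in J} (-1)^{ind_J(t)} w_{t,J} = 0
    for the w-group J. `known` maps every t in J other than target to w_{t,J}."""
    acc = sum(sign(J, t) * known[t] for t in J if t != target)
    # sign(J, target) * w_target = -acc, and sign(J, target) is its own inverse.
    return (-acc * sign(J, target)) % p
```

For the $w$-group $\mathcal{J}=\{2,3,5\}$ over GF(13) with $w_{2,\mathcal{J}}=4$ and $w_{3,\mathcal{J}}=7$, the sketch returns $w_{5,\mathcal{J}}=3$, since $-4+7-3=0$.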
Finally, once $[\mathbb{S}_{y,\mathcal{Y}}^{\mathbb{P}}]_{i,\mathcal{I}}$ is recovered, we can find the injected symbol from
$$[\Delta_{y,\mathcal{Y}}^{\mathbb{P}}]_{i,\mathcal{I}}=[\mathbf{S}_{y,\mathcal{Y}}^{\mathbb{P}}]_{i,\mathcal{I}}-[\mathbb{S}_{y,\mathcal{Y}}^{\mathbb{P}}]_{i,\mathcal{I}}.$$
∎
We aim to show that all the code segments in $M$ can be recovered using the data recovery process in Algorithm 2. Data recovery is a recursive process, which starts from the code segments with the lowest modes (at the bottom of the hierarchical tree) and continues to the code segments with the highest modes (at the top of the hierarchical tree). Furthermore, data recovery within each code segment is performed column by column in reverse-lexicographic order, i.e., column $[\mathbf{S}]_{:,\mathcal{I}}$ is decoded after column $[\mathbf{S}]_{:,\mathcal{J}}$ if and only if $\mathcal{I}\prec\mathcal{J}$. We prove the proposition by induction over the code segments and column labels.
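The induction relies on one ordering fact, stated later in the proof: for $x>\max\mathcal{I}$, every column $\mathcal{J}=(\mathcal{I}\cup\{x\})\setminus\{t\}$ with $t\in\mathcal{I}$ satisfies $\mathcal{I}\prec\mathcal{J}$, so a reverse-lexicographic sweep reaches $\mathcal{J}$ before $\mathcal{I}$. The following brute-force Python check illustrates this for small parameters. It assumes that the order $\prec$ of Subsection I-C agrees with ordinary lexicographic order on sorted index tuples (which is how Python compares tuples), and it is a sanity check rather than a proof.

```python
from itertools import combinations


def partner_columns(I, x):
    """Columns J = (I | {x}) minus {t}, for t in I, that hold the other
    symbols of the w-group I | {x}."""
    return [tuple(sorted((set(I) | {x}) - {t})) for t in I]


def check_reverse_lex(d, m):
    """Brute-force check for 1 <= m < d: for every m-subset I of [d] and every
    x > max(I), each partner column J compares lexicographically larger than I,
    so a reverse-lexicographic sweep decodes every such J before I."""
    for I in combinations(range(1, d + 1), m):
        for x in range(max(I) + 1, d + 1):
            if not all(I < J for J in partner_columns(I, x)):
                return False
    return True
```

For example, `partner_columns((2, 3), 5)` returns `[(3, 5), (2, 5)]`, both of which follow `(2, 3)` in lexicographic order.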
As the base step of the induction, consider a code segment $\mathbf{S}$ with $\mathrm{mode}(\mathbf{S})=0$. Then, from (59) and Lemma 1, we have
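$$\Gamma[\mathcal{K},:]^{-1}\cdot\Psi[\mathcal{K},:]\cdot\mathbf{S}=\overline{\mathbf{S}}+\Gamma[\mathcal{K},:]^{-1}\cdot\Upsilon[\mathcal{K},:]\cdot\mathbf{0}_{(d-k)\times 1}=\overline{\mathbf{S}},$$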
which provides us with $\overline{\mathbf{S}}$. Together with $\underline{\mathbf{S}}=\mathbf{0}_{(d-k)\times 1}$ (from Lemma 1), this fully recovers $\mathbf{S}$. Then, using Lemma 2, we can recover the original symbols in $\mathbb{S}_{y,\mathcal{Y}}^{\mathbb{P}}$ (which are all zero by definition) and the injected symbols $\Delta_{y,\mathcal{Y}}^{\mathbb{P}}$.
It remains to prove the induction step, which is the following statement:
*For any code segment $\mathbf{S}$ with $\mathrm{mode}(\mathbf{S})=m$ and any column index $\mathcal{I}$, if the message matrices of all code segments $\mathbf{Q}$ with $\mathrm{mode}(\mathbf{Q})<m$ and all columns $[\mathbf{S}]_{:,\mathcal{J}}$ with $\mathcal{J}\succ\mathcal{I}$ are decoded, then the content of column $[\mathbf{S}]_{:,\mathcal{I}}$ can be extracted from the observed data $\Psi[\mathcal{K},:]\cdot\mathbf{S}$ and the already decoded symbols.*
To prove this statement, consider an arbitrary code segment $\mathbf{S}=\mathbf{S}_{y,\mathcal{Y}}^{\mathbb{P}}$ in $M$, and focus on an arbitrary column $[\mathbf{S}]_{:,\mathcal{I}}$.
Let $x\in[k+1:d]$ and consider an entry $[\mathbf{S}_{y,\mathcal{Y}}^{\mathbb{P}}]_{x,\mathcal{I}}$ in the lower part of this column. Recall that
[TABLE]
Now, we can distinguish the following three cases:
∙ $\mathbb{S}_{x,\mathcal{I}}\in\mathcal{G}_1(\mathbb{S})$: From the second part of Remark 6, we know that no injection is performed into the symbols in $\mathcal{G}_1(\mathbb{S})$, and hence $[\Delta_{y,\mathcal{Y}}^{\mathbb{P}}]_{x,\mathcal{I}}=0$ and $[\mathbf{S}_{y,\mathcal{Y}}^{\mathbb{P}}]_{x,\mathcal{I}}=[\mathbb{S}_{y,\mathcal{Y}}^{\mathbb{P}}]_{x,\mathcal{I}}$.
Then, let $\mathcal{A}=\mathcal{I}\cap[k]$ and $\mathcal{B}=\mathcal{I}\cap[k+1:d]$. Again, the third part of Remark 6 implies that the symbol $\mathbb{S}_{x,\mathcal{I}}$ is injected into position $(\max\mathcal{A},\mathcal{A}\setminus\{\max\mathcal{A}\})$ of the child matrix $\mathbb{Q}_{x,\mathcal{B}}^{\mathbb{S}}$ of $\mathbb{S}$. Therefore, we have
[TABLE]
Since $\mathbf{Q}$ is a child matrix of $\mathbf{S}$, we have $\mathrm{mode}(\mathbf{Q})<\mathrm{mode}(\mathbf{S})$, and hence, by the induction assumption, $\mathbf{Q}$ is already decoded. Moreover, Lemma 2 ensures that all the entries of $\mathbb{Q}_{x,\mathcal{B}}^{\mathbb{S}}$ and $\Delta_{x,\mathcal{B}}^{\mathbb{S}}$ can be extracted from $\mathbf{Q}$. Therefore, $\mathbf{S}_{x,\mathcal{I}}$ can be recovered using (63) and $\Delta_{x,\mathcal{B}}^{\mathbb{S}}$.
∙ $\mathbb{S}_{x,\mathcal{I}}\in\mathcal{G}_2(\mathbb{S})$: Recall from Definition 8 that the symbols in this group are set to zero. Moreover, belonging to $\mathcal{G}_2(\mathbb{S})$ implies that $x\leq\max\mathcal{I}$, and hence (18) ensures that no injection is performed into the entry $(x,\mathcal{I})$ of $\mathbb{S}$. Therefore, we have
$[\mathbf{S}_{y,\mathcal{Y}}^{\mathbb{P}}]_{x,\mathcal{I}}=[\mathbb{S}_{y,\mathcal{Y}}^{\mathbb{P}}]_{x,\mathcal{I}}=[\Delta_{y,\mathcal{Y}}^{\mathbb{P}}]_{x,\mathcal{I}}=0$.
∙\textpdfrenderTextRenderingMode=FillStroke,LineWidth=.3pt,FillColor=white,Sx,I∈G3(\textpdfrenderTextRenderingMode=FillStroke,LineWidth=.3pt,FillColor=white,\textpdfrenderTextRenderingMode=FillStroke,LineWidth=.3pt,FillColor=white,S): Since this symbol belongs to G3(\textpdfrenderTextRenderingMode=FillStroke,LineWidth=.3pt,FillColor=white,\textpdfrenderTextRenderingMode=FillStroke,LineWidth=.3pt,FillColor=white,S), we have x∈[k+1:d] and x>maxI. Our goal is to decode \textpdfrenderTextRenderingMode=FillStroke,LineWidth=.3pt,FillColor=black,Sx,I. First, note that position (x,I) may be hosting a (secondary) injection from the parent matrix \textpdfrenderTextRenderingMode=FillStroke,LineWidth=.3pt,FillColor=white,P. According to (18) we have
[TABLE]
Hence, to decode $\mathbf{S}_{x,I}$, we need to find both $\mathbb{S}_{x,I}=(-1)^{\sigma_{\mathbb{S}}(x)}w_{x,I\cup\{x\}}$ and the injected symbol $\mathbb{P}_{y,I\cup\{x\}\cup Y}$.
The first term in (64) can be decoded using the parity equation (5). Note that $w_{x,I\cup\{x\}}$ satisfies a parity equation together with $\{w_{t,I\cup\{x\}}:t\in I\}$. A w-symbol $w_{t,I\cup\{x\}}$ is located in column $J=(I\cup\{x\})\setminus\{t\}$ of $\mathbb{S}$.
Since $t\le\max I<x$, the lexicographical order of $I$ and $J$ satisfies $I\prec J$. Therefore, by the induction assumption, every symbol in column $J$, including $\mathbf{S}_{t,J}$, is already decoded. Moreover, (18) implies that no injection is performed into position $(t,J)$ of $\mathbb{S}$, since $t\le\max I<x=\max J$. This leads to $\mathbf{S}_{t,J}=\mathbb{S}_{t,J}=w_{t,J\cup\{t\}}=w_{t,I\cup\{x\}}$. Thus, we can retrieve $\mathbb{S}_{x,I}$ from
[TABLE]
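The ordering claim used above, namely that $I\prec J$ whenever $J=(I\cup\{x\})\setminus\{t\}$ with $t\in I$ and $x>\max I$, can be sanity-checked by brute force. The sketch below assumes that the lexicographical order compares sorted index tuples, which reproduces the reverse-lexicographical decoding order of the example in Section VII-C ($\{6\}$ before $\{5\}$, and $\{4,5\}$, $\{3,5\}$ before $\{3,4\}$). The function names are hypothetical.

```python
from itertools import combinations


def helper_columns(I, x):
    """Columns J = (I | {x}) - {t}, for t in I, holding w_{t, I | {x}}."""
    full = set(I) | {x}
    return [tuple(sorted(full - {t})) for t in I]


def helpers_decoded_first(d, m):
    """True if every helper column J is lexicographically larger than I.

    Columns are m-subsets of [d] stored as sorted tuples (m >= 1), and x
    ranges over all rows with x > max(I). A larger column is visited
    earlier in reverse-lexicographical decoding order.
    """
    for I in combinations(range(1, d + 1), m):
        for x in range(max(I) + 1, d + 1):
            if any(not I < J for J in helper_columns(I, x)):
                return False
    return True


if __name__ == "__main__":
    print(helpers_decoded_first(6, 2), helpers_decoded_first(8, 3))
```

The claim would hold under a colexicographic order as well, since $\max J=x>\max I$; the sorted-tuple convention is only a choice made for this sketch.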
Next, we need to retrieve the second term in (64), i.e., $\mathbb{P}_{y,I\cup\{x\}\cup Y}$. The injection of this symbol into the code matrix $\mathbf{S}$ is a secondary injection, since the injection position lies in the lower part of $\mathbf{S}$, which is a child matrix (see Section V-C) of $\mathbb{P}$. Therefore, the symbol $\mathbb{P}_{y,I\cup\{x\}\cup Y}$ is also primarily injected into another child matrix. By partitioning the column index $I\cup\{x\}\cup Y$ into $A'=(I\cup\{x\}\cup Y)\cap[k]$ and $B'=(I\cup\{x\}\cup Y)\cap[k+1:d]$, we can find the position and the injection pair of the primary injection of $\mathbb{P}_{y,I\cup\{x\}\cup Y}$.
This leads to position $(\max A',A'\setminus\{\max A'\})$ of the child $\mathbf{Q}^{\mathbb{P}}_{y,B'}$ (see part 3 of Remark 6), where $\mathbf{Q}$ is a sibling of $\mathbf{S}$.
The mode of $\mathbf{Q}$ satisfies
[TABLE]
where (66) follows from the relation (15) between the mode of a child matrix and the injected symbol $\mathbb{P}_{y,I\cup\{x\}\cup Y}$ of the parent matrix, the equality in (68) holds since $x\in[k+1:d]$ and $Y\subseteq[k+1:d]$, and the last equality in (69) holds since $I$ is a column index of $\mathbf{S}$.
Now, since $\mathrm{mode}(\mathbf{Q})<\mathrm{mode}(\mathbf{S})$, the induction assumption implies that the child matrix $\mathbf{Q}$ is already decoded. Moreover, Lemma 2 ensures that $\mathbb{Q}$ and $\Delta^{\mathbb{P}}_{y,B'}$ can be extracted from $\mathbf{Q}$. Therefore, the entry at position $(\max A',A'\setminus\{\max A'\})$ of $\Delta^{\mathbb{P}}_{y,B'}$ provides us with
[TABLE]
from which the injected symbol $\mathbb{P}_{y,I\cup\{x\}\cup Y}$ can be recovered. Finally, we can plug $\mathbb{S}_{x,I}$ and $\mathbb{P}_{y,I\cup\{x\}\cup Y}$ (obtained in (65) and (70), respectively) into (64) to recover $\mathbf{S}_{x,I}$ in group $\mathcal{G}_3(\mathbb{S})$.
With this, the lower part of column $\mathbf{S}_{:,I}$ is fully recovered. The upper part can then be decoded using (59) and the observed data $\Psi[K,:]\cdot\mathbf{S}_{:,I}$. Stacking the upper part of $\mathbf{S}_{:,I}$ on top of its lower part, we obtain the entire column $[\mathbf{S}]_{:,I}$.
This completes the proof of the induction step.
Repeating the induction steps for all code segments and all columns leads to the recovery of the entire matrix M. This completes the proof.
∎
VII-C An Illustrative Example for Data Recovery
Example 11.
We continue the data recovery for the running example of this paper. First, note that, as explained earlier in this section, once the bottom part (the bottom $d-k$ rows) of each column is decoded, its upper part (the top $k$ entries) can be easily recovered using (59).
We start with the code segments at the lowest level of the hierarchical tree in Fig. 4, i.e., the segments of mode zero, namely, $\mathbf{T}_6,\dots,\mathbf{T}_{14}$. Here, the lower parts of $\mathbf{T}_6,\mathbf{T}_7,\dots,\mathbf{T}_{14}$ are all zero. This can be verified for $\mathbf{T}_{11}$ using $\mathbb{T}_{11}$ and $\Delta^{\mathbb{T}_2}_{5,\{6\}}$ (which includes the injections into $\mathbf{T}_{11}$) given in (25). Hence, we can recover the entire column of $\mathbf{T}_i$ for $i\in[6:14]$.
We continue the data recovery process with the code segments of mode 1. Let us focus on $\mathbf{T}_5=\mathbb{T}_5+\Delta^{\mathbb{T}_0}_{6,\{5,6\}}$, where $\mathbb{T}_5$ and $\Delta^{\mathbb{T}_0}_{6,\{5,6\}}$ are given in (22) and (23), respectively. We recover the columns of $\mathbf{T}_5$ in reverse-lexicographical order, i.e., we start from $\{6\}$ and continue with $\{5\}$, $\{4\}$, $\{3\}$, and $\{2\}$, until we get to $\{1\}$. For column $\{6\}$, both elements in the bottom part belong to $\mathcal{G}_2(\mathbb{T}_5)$, and hence $[\mathbb{T}_5]_{5,\{6\}}=[\mathbb{T}_5]_{6,\{6\}}=0$ and
$[\Delta^{\mathbb{T}_0}_{6,\{5,6\}}]_{5,\{6\}}=[\Delta^{\mathbb{T}_0}_{6,\{5,6\}}]_{6,\{6\}}=0$. Therefore, the lower part of $[\mathbf{T}_5]_{:,\{6\}}$ is zero, and
the entire column $[\mathbf{T}_5]_{:,\{6\}}$ can be recovered using (59).
Next, to decode column $\{5\}$, we first need to find its bottom entries, $[\mathbf{T}_5]_{5,\{5\}}$ and $[\mathbf{T}_5]_{6,\{5\}}$. It is easy to check that the entry at position $(5,\{5\})$ belongs to $\mathcal{G}_2(\mathbb{T}_5)$, and hence
$[\mathbb{T}_5]_{5,\{5\}}=[\Delta^{\mathbb{T}_0}_{6,\{5,6\}}]_{5,\{5\}}=0$. The entry at position $(6,\{5\})$ belongs to
$\mathcal{G}_3(\mathbb{T}_5)$, and there is no secondary injection into it, i.e., $[\Delta^{\mathbb{T}_0}_{6,\{5,6\}}]_{6,\{5\}}=0$. However, unlike the entries in $\mathcal{G}_2(\mathbb{T}_5)$, this entry is not zero:
$[\mathbb{T}_5]_{6,\{5\}}=-w^{\langle 5\rangle}_{6,\{5,6\}}$.
It can be decoded from the parity equation (5) for the parity group $\{5,6\}$, which implies that
[TABLE]
and the latter symbol appears in column $\{6\}$, which is already decoded. Thus, we have recovered the lower part of $[\mathbf{T}_5]_{:,\{5\}}$, and
the entire column $[\mathbf{T}_5]_{:,\{5\}}$ can then be obtained from (59).
For the remaining columns $\{4\}$, $\{3\}$, $\{2\}$, and $\{1\}$, no injection is performed into their lower entries, and these entries of $\mathbb{T}_5$ are all w-symbols, which can be recovered from the parity equations and previously decoded w-symbols. Hence, the data recovery process continues in a similar fashion.
The next step is decoding the message matrices of the code segments of mode $m=2$, i.e., $\mathbf{T}_1$, $\mathbf{T}_2$, and $\mathbf{T}_3$. For the sake of illustration, we focus on decoding $\mathbf{T}_2$.
Note that $\mathbf{T}_2=\mathbb{T}_2+\Delta^{\mathbb{T}_0}_{6,\{6\}}$, where $\mathbb{T}_2$ and $\Delta^{\mathbb{T}_0}_{6,\{6\}}$ are given in (20) and (21), respectively. The symbols in each group of $\mathbb{T}_2$ are given in (24). In particular, the symbols in $\mathcal{G}_2(\mathbb{T}_2)=\{v^{\langle 2\rangle}_{5,\{5,6\}},v^{\langle 2\rangle}_{6,\{5,6\}}\}$ are set to zero. Hence, the lower part of column $\{5,6\}$ is zero, and $[\mathbf{T}_2]_{:,\{5,6\}}$ can be easily decoded.
The symbols in $\mathcal{G}_1(\mathbb{T}_2)$ are injected into the code segments $\mathbb{T}_9$, $\mathbb{T}_{10}$, and $\mathbb{T}_{11}$, and hence can be retrieved once $\mathbf{T}_9$, $\mathbf{T}_{10}$, and $\mathbf{T}_{11}$ are decoded. For instance, consider the entry of $\mathbb{T}_2$ at position $(5,\{4,6\})$, which is $w^{\langle 2\rangle}_{5,\{4,5,6\}}$. For this position, we have $A=\{4,6\}\cap[4]=\{4\}$ and $B=\{4,6\}\cap[5:6]=\{6\}$. Therefore, this symbol is injected into the child code of $\mathbb{T}_2$ with injection pair $(5,B)=(5,\{6\})$, at row $\max A=4$ and column $A\setminus\{\max A\}=\emptyset$, that is, into $[\mathbf{T}_{11}]_{4,\emptyset}$, as can be verified in (25). Therefore, once $\mathbf{T}_{11}$ is decoded, we also have $w^{\langle 2\rangle}_{5,\{4,5,6\}}$.
Next, consider column $\{3,4\}$ of $\mathbf{T}_2$, which includes some elements of $\mathcal{G}_3(\mathbb{T}_2)$ in its bottom part. In particular, we have $[\mathbf{T}_2]_{5,\{3,4\}}=w^{\langle 2\rangle}_{5,\{3,4,5\}}+v^{\langle 0\rangle}_{6,\{3,4,5,6\}}$, where $w^{\langle 2\rangle}_{5,\{3,4,5\}}=[\mathbb{T}_2]_{5,\{3,4\}}\in\mathcal{G}_3(\mathbb{T}_2)$ and $v^{\langle 0\rangle}_{6,\{3,4,5,6\}}$ is the symbol injected from $\mathbb{T}_0$. The first ingredient, $w^{\langle 2\rangle}_{5,\{3,4,5\}}$, can be decoded from the parity equation for the w-group $\{3,4,5\}$:
[TABLE]
Note that $w^{\langle 2\rangle}_{3,\{3,4,5\}}$ and $w^{\langle 2\rangle}_{4,\{3,4,5\}}$ appear in columns $\{4,5\}$ and $\{3,5\}$, which are decoded before we arrive at column $\{3,4\}$. To retrieve the second ingredient, $v^{\langle 0\rangle}_{6,\{3,4,5,6\}}$, note that the current injection is secondary, and the same symbol is also primarily injected into some other child code of $\mathbb{T}_0$, which is decoded before we arrive at $\mathbf{T}_2$. To find the injection pair and the position of this primary injection, we note that $v^{\langle 0\rangle}_{6,\{3,4,5,6\}}$ appears in position $(y,J)=(6,\{3,4,5,6\})$ of $\mathbb{T}_0$ (up to a sign). Therefore, $A'=J\cap[4]=\{3,4\}$ and $B'=J\cap[5:6]=\{5,6\}$, so the injection pair for the primary injection of $v^{\langle 0\rangle}_{6,\{3,4,5,6\}}$ is $(y,B')=(6,\{5,6\})$, which corresponds to $\mathbb{T}_5$ (see Fig. 4). The position of the injection is $(\max A',A'\setminus\{\max A'\})=(4,\{3\})$. Therefore, $v^{\langle 0\rangle}_{6,\{3,4,5,6\}}$ can be found from $[\mathbf{T}_5]_{4,\{3\}}$, as can be verified from (23). Note that $\mathrm{mode}(\mathbb{T}_5)=1<2=\mathrm{mode}(\mathbb{T}_2)$, and hence $\mathbf{T}_5$ is decoded before we arrive at $\mathbf{T}_2$.
Repeating a similar procedure, we can decode all columns of $\mathbf{T}_2$. Once all three code segments of mode 2, i.e., $\mathbf{T}_1$, $\mathbf{T}_2$, and $\mathbf{T}_3$, are decoded, we can proceed to the recovery of the root code segment $\mathbf{T}_0$.
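The index bookkeeping used twice in this example (for $w^{\langle 2\rangle}_{5,\{4,5,6\}}$ and for $v^{\langle 0\rangle}_{6,\{3,4,5,6\}}$) can be written as a small function. This is a sketch that only reproduces the partition rule stated in the text, $A=J\cap[k]$ and $B=J\cap[k+1:d]$, taking as input a position $(y,J)$ of the parent; `primary_injection` is a hypothetical name, and the non-empty-$A$ guard is an assumption of the sketch.

```python
def primary_injection(y, J, k, d):
    """Locate the primary injection of the parent entry at position (y, J).

    J is split into A = J intersect [1..k] and B = J intersect [k+1..d].
    Returns the injection pair (y, B), which identifies the child segment,
    and the position (max A, A minus {max A}) inside that child.
    """
    A = sorted(j for j in J if j <= k)
    B = tuple(sorted(j for j in J if k < j <= d))
    if not A:
        # This sketch assumes A is non-empty for a primarily injected symbol.
        raise ValueError("J has no element in [k]")
    return (y, B), (A[-1], tuple(A[:-1]))


if __name__ == "__main__":
    # The two instances worked out in Example 11 (k = 4, d = 6).
    print(primary_injection(5, {4, 6}, k=4, d=6))        # entry (5, {4,6}) of T2
    print(primary_injection(6, {3, 4, 5, 6}, k=4, d=6))  # entry (6, {3,4,5,6}) of T0
```

The first call returns the pair $(5,\{6\})$ with position $(4,\emptyset)$, i.e., $[\mathbf{T}_{11}]_{4,\emptyset}$; the second returns $(6,\{5,6\})$ with position $(4,\{3\})$, i.e., $[\mathbf{T}_5]_{4,\{3\}}$.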
∎
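The visiting order used in the proof above and in this example (segments in increasing mode; within a segment, columns in reverse-lexicographical order) can be sketched as follows. The sketch uses only the three segments whose modes are stated explicitly in the example, $\mathbf{T}_{11}$ (mode 0), $\mathbf{T}_5$ (mode 1), and $\mathbf{T}_2$ (mode 2), with $d=6$; ties between segments of equal mode are broken arbitrarily, since the induction only relies on strictly smaller modes. The function name is hypothetical.

```python
from itertools import combinations


def decoding_schedule(segment_modes, d):
    """List (segment, column) pairs in decoding order.

    Segments are visited in increasing mode; the columns of a mode-m segment
    (the m-subsets of [d], as sorted tuples) are visited in reverse
    lexicographical order.
    """
    schedule = []
    for name, mode in sorted(segment_modes.items(), key=lambda item: item[1]):
        columns = sorted(combinations(range(1, d + 1), mode), reverse=True)
        schedule.extend((name, column) for column in columns)
    return schedule


if __name__ == "__main__":
    for step in decoding_schedule({"T2": 2, "T5": 1, "T11": 0}, d=6)[:8]:
        print(step)
```

For $\mathbf{T}_5$ this yields the columns $\{6\},\{5\},\dots,\{1\}$, as in the example, and $\mathbf{T}_2$ starts with column $\{5,6\}$.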
VIII The Code Parameters
The construction of the cascade code is described in Section V. However, finding the parameters of the resulting code (such as the size of the super-message matrix M) requires an explicit evaluation of the number of code segments introduced throughout the injection process.
VIII-A An Implicit Evaluation of the Code Parameters
Consider the construction of an $(n,k,d;\mu)$ cascade code. Recall that we start from a signed determinant code of mode $\mu$ and may introduce many code segments to complete the injection process. Let $t_m$ be the total number of $(d;m)$ code segments of mode $m$ needed to complete all the injections. The super-message matrix $M$ is obtained by concatenating all code segments, which results in a matrix with $d$ rows and a total of $\sum_{m=0}^{\mu}t_m\alpha_m$ columns. Therefore, the required per-node storage capacity of the resulting code is
[TABLE]
Similarly, the repair bandwidth of the resulting code is given by
[TABLE]
The total number of data symbols stored in matrix $M$ is the sum of the numbers of data symbols in the individual code segments. A code segment $\mathbb{P}$ of mode $m$ can store up to $F_m=m\binom{d+1}{m+1}$ symbols. However, recall that the data symbols in group $\mathcal{G}_2(\mathbb{P})$ (see Definition 8) are set to zero, which reduces the number of stored symbols. For an $(n,d,d;m)$ signed determinant code used as a code segment, the reduction due to nulling the data symbols in $\mathcal{G}_2(\mathbb{P})$ can be found from
[TABLE]
Note that in (73), for the first set, we choose a subset of size m from [k+1:d] for B, and any element of B is a valid choice for x. Similarly, for the second set, we choose a subset of size m+1 from [k+1:d] for {x}∪B, and then x can be any element of {x}∪B except the largest one. Finally, we used Pascal’s identity in (74).
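The two counts in (73) and the Pascal step (74) can be cross-checked by enumeration. In this sketch, the first set consists of the pairs $(x,B)$ with $B$ an $m$-subset of $[k+1:d]$ and $x\in B$, and the second of the pairs with $\{x\}\cup B$ an $(m+1)$-subset and $x$ not its largest element, exactly as described above. The result is compared with $m\binom{d-k+1}{m+1}$, which is our reading of the closed form after Pascal's identity; the function names are hypothetical.

```python
from itertools import combinations
from math import comb


def nulled_count(k, d, m):
    """Brute-force count of the two sets in (73) for a mode-m segment."""
    lower = range(k + 1, d + 1)  # the index set [k+1:d]
    # First set: B is an m-subset of [k+1:d] and x is any element of B.
    first = sum(len(B) for B in combinations(lower, m))
    # Second set: S = {x} | B is an (m+1)-subset and x is not max(S).
    second = sum(len(S) - 1 for S in combinations(lower, m + 1))
    return first + second


def nulled_closed_form(k, d, m):
    """m * C(d-k, m) + m * C(d-k, m+1) = m * C(d-k+1, m+1) by Pascal."""
    return m * comb(d - k + 1, m + 1)


if __name__ == "__main__":
    print([nulled_count(4, 6, m) for m in range(5)])
```

For the running example ($k=4$, $d=6$) this gives $N_2=2$, matching the two nulled symbols listed for $\mathcal{G}_2(\mathbb{T}_2)$ in Section VII-C, and $N_1=3$, matching the three $\mathcal{G}_2(\mathbb{T}_5)$ positions $(5,\{6\})$, $(6,\{6\})$, and $(5,\{5\})$.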
Subtracting the number of nulled symbols, the number of data symbols in a signed determinant code of mode $m$ is $F_m-N_m$. Therefore, the total number of data symbols in the super-message matrix can be evaluated as
[TABLE]
The rest of this section is dedicated to evaluating the parameters $t_m$, i.e., the number of code segments of mode $m$. Then, we can explicitly characterize the code parameters.
VIII-B The Number of Child Matrices
Consider a code segment $\mathbb{P}$ with $\mathrm{mode}(\mathbb{P})=j$. Recall that each child matrix of $\mathbb{P}$ is a (signed) determinant code of the form $\mathbb{Q}^{\mathbb{P}}_{x,B}$. Hence, there is a one-to-one map between the child segments of $\mathbb{P}$ and the injection pairs $(x,B)$ satisfying the conditions of Remark 5. From (15), the mode of the child matrix with injection pair $(x,B)$ is given by
[TABLE]
Therefore, the injection pairs leading to a child matrix of mode m can be found from
[TABLE]
We can distinguish two cases for the pairs $(x,B)$ in this set. First, if $x\in B$, then there are $\binom{d-k}{|B|}$ choices of $B$, and there are $|B|$ choices for $x\in B$. Second, if $x\notin B$, then $B\cup\{x\}$ is a subset of $[k+1:d]$ of size $|B|+1$. Therefore, there are $\binom{d-k}{|B|+1}$ choices for $B\cup\{x\}$. Moreover, for a given $B\cup\{x\}$, each entry except the largest one can be chosen to be $x$ (and the remaining ones form $B$).
Therefore, the total number of injection pairs leading to a child matrix of mode m for a parent matrix of mode j is given by
[TABLE]
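The case split above can be checked by enumerating the pairs directly. The sketch assumes that the admissible pairs $(x,B)$ are exactly those covered by the two cases (the full conditions are in Remark 5, which is not reproduced here) and counts them by $b=|B|$; how $b$ translates into the child mode is governed by (15). The count is compared with $b\binom{d-k}{b}+b\binom{d-k}{b+1}=b\binom{d-k+1}{b+1}$. Function names are hypothetical.

```python
from itertools import combinations
from math import comb


def injection_pairs(k, d, b):
    """Enumerate pairs (x, B) with |B| = b following the two cases:
    x in B, or x not in B and x is not the largest element of B | {x}."""
    lower = list(range(k + 1, d + 1))  # [k+1:d]
    pairs = []
    for B in combinations(lower, b):
        for x in lower:
            if x in B or (B and x < max(B)):
                pairs.append((x, B))
    return pairs


def pair_count_closed_form(k, d, b):
    """b * C(d-k, b) + b * C(d-k, b+1) = b * C(d-k+1, b+1) by Pascal."""
    return b * comb(d - k + 1, b + 1)


if __name__ == "__main__":
    print(sorted(injection_pairs(4, 6, 1)))
```

For $k=4$ and $d=6$, $b=1$ gives the three pairs $(5,\{5\})$, $(5,\{6\})$, and $(6,\{6\})$; two of them appear in Section VII-C, namely $(6,\{6\})$ for $\mathbf{T}_2$ and $(5,\{6\})$ for $\mathbf{T}_{11}$.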
VIII-C Recursive Equations for tm Parameters
The code construction starts from a determinant code of mode $\mu$ as the root. Hence, we have $t_\mu=1$.
Let $m$ be an integer in $\{0,1,\dots,\mu-1\}$.
In general, a code segment of mode $m$ can be introduced to host the injected symbols of any parent matrix of mode $j$ with $j>m$. For a fixed $j$, the required number of child matrices of mode $m$ is given by (76).
Note that child matrices are dedicated to their parent matrix and cannot be shared by multiple parent matrices. Therefore, if there are $t_j$ code segments of mode $j$ in the super-message matrix $M$, then the total number of child matrices of mode $m$ is given by
[TABLE]
This is a (reverse) recursive equation with starting point $t_\mu=1$. Next, we solve this recursive equation to obtain explicit expressions for the $t_m$'s.
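The reverse recursion (77) is easy to evaluate numerically. This sketch assumes, as an inference from the running example rather than a formula quoted in this section, that a child created by the pair $(x,B)$ from a parent of mode $j$ has mode $j-|B|-1$; combined with the count in (76), the number of mode-$m$ children of a mode-$j$ parent is then $(j-m-1)\binom{d-k+1}{j-m}$. With $k=4$, $d=6$, and an assumed root mode $\mu=4$, it returns $t_0=9$ and $t_2=3$, consistent with the nine mode-zero segments $\mathbf{T}_6,\dots,\mathbf{T}_{14}$ and the three mode-two segments $\mathbf{T}_1,\mathbf{T}_2,\mathbf{T}_3$ of Section VII-C. The function name is hypothetical.

```python
from math import comb


def segment_counts(k, d, mu):
    """Return [t_0, ..., t_mu], the number of code segments of each mode.

    Assumption of this sketch: a parent of mode j has
    (j - m - 1) * C(d-k+1, j-m) children of mode m. Children are never
    shared, so t_m sums this count over all parents of larger mode.
    """
    t = [0] * (mu + 1)
    t[mu] = 1  # the root signed determinant code of mode mu
    for m in range(mu - 1, -1, -1):  # reverse recursion, starting below mu
        t[m] = sum(t[j] * (j - m - 1) * comb(d - k + 1, j - m)
                   for j in range(m + 1, mu + 1))
    return t


if __name__ == "__main__":
    print(segment_counts(4, 6, 4))
```

The total $\sum_m t_m=15$ in this case also matches the fifteen segments $\mathbf{T}_0,\dots,\mathbf{T}_{14}$.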
VIII-D The Explicit Evaluation of tm Parameters
Note that we are only interested in the values of $t_m$ defined by (77) for $m\in\{0,1,\dots,\mu-1\}$, and we may assume arbitrary values for $t_m$ when $m<0$ or $m>\mu$.
In particular, we extend the range of $m$ to all integers by defining dummy variables $\{t_m: m<0 \text{ or } m>\mu\}$ such that
[TABLE]
Note that this immediately implies $t_m=0$ for $m>\mu$. However, it may lead to some non-trivial (and meaningless) values of $t_m$ for $m<0$.
We also define a sequence $\{p_m\}_{m=-\infty}^{\infty}$ as $p_m=t_{\mu-m}$ for all $m\in\mathbb{Z}$.
The next lemma provides a non-recursive expression for the sequence $\{p_m\}$.
Lemma 3.
The parameters in the sequence $\{p_m\}$ can be found from
[TABLE]
for 0≤m≤μ.
We refer to Appendix D-B for the proof of Lemma 3.
Clearly, a non-recursive expression for $t_m$ follows immediately from $t_m=p_{\mu-m}$. However, it turns out to be more convenient to work with the sequence $\{p_m\}$ without directly evaluating $t_m$.
VIII-E The Explicit Evaluation of the Code Parameters
We need to show that the implicit code parameters obtained in (71), (72), and (75) are equal to those claimed in Theorem 1. The following lemma will be helpful to simplify the derivation. The proof of the lemma can be found in Appendix D-C.
Lemma 4.
For integers $a,b\in\mathbb{Z}$, we have
[TABLE]
Now, we are ready to evaluate the code parameters using Lemma 4.
In (82), we used the facts that $\binom{d}{m}=0$ for $m<0$ and $p_{\mu-m}=t_m=0$ for $m>\mu$, and (83) follows from Lemma 4 with $a=b=0$. Finally, (83) is exactly the node storage size claimed in Theorem 1.
• Repair bandwidth: Similarly, starting from the repair bandwidth in (72) and following similar steps, we can simplify it as
[TABLE]
Note that we used Lemma 4 with $a=b=-1$ in (84), and (85) holds since $\binom{k-1}{-1}=0$. This proves that the repair bandwidth of the proposed code is identical to that claimed in Theorem 1.
• File size: Starting from the implicit expression in (75), we can write
[TABLE]
Note that (86) holds since $t_m=0$ for $m>\mu$ and $\binom{d+1}{m+1}-\binom{d-k+1}{m+1}=0$ for $m<0$; in (87), we used $t_m=p_{\mu-m}$ and the manipulation $m=(m+1)-1$; (88) holds since $a\binom{b}{a}=b\binom{b-1}{a-1}$; and (89) follows from four applications of Lemma 4, with $(a,b)=(0,0)$, $(a,b)=(1,1)$, $(a,b)=(-k,0)$, and $(a,b)=(-k+1,1)$. The equality in (90) holds since the terms in the third summation in (89) are zero except for $m=0$, and similarly, the terms in the fourth summation are zero except for $m=-1,0$.
Then, the third and fourth terms in (90) cancel, and we used the identity $\binom{k+1}{m+1}=\binom{k}{m}+\binom{k}{m+1}$ in (91).
In (92), we used the fact that $\binom{k}{-1}=0$ and applied a unit shift to the variable of the last summation.
The second summation over m∈{0,…,μ+1} in (92) is decomposed into m∈{0,…,μ} and m=μ+1 in (93). This leads to (94), which is the storage capacity of the code, as claimed in Theorem 1.
IX Discussion
IX-A Cascade Codes vs. Product-Matrix Codes
In this section, we compare the code construction proposed in this paper to the product-matrix (PM) code introduced in [11]. Since both constructions are linear, both can be written as a product of an encoder matrix and a message matrix, i.e., $C=\Psi\cdot M$, as in [11]. A natural question is whether the two codes are equivalent at the MBR and MSR points (recall that the construction of PM codes is limited to the extreme points, namely, MBR and MSR). In other words, one may wonder whether the two codes are structurally equivalent despite their different constructions. We answer this question in the negative by proving some fundamental differences between the two codes. Here, we rely on the standard notion of equivalence (e.g., see [48]) to check whether two codes are convertible to each other.
Two linear codes $C$ and $C'$ are called equivalent [48] if $C'$ can be represented in terms of $C$ by
1. a change of basis of the vector space generated by the message symbols (i.e., a remapping of the message symbols),
2. a change of basis of the column-spaces of the nodal generator matrices (i.e., a remapping of the symbols stored within a node), and
3. a scaling of the parameters $(\alpha,\beta,F)$ of the codes by an integer factor, so that both codes have the same parameters.
It turns out that the cascade code generated for the MBR point (μ=1) is equivalent (and even identical) to the one introduced in [11]. However, the codes generated for the MSR point using cascade construction and PM construction are fundamentally different, and not equivalent.
To show this, we focus on MSR codes for a distributed storage system with parameters $(n,k,d=2k-2)$ for some $k>2$. The parameters of an MSR cascade code can be found from Theorem 1 by setting $\mu=k$, as $(\alpha,\beta,F)=((k-1)^k,(k-1)^{k-1},k(k-1)^k)$. On the other hand, the parameters of the PM code are given by $(\alpha',\beta',F')=(k-1,1,k(k-1))$. So, for a fair comparison, one needs to concatenate $N=(k-1)^{k-1}$ copies of MSR PM codes with independent message matrices $M'_1,M'_2,\dots,M'_N$ to obtain a code with the same parameters as the cascade code. Let $C$ and $C'$ be the resulting cascade and PM codes for these parameters, respectively. We have
[TABLE]
where Ψ and Ψ′ are the encoder matrices for the two constructions. While the conditions for the encoder matrices Ψ and Ψ′ are in general different, both sets of requirements are satisfied by the choice of
Vandermonde matrices.
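The parameter matching above is direct arithmetic: $N\alpha'=(k-1)^{k-1}(k-1)=(k-1)^k$, $N\beta'=(k-1)^{k-1}$, and $NF'=(k-1)^{k-1}\,k(k-1)=k(k-1)^k$. A short Python check of the same identities for a range of $k$ (function names are hypothetical):

```python
def pm_concatenation(k):
    """(N alpha', N beta', N F') for N = (k-1)^(k-1) copies of the MSR PM
    code with (alpha', beta', F') = (k-1, 1, k(k-1)) and d = 2k-2."""
    n_copies = (k - 1) ** (k - 1)
    return n_copies * (k - 1), n_copies * 1, n_copies * k * (k - 1)


def cascade_msr(k):
    """(alpha, beta, F) of the MSR cascade code as quoted for d = 2k-2."""
    return (k - 1) ** k, (k - 1) ** (k - 1), k * (k - 1) ** k


if __name__ == "__main__":
    for k in range(3, 7):
        print(k, pm_concatenation(k), cascade_msr(k))
```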
Let us focus on the repair data sent by the helper nodes in each code.
We denote by $\mathrm{Rep}_{h\to f}$ and $\mathrm{Rep}'_{h\to f}$ the vector spaces spanned by the repair symbols sent by a helper node $h$ to repair a failed node $f$ for the cascade and PM codes, respectively. It is clear that
$\dim(\mathrm{Rep}_{h\to f})=\beta=(k-1)^{k-1}$ and $\dim(\mathrm{Rep}'_{h\to f})=N\beta'=(k-1)^{k-1}$, i.e., both spaces have identical dimensions. However, the dimension of the intersection of two such spaces (for the same helper and two different failed nodes) differs between the two constructions.
This is formally highlighted in the following proposition.
Proposition 6.
For an MSR cascade code for an $(n,k,d=2k-2)$ distributed storage system with parameters $(\alpha,\beta,F)=((k-1)^k,(k-1)^{k-1},k(k-1)^k)$, and three distinct nodes $h$, $f$, and $g$, we have
[TABLE]
while for a concatenation of $N=(k-1)^{k-1}$ independent copies of MSR PM codes we have $\dim(\mathrm{Rep}'_{h\to f}\cap\mathrm{Rep}'_{h\to g})=0$.
Recall that the repair space $\mathrm{Rep}_{h\to f}$ in a cascade code is simply a concatenation of the repair data of the code segments, each of which is an $(n,d,d;m)$ (modified) signed determinant code.
For each code segment, we can use the result of [41] to evaluate the overlap between the subspaces spanned by the repair symbols sent to two failed nodes. For a code segment of mode $m$, the dimension of the overlap between the subspaces spanned by the repair symbols sent from $h$ to $f$ and to $g$ is $2\binom{d-1}{m-1}-\left[\binom{d}{m}-\binom{d-2}{m}\right]$, as reported in [41, Theorem 2]. Recall that there are $t_m$ code segments of mode $m$, where $t_m$ is evaluated in Section VIII-D. Hence, summing over all code segments, we get
[TABLE]
Note that we have used Lemma 4 with $(a,b)=(-1,-1)$, $(a,b)=(0,0)$, and $(a,b)=(-2,0)$ in the last equality in (95).
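The sum above can be evaluated numerically for small $k$. This sketch reads the per-segment overlap as $2\binom{d-1}{m-1}-\left[\binom{d}{m}-\binom{d-2}{m}\right]$, which is nonnegative for every $m$ (it simplifies to $\binom{d-2}{m-2}$), and obtains $t_m$ from (77) under the child count $(j-m-1)\binom{d-k+1}{j-m}$, an inference consistent with the running example rather than a formula quoted here. For $k=3,4,5$ (so $d=4,6,8$ and $\mu=k$) the total evaluates to $(k-1)^{k-2}$, a nonzero overlap; this is a computed check for small $k$ only, not a restatement of Proposition 6. Function names are hypothetical.

```python
from math import comb


def segment_counts(k, d, mu):
    """t_m via (77), with the assumed child count (j-m-1) * C(d-k+1, j-m)."""
    t = [0] * (mu + 1)
    t[mu] = 1
    for m in range(mu - 1, -1, -1):
        t[m] = sum(t[j] * (j - m - 1) * comb(d - k + 1, j - m)
                   for j in range(m + 1, mu + 1))
    return t


def segment_overlap(d, m):
    """Overlap for one mode-m segment: 2 C(d-1, m-1) - [C(d, m) - C(d-2, m)]."""
    first = 2 * comb(d - 1, m - 1) if m >= 1 else 0
    return first - (comb(d, m) - comb(d - 2, m))


def cascade_overlap(k):
    """Sum of t_m times the per-segment overlap, for d = 2k - 2 and mu = k."""
    d = 2 * k - 2
    t = segment_counts(k, d, k)
    return sum(t[m] * segment_overlap(d, m) for m in range(k + 1))


if __name__ == "__main__":
    print([cascade_overlap(k) for k in (3, 4, 5)])
```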
For the MSR PM code, note that it is obtained by concatenating $N$ independent copies of little PM codes with parameters $(\alpha',\beta',F')=(k-1,1,k(k-1))$.
For each little PM code, the dimension of the overlap between the subspaces of interest is an integer, which can be either 0 or 1, since $\beta'=1$. If the latter holds, then the subspaces spanned by the repair symbols sent from $h$ to $f$ and to $g$ are identical, i.e., $\mathrm{Rep}'_{h\to f}=\mathrm{Rep}'_{h\to g}$. By symmetry, this should hold for any other failed node. Now, consider a set of helper nodes $\mathcal{H}$ with $|\mathcal{H}|=d$ and a set of failed nodes $\mathcal{F}$ with $|\mathcal{F}|=k$. We can repair the content of all the nodes in $\mathcal{F}$ by sending only $\beta'=1$ symbol from each of the helper nodes in $\mathcal{H}$, since $\mathrm{Rep}'_{h\to f}=\mathrm{Rep}'_{h\to g}$ for any $f,g\in\mathcal{F}$ and any $h\in\mathcal{H}$. Moreover, the entire information of the little PM code should be recoverable from the content of the nodes in $\mathcal{F}$, since $|\mathcal{F}|=k$. This implies
$$k(k-1)=F'\le d\,\beta'=2k-2,$$
which contradicts $k>2$. Therefore, for each little MSR PM code we have $\dim(\mathrm{Rep}'_{h\to f}\cap\mathrm{Rep}'_{h\to g})=0$. Summing over the $N$ independent copies, we obtain the claim of the proposition.
∎
An immediate consequence of this proposition is that cascade codes and PM codes are not equivalent, and cannot be converted to each other by any scaling and change of bases.
In general, cascade codes and PM codes are fundamentally different. Some of the main distinctions between the two constructions are highlighted in Table IV.
IX-B The Role of Redundancy in Cascade Codes
The parity symbols in the message matrix of a cascade code play a critical role in guaranteeing the properties of the code. Such parity symbols were initially introduced for determinant codes [40, 41] to facilitate the repair process. However, they play no role in data recovery when d=k. While the redundancy introduced by such parity symbols can affect the overall storage capacity of the system, the lower bounds in [36, 37, 38] show that determinant codes are optimum for DSS with d=k.
In cascade codes, however, these parity (redundant) symbols play two crucial roles: (1) they help with the repair mechanism, similar to their role in determinant codes, and (2) they make the data recovery possible, in spite of the fact that the data collector only has access to the coded content of k<d nodes. More intuitively, this redundancy is used to provide a backup copy for symbols of a determinant code that could not be retrieved, if the data collector could only observe the content of k<d nodes.
It is easy to verify from the definition of the injection process in (18) that all the parity symbols of a child matrix are filled with injections from the parent code. This suggests that this redundancy is fully exploited, and thus that the proposed code has no further room for improvement. This is the foundation of our conjecture that cascade codes are optimum exact-repair regenerating codes for any set of parameters (n,k,d), and achieve the optimum storage-bandwidth trade-off.
Moreover, the proposed construction universally achieves the optimum trade-off for all system parameters with a known lower bound: MBR codes (see Corollary 1 and [8]), MSR codes (see Corollary 1 and [8]), an interior operating point on the cut-set bound (see Corollary 2 and [8]), linear codes with k=d [36, 37, 38], and an optimum code for an (n,k,d)=(5,3,4) system, for which a matching lower bound is provided in [49]. Altogether, these facts support the conjecture regarding the optimality of the codes in general.
IX-C Future Work
The main remaining open problem to be addressed is to provide a lower bound for the trade-off between the storage and repair-bandwidth of exact-repair regenerating codes.
As mentioned above, we conjecture that a tight lower bound will match with the trade-off achieved by cascade codes, indicating that the proposed codes are optimal.
The proposed cascade codes, however, can be improved in several respects. One major concern regards the sub-packetization. Even though the sub-packetization of the proposed codes is independent of the number of nodes n, it is exponential in the parameter k. An interesting question is whether an identical trade-off can be achieved by a code whose sub-packetization is sub-exponential and independent of n. Repair of multiple failures is another interesting problem to be studied. More importantly, the problem of dynamic repair [34, 50, 51], i.e., the flexibility to vary the number of helper nodes with d∈[k:n−1] (without changing the underlying code), is of both practical and theoretical interest.
Recently, clustered distributed storage systems have received considerable attention [52, 53]. A modified version of the proposed construction might be applicable to such clustered systems. Finally, exact-repair regenerating codes can be viewed in the context of the interference alignment problem (see, e.g., [54]), where the repair scenario is equivalent to aligning and canceling the interference (mismatch) between a failed symbol and a coded symbol of a helper node. Therefore, the techniques and results developed in this paper might also be applicable to the design of interference alignment codes for wireless communication.
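This triple satisfies $\left(\frac{F_{\mathrm{MBR}}}{\alpha_{\mathrm{MBR}}},\frac{F_{\mathrm{MBR}}}{\beta_{\mathrm{MBR}}}\right)=\left(\frac{k(2d-k+1)}{2d},\frac{k(2d-k+1)}{2}\right)$, which is the characteristic of the MBR point [8].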
Similarly, for μ=k we have
[TABLE]
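where we shifted the summation variable in the evaluation of $\beta_{\mathrm{MSR}}$, and used the fact that $\binom{k}{k+1}=0$ to simplify $F_{\mathrm{MSR}}$.
These parameters satisfy $\left(\frac{F_{\mathrm{MSR}}}{\alpha_{\mathrm{MSR}}},\frac{F_{\mathrm{MSR}}}{\beta_{\mathrm{MSR}}}\right)=\left(k,\,k(d-k+1)\right)$, which characterizes the MSR point [8].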
∎
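As a quick numerical cross-check of these ratios, the sketch below uses the standard parametrization of the two extreme points from [8] (MBR: $\alpha=d\beta$ and $F=k(2d-k+1)\beta/2$; MSR: $\alpha=(d-k+1)\beta$ and $F=k\alpha$), not the cascade-code expressions themselves, and confirms that each point attains the cut-set bound $F\le\sum_{i=1}^{k}\min(\alpha,(d-i+1)\beta)$ with equality. The function names and the sample $(k,d)$ pairs are illustrative only; exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction


def cut_set_rhs(alpha, beta, k, d):
    """Right-hand side of the cut-set bound: sum_{i=1}^{k} min(alpha, (d - i + 1) * beta)."""
    return sum(min(alpha, (d - i + 1) * beta) for i in range(1, k + 1))


def mbr_point(k, d, beta=Fraction(1)):
    """Standard MBR point from [8]: alpha = d * beta and F = k(2d - k + 1) * beta / 2."""
    return Fraction(k * (2 * d - k + 1), 2) * beta, d * beta, beta


def msr_point(k, d, beta=Fraction(1)):
    """Standard MSR point from [8]: alpha = (d - k + 1) * beta and F = k * alpha."""
    alpha = (d - k + 1) * beta
    return k * alpha, alpha, beta


# Both extreme points attain the cut-set bound with equality, and their
# normalized ratios (F/alpha, F/beta) match the expressions in the text.
for k, d in [(3, 4), (2, 5), (4, 4), (5, 9)]:
    F, alpha, beta = mbr_point(k, d)
    assert F == cut_set_rhs(alpha, beta, k, d)
    assert (F / alpha, F / beta) == (Fraction(k * (2 * d - k + 1), 2 * d),
                                     Fraction(k * (2 * d - k + 1), 2))
    F, alpha, beta = msr_point(k, d)
    assert F == cut_set_rhs(alpha, beta, k, d)
    assert (F / alpha, F / beta) == (k, k * (d - k + 1))
```

Because both sides scale linearly in $\beta$, fixing $\beta=1$ loses no generality for this check.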
The cut-set bound in (1), given by $F\le\sum_{i=1}^{k}\min(\alpha,(d-i+1)\beta)$, reduces to
[TABLE]
for $(d-k+1)\beta\le\alpha\le(d-k+2)\beta$.
The latter bound is satisfied with equality by the parameters of a cascade code with mode μ=k−1. To show this claim, we use the expressions for α and β in (3), and write
[TABLE]
where we used Pascal’s identity in (98).
Hence, this point attains the cut-set bound with equality and is therefore optimum.
∎
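On the interval $(d-k+1)\beta\le\alpha\le(d-k+2)\beta$, every term of $\sum_{i=1}^{k}\min(\alpha,(d-i+1)\beta)$ with $i\le k-1$ equals $\alpha$ (since $(d-i+1)\beta\ge(d-k+2)\beta$), while the last term equals $(d-k+1)\beta$; the bound therefore collapses to the linear form $F\le(k-1)\alpha+(d-k+1)\beta$. The sketch below checks this collapse on an evenly spaced grid of $\alpha$ values for a few hypothetical $(k,d)$ pairs. It only verifies the shape of the bound and does not evaluate the cascade-code parameters in (3).

```python
from fractions import Fraction


def cut_set_rhs(alpha, beta, k, d):
    """Right-hand side of the cut-set bound: sum_{i=1}^{k} min(alpha, (d - i + 1) * beta)."""
    return sum(min(alpha, (d - i + 1) * beta) for i in range(1, k + 1))


def reduced_bound(alpha, beta, k, d):
    """Linear form of the bound on the interval (d-k+1)beta <= alpha <= (d-k+2)beta."""
    return (k - 1) * alpha + (d - k + 1) * beta


def check_interval(k, d, steps=10):
    """Compare both sides on an evenly spaced grid of alpha values in the interval."""
    beta = Fraction(1)
    lo, hi = (d - k + 1) * beta, (d - k + 2) * beta
    for s in range(steps + 1):
        alpha = lo + (hi - lo) * Fraction(s, steps)
        if cut_set_rhs(alpha, beta, k, d) != reduced_bound(alpha, beta, k, d):
            return False
    return True


for k, d in [(2, 3), (3, 4), (4, 7), (5, 5)]:
    assert check_interval(k, d)

# Just below the interval (alpha = (d-k) * beta) the linear form no longer matches.
assert cut_set_rhs(Fraction(1), Fraction(1), 3, 4) != reduced_bound(Fraction(1), Fraction(1), 3, 4)
```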
Appendix B Proof of Node Repairability for Signed Determinant Codes
In this section we present the proof of Proposition 3. We start from the RHS of (13) and show it is equal to the LHS.
[TABLE]
In this proof we used the following facts:
•
In (100) we used the definition of $\Xi^{f,(m)}$ in (11), where the entry $\Xi^{f,(m)}_{L,\,I\setminus\{i\}}$ is non-zero only if $L$ includes $I\setminus\{i\}$. This implies that for a non-zero $\Xi^{f,(m)}_{L,\,I\setminus\{i\}}$, $L$ must satisfy $L=(I\setminus\{i\})\cup\{y\}$ for some $y\in[d]\setminus(I\setminus\{i\})$;
•
In (101), we split the summation into two cases: $y=i$ and $y\neq i$;
•
In (102), we replaced $\Xi^{f,(m)}_{(I\setminus\{i\})\cup\{y\},\,I\setminus\{i\}}$ by $(-1)^{\sigma_{\mathbf{D}}(y)+\mathrm{ind}_{(I\setminus\{i\})\cup\{y\}}(y)}\psi_{f,y}$ from its definition in (11);
•
In (103) the two summations over i and y are swapped;
•
In (104), we used the definition of the $\mathrm{ind}_{\cdot}(\cdot)$ function to write
[TABLE]
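Here, the third equality holds since $i\in I$ and $y\in[d]\setminus I$, which implies $i\neq y$. The last equality holds because $\mathbb{1}[i<y]+\mathbb{1}[y<i]=1$. This leads to $\mathrm{ind}_{I}(i)+\mathrm{ind}_{(I\setminus\{i\})\cup\{y\}}(y)\equiv\mathrm{ind}_{I\cup\{y\}}(y)+\mathrm{ind}_{I\cup\{y\}}(i)+1 \pmod 2$.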
•
In (106) we used the definition of $\mathbf{D}$ in (8): since $i\notin(I\setminus\{i\})\cup\{y\}$, we have $\mathbf{D}_{i,(I\setminus\{i\})\cup\{y\}}=(-1)^{\sigma_{\mathbf{D}}(i)}w_{i,I\cup\{y\}}$. A similar argument is used in (108);
•
In (107) we used the parity equation (5). In particular, we have
[TABLE]
which implies
[TABLE]
•
In (109), we note that the overall sign of each term in the summation is positive.
This leads to (110), which is exactly the LHS of (13). This completes the proof of Proposition 3. □
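The parity identity used in step (104) can be checked exhaustively for small $d$. The sketch below assumes that $\mathrm{ind}_A(x)$ denotes the 1-based position of $x$ in the sorted set $A$; this is our reading of the definition, and a 0-based convention would shift both sides by 2 without changing the parity. In fact, the two sides always differ by exactly 2, since $\mathbb{1}[i<y]+\mathbb{1}[y<i]=1$. The helper names `ind` and `check_parity` are hypothetical.

```python
from itertools import combinations


def ind(A, x):
    """Assumed definition: 1-based position of x in the sorted set A."""
    return sorted(A).index(x) + 1


def check_parity(d):
    """Test the parity identity for every I in [d], every i in I, and every y in [d] minus I."""
    universe = set(range(1, d + 1))
    for size in range(1, d):
        for subset in combinations(sorted(universe), size):
            I = set(subset)
            for i in I:
                for y in universe - I:  # y ranges over [d] minus I, so y != i
                    lhs = ind(I, i) + ind((I - {i}) | {y}, y)
                    rhs = ind(I | {y}, y) + ind(I | {y}, i) + 1
                    if (lhs - rhs) % 2 != 0:
                        return False
    return True


assert check_parity(6)
```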
Remark 9.
Note that in the chain of equations above we aim to repair the coded symbol at position $I$ of the failed node, which is a linear combination of symbols in column $I$ of the message matrix. However, the linear combination in (105) misses some of the symbols of column $I$ (i.e., $\mathbf{D}_{i,I}$ when $i\notin I$) and includes symbols from other columns of the message matrix (i.e., $\mathbf{D}_{i,(I\setminus\{i\})\cup\{y\}}$ with $y\neq i$). These two interference terms, however, cancel each other perfectly due to the parity equation in (5). This is identical to the notion of interference neutralization, which is well studied in multi-hop wireless networks [55, 56, 57].
Appendix C Semi-Systematic Encoder Matrix
Consider an (n,k,d) regenerating code obtained using an encoder matrix Ψ that satisfies Conditions (E1) and (E2). Here, we show that we can modify the encoder matrix such that the resulting code becomes semi-systematic, that is, the first k nodes store pure symbols from the message matrix. Consider a general encoder matrix
[TABLE]
Recall that Condition (E1) ensures that any $k$ rows of $\Gamma_{n\times k}$ are linearly independent. Thus, $A_{k\times k}$ is a full-rank, invertible matrix. We can define
[TABLE]
It is easy to verify that X is a full-rank matrix, and its inverse is given by
[TABLE]
Therefore, we can modify the encoder matrix to
[TABLE]
It is easy to verify that $\tilde{\Psi}$ satisfies both Conditions (E1) and (E2). To this end, let $K$ be an arbitrary set of row indices with $|K|=k$. We have
$\tilde{\Gamma}[K,:]=-\Gamma[K,:]A_{k\times k}^{-1}$, which is a full-rank matrix, since both $\Gamma[K,:]$ and $A_{k\times k}^{-1}$ are full-rank. This shows that Condition (E1) holds for $\tilde{\Psi}$. Similarly, for an arbitrary set $H\subseteq[n]$ with $|H|=d$, we have
$\tilde{\Psi}[H,:]=\Psi[H,:]X$, which is again full-rank because both $\Psi[H,:]$ and $X$ are full-rank. Hence, Condition (E2) is also satisfied for $\tilde{\Psi}$. The code obtained using the encoder matrix $\tilde{\Psi}$ is semi-systematic, since the content of node $i$ is exactly the symbols in the $i$-th row of the super-message matrix, for $i\in[k]$.
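The matrix $X$ is not reproduced in this text, so the sketch below assumes one choice that is consistent with $\tilde{\Gamma}[K,:]=-\Gamma[K,:]A_{k\times k}^{-1}$: writing the first $k$ rows of $\Psi$ as $[A_{k\times k}\ \ C]$ (with $\Gamma$ taken to be the first $k$ columns of $\Psi$), let $X=\begin{bmatrix}-A^{-1} & -A^{-1}C\\ 0 & I_{d-k}\end{bmatrix}$. Under this assumption the first $k$ rows of $\tilde{\Psi}=\Psi X$ become $[-I_k\ \ 0]$, so nodes $1,\dots,k$ store message rows up to sign. The example uses exact rational arithmetic and a Vandermonde $\Psi$ over the rationals as an illustrative stand-in for a finite-field encoder, and checks (E1) and (E2) by brute force; all helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def matmul(X, Y):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def rank(M):
    """Rank over the rationals via Gaussian elimination."""
    M = [list(row) for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def inverse(A):
    """Inverse of a square rational matrix by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        piv = next(i for i in range(c, n) if M[i][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        p = M[c][c]
        M[c] = [v / p for v in M[c]]
        for i in range(n):
            if i != c and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    return [row[n:] for row in M]


def semi_systematic(Psi, k):
    """Hypothetical X = [[-A^-1, -A^-1 C], [0, I]] built from the first k rows [A | C] of Psi."""
    d = len(Psi[0])
    A = [row[:k] for row in Psi[:k]]
    C = [row[k:] for row in Psi[:k]]
    A_inv = inverse(A)
    A_inv_C = matmul(A_inv, C)
    X = [[-v for v in A_inv[i] + A_inv_C[i]] for i in range(k)]
    X += [[Fraction(0)] * k + [Fraction(int(i == j)) for j in range(d - k)] for i in range(d - k)]
    return matmul(Psi, X)


def check(n, k, d):
    """Vandermonde Psi (row x = [1, x, ..., x^(d-1)], x = 1..n): test top rows, (E1), (E2)."""
    Psi = [[Fraction(x) ** j for j in range(d)] for x in range(1, n + 1)]
    Psi_t = semi_systematic(Psi, k)
    top_ok = all(Psi_t[i] == [-int(i == j) for j in range(k)] + [0] * (d - k) for i in range(k))
    Gamma_t = [row[:k] for row in Psi_t]
    e1 = all(rank([Gamma_t[r] for r in K]) == k for K in combinations(range(n), k))
    e2 = all(rank([Psi_t[r] for r in H]) == d for H in combinations(range(n), d))
    return top_ok and e1 and e2


assert check(6, 2, 4)
```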
Appendix D Z-Transform for Evaluation of Code Parameters
D-A An Overview of Z-Transform
We will use the Z-transform to solve the recursive equation in (77) for the $t_m$'s, and to evaluate the code parameters in (71), (72), and (75). For the sake of completeness, we start with the definition and some of the main properties of this transformation. We refer the reader to [58, 59] for the details and proofs of the properties listed below.
Definition 9.
The two-sided Z-transform of a sequence $x_m$ (with a slight abuse of notation, we use $x_m$ to refer to the sequence $\{x_m\}_{m=-\infty}^{\infty}$ as well as to the $m$-th element of this sequence) is defined as
[TABLE]
where $z$ is a complex number. The region of convergence (ROC) of $X(z)$ is defined as the set of all points in the complex plane ($z\in\mathbb{C}$) for which $X(z)$ converges, that is,
[TABLE]
Definition 10.
The inverse Z-transform of $X(z)$ is defined as a sequence $\{x_m\}_{m=-\infty}^{\infty}$, where
[TABLE]
where $\mathcal{C}$ is a counterclockwise closed path encircling the origin and lying entirely in the region of convergence, $\mathrm{ROC}_x$.
For a given ROC, there is a one-to-one correspondence between a sequence $x_m$ and its Z-transform $X(z)$. Some properties of the Z-transform, as well as some pairs of sequences and their Z-transforms, are listed in Table V and Table VI, respectively. (Recall that, in this paper, we define $\binom{\ell}{m}$ to be zero for $m<0$ and $m>\ell$.)
We start from the definition of $p_m$ and use (77) to obtain a recursive equation. For any $m\neq 0$, we have
[TABLE]
where (114) is implied by (77), and in (115) we used the change of variable $\ell=j-\mu+m$.
Note that $p_m$ can also be written as
$p_m=-p_{m-0}\cdot(0-1)\binom{d-k+1}{0}$, which is of the form of the summands in (115). Hence, by including $\ell=0$ in the summation, we get
Next, define the sequence $q_\ell=(\ell-1)\binom{d-k+1}{\ell}$ for every integer $\ell$.
Then (119) can be rewritten as
[TABLE]
where (120) holds since $q_\ell=0$ for $\ell<0$, and $p_{m-\ell}=t_{\mu+(\ell-m)}$ is zero for $\ell>m$ (see the definition of $t_m$ in (80)). Here, the operator $\ast$ in (121) denotes the convolution of the sequences $p_m$ and $q_m$.
We can take the Z-transform of both sides of (121). Denoting the Z-transforms of $p_m$ and $q_m$ by $P(z)$ and $Q(z)$, respectively, and using Table V and Table VI, we can write
[TABLE]
The Z-transform of qm can be easily found using Table V and Table VI as follows.
[TABLE]
where (123) holds due to the linearity of the Z-transform, in (124) we used the differentiation property, and in (125) we used the fourth pair in Table VI with $a=1$ and $b=d-k+1$.
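Since $q_\ell$ is non-zero only for $0\le\ell\le d-k+1$, $Q(z)$ is a polynomial in $w=z^{-1}$, and the linearity and differentiation steps can be checked coefficient by coefficient. Carrying them out by hand (our own computation, not a transcription of (126)) with $b=d-k+1$ gives $Q(z)=b\,z^{-1}(1+z^{-1})^{b-1}-(1+z^{-1})^{b}=-(1+z^{-1})^{d-k}\bigl(1-(d-k)z^{-1}\bigr)$, whose factor $1-(d-k)z^{-1}$ is consistent with the ROC $|z|>|d-k|$ stated for $P(z)$. The sketch below compares both coefficient lists for a few hypothetical $(d,k)$ pairs; the function names are illustrative.

```python
from math import comb


def poly_mul(p, q):
    """Product of two polynomials in w = z^{-1}, given as coefficient lists (index = power)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def q_coeffs(d, k):
    """Coefficients of Q(z) = sum_ell q_ell z^{-ell}, q_ell = (ell - 1) * C(d-k+1, ell)."""
    b = d - k + 1
    return [(ell - 1) * comb(b, ell) for ell in range(b + 1)]


def closed_form_coeffs(d, k):
    """Coefficients of -(1 + z^{-1})^(d-k) * (1 - (d-k) z^{-1})."""
    base = [comb(d - k, t) for t in range(d - k + 1)]  # (1 + w)^(d-k)
    return [-c for c in poly_mul(base, [1, -(d - k)])]


for d, k in [(4, 3), (5, 3), (7, 2), (6, 6)]:
    assert q_coeffs(d, k) == closed_form_coeffs(d, k)
```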
Plugging (126) into (122), we get
[TABLE]
with the region of convergence $\mathrm{ROC}_p=\{z:|z|>|d-k|\}$. It remains to find $p_m$ from $P(z)$ by computing its inverse Z-transform. We have
[TABLE]
where in (128) we used the generalized accumulation rule in Table V for $a=d-k$. It is worth mentioning that the inverse Z-transform of $\left(\frac{1}{1+z^{-1}}\right)^{d-k}$ should be taken with respect to the variable $t$. To this end, in (129) we used the third pair in Table VI with $a=-1$ and $b=d-k$. Finally, in (130) we limited the range of $t$ by noting that the binomial coefficient is zero for $t<0$.
This shows the desired identity and completes the proof. □
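The two transform facts used in (128) and (129) can be sanity-checked on truncated power series in $w=z^{-1}$. Since Tables V and VI are not reproduced here, we assume that the generalized accumulation rule maps $y_m=\sum_{t\le m}a^{m-t}x_t$ to $Y(z)=X(z)/(1-az^{-1})$, which is equivalent to the recursion $y_m-a\,y_{m-1}=x_m$, and that for $a=-1$ the third pair specializes to $(-1)^t\binom{b+t-1}{t}\leftrightarrow(1+z^{-1})^{-b}$ for causal sequences. Both are standard identities; the function names and sample values are hypothetical.

```python
from math import comb


def accumulate(x, a):
    """y_m = sum_{t <= m} a^(m - t) x_t for a causal sequence x (x_t = 0 for t < 0)."""
    return [sum(a ** (m - t) * x[t] for t in range(m + 1)) for m in range(len(x))]


def inverse_binomial_series(b, n_terms):
    """First n_terms coefficients of (1 + w)^(-b) in w = z^{-1}: (-1)^t * C(b + t - 1, t)."""
    if b == 0:
        return [1] + [0] * (n_terms - 1)
    return [(-1) ** t * comb(b + t - 1, t) for t in range(n_terms)]


def times_binomial(series, b):
    """Multiply a truncated series by (1 + w)^b, keeping len(series) coefficients."""
    coeffs = [comb(b, j) for j in range(b + 1)]
    return [sum(coeffs[j] * series[t - j] for j in range(min(t, b) + 1))
            for t in range(len(series))]


# Accumulation rule: Y(z) (1 - a z^{-1}) = X(z), i.e. y_m - a * y_{m-1} = x_m.
x, a = [3, -1, 4, 1, -5], 2
y = accumulate(x, a)
assert all(y[m] - (a * y[m - 1] if m > 0 else 0) == x[m] for m in range(len(x)))

# Binomial pair: (1 + w)^b * (1 + w)^(-b) = 1 up to the truncation order.
for b in range(5):
    assert times_binomial(inverse_binomial_series(b, 8), b) == [1] + [0] * 7
```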
for every integer μ.
The claim of this lemma is equivalent to $u_\mu=v_\mu$ for all $\mu\in\mathbb{Z}$. Instead of proving this directly in the $\mu$-domain, we show that the Z-transforms of the two sequences are identical and have the same ROC. We start with the sequence $\{u_\mu\}$ and write
[TABLE]
where in (131) and (132) we used the convolution and time-shift properties from Table V, respectively. Moreover, in (133) we used (127) and Table VI to evaluate the Z-transforms. Note that $\mathrm{ROC}_u=\mathrm{ROC}_p=\{z:|z|>|d-k|\}$.
Similarly, for the sequence $\{v_\mu\}$ we have
[TABLE]
where in (135) and (137) we used the time-shift and generalized accumulation properties from Table V, respectively.
Moreover, (136) holds because $\binom{k+a}{m}$ is zero for $m<0$, and we used Table VI to evaluate the Z-transform in (138). Note that the ROC of $V(z)$ is $\mathrm{ROC}_v=\{z:|z|>|d-k|\}$, due to the step in (137). Comparing (133) and (138), we find that $U(z)=V(z)$. Since the two functions have identical ROCs, their corresponding sequences $\{u_\mu\}$ and $\{v_\mu\}$ must also be identical. This completes the proof of the lemma. □
Bibliography
[1] M. Elyasi and S. Mohajer, “A cascade code construction for (n, k, d) distributed storage systems,” in 2018 IEEE International Symposium on Information Theory (ISIT). IEEE, 2018, pp. 1241–1245.
[2] S. Ghemawat, H. Gobioff, and S.-T. Leung, The Google file system. ACM, 2003, vol. 37, no. 5.
[3] M. Sathiamoorthy, M. Asteris, D. Papailiopoulos, A. G. Dimakis, R. Vadali, S. Chen, and D. Borthakur, “XORing elephants: Novel erasure codes for big data,” in Proceedings of the VLDB Endowment, vol. 6, no. 5. VLDB Endowment, 2013, pp. 325–336.
[4] C. Huang, H. Simitci, Y. Xu, A. Ogus, B. Calder, P. Gopalan, J. Li, S. Yekhanin et al., “Erasure coding in Windows Azure storage,” in USENIX Annual Technical Conference. Boston, MA, 2012, pp. 15–26.
[5] F. Dabek, J. Li, E. Sit, J. Robertson, M. F. Kaashoek, and R. Morris, “Designing a DHT for low latency and high throughput,” in NSDI, vol. 4, 2004, pp. 85–98.
[6] S. Rhea, C. Wells, P. Eaton, D. Geels, B. Zhao, H. Weatherspoon, and J. Kubiatowicz, “Maintenance-free global data storage,” IEEE Internet Computing, no. 5, pp. 40–49, 2001.
[7] R. Bhagwan, K. Tati, Y. Cheng, S. Savage, and G. M. Voelker, “Total recall: System support for automated availability management,” in NSDI, vol. 4, 2004, pp. 25–25.
[8] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” IEEE Trans. Inf. Theory, vol. 56, no. 9, pp. 4539–4551, 2010.