Combinatorial Alphabet-Dependent Bounds for Locally Recoverable Codes
Abhishek Agarwal, Alexander Barg, Sihuang Hu, Arya Mazumdar, Itzhak Tamo

TL;DR
This paper introduces new combinatorial and linear programming bounds for locally recoverable codes, improving the understanding of their rate-distance trade-offs especially over small alphabets.
Contribution
It presents novel combinatorial bounds and an LP-based approach that yield tighter estimates on the rate of LRC codes with specified distance.
Findings
New combinatorial bounds including sphere packing and Plotkin bounds.
An LP bound that outperforms existing bounds in examples.
The tightest known upper bound on the rate of linear LRC codes with given distance.
Abstract
Locally recoverable (LRC) codes have recently been a focus point of research in coding theory due to their theoretical appeal and applications in distributed storage systems. In an LRC code, any erased symbol of a codeword can be recovered by accessing only a small number of other symbols. For LRC codes over a small alphabet (such as binary), the optimal rate-distance trade-off is unknown. We present several new combinatorial bounds on LRC codes including the locality-aware sphere packing and Plotkin bounds. We also develop an approach to linear programming (LP) bounds on LRC codes. The resulting LP bound gives better estimates in examples than the other upper bounds known in the literature. Further, we provide the tightest known upper bound on the rate of linear LRC codes with a given relative distance, an improvement over the previous best known bounds.
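Upper bounds on the dimension $k$ for locality $r=2,\dots,10$: the shortening bound (4) (SH) versus the LP bound of Theorem 7 (LP).

| Locality $r$ | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|
| SH (shortening bound) | 3 | 4 | 6 | 8 | 10 | 11 | 13 | 15 | 17 |
| LP (LP bound) | 2 | 4 | 5 | 7 | 9 | 11 | 12 | 14 | 16 |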
Abhishek Agarwal¹
¹ College of Information and Computer Sciences, University of Massachusetts–Amherst, Amherst, MA 01003. Emails: {abhiag,arya}@cs.umass.edu. Research supported by NSF grants CCF 1642658, CCF 1318093 and CCF 1618512.
Alexander Barg²
² Dept. of ECE and ISR, University of Maryland, College Park, MD 20742 and IITP, Russian Academy of Sciences, 127051 Moscow, Russia. Research supported in part by NSF grants CCF 1422955 and CCF 1618603.
Sihuang Hu³
³ Lehrstuhl D für Mathematik, RWTH Aachen, Germany. This work was done while this author was a postdoc at the Department of Electrical Engineering - Systems, Tel Aviv University, Israel. Research supported by ERC grant no. 639573, ISF grant no. 1367/14, and the Alexander von Humboldt Foundation.
Arya Mazumdar¹
Itzhak Tamo⁴
⁴ Department of Electrical Engineering - Systems, Tel Aviv University, Israel. Research supported by ISF grant no. 1030/15 and NSF-BSF grant no. 2015814.
I Introduction
† The authors' names appear in alphabetical order.
† This paper was presented in part at the 2016 IEEE International Symposium on Information Theory, Barcelona, Spain, July 2016, and at the 54th Annual Allerton Conference on Communication, Control and Computing, Monticello, IL, September 2016.
We consider codes over a finite alphabet that have the usual property of error correction and the additional property of being able to recover one or more erased symbols of the codeword by accessing only a small number of other symbols. Codes of this kind are said to be locally recoverable (LRC), and they have applications in large-scale distributed storage systems. LRC codes were first defined in [1] and were studied in a number of subsequent papers in recent years.
A $q$-ary code $C$ of length $n$, cardinality $M$ and distance $d$ is a set of $M$ vectors over an alphabet $\mathcal Q$ of size $q$ with minimum pairwise Hamming distance $d$. The quantity $k=\log_q M$ is called the dimension of $C$. If $\mathcal Q$ is a finite field and $C$ is a linear subspace of $\mathcal Q^n$, then $k$ is the dimension of $C$ as a vector space. Below, $[n]:=\{1,\dots,n\}$, and for any $x\in\mathcal Q^n$, $x_i$ is the projection of $x$ on the $i$th coordinate. By extension, for any $I\subseteq[n]$, $x_I$ is the projection of $x$ onto the coordinates in $I$.
Definition 1
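A code $C\subseteq\mathcal Q^n$ is locally recoverable with locality $r$ if every coordinate $i\in[n]$ is contained in a subset $R_i\subseteq[n]$ of size $r+1$ such that there is a function $\phi_i:\mathcal Q^r\to\mathcal Q$ with the property that for every codeword $x\in C$
$$x_i=\phi_i(x_{j_1},\dots,x_{j_r}),\tag{1}$$
where $j_1<\dots<j_r$ are the elements of $R_i\setminus\{i\}$. We use the notation $(n,k,r)$ to refer to a code of length $n$, dimension $k$ and locality $r$.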
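As an illustration of Definition 1 only (not a construction from this paper), the sketch below builds a hypothetical binary code of length 6 whose coordinates split into two disjoint repair groups, {0, 1, 2} and {3, 4, 5} in 0-based indexing, each protected by a single even-parity check, so $r=2$. Under this assumption the recovery function $\phi_i$ is simply the XOR of the other $r$ symbols of the group, and the check confirms identity (1) for every codeword and every coordinate.

```python
from itertools import product

# Hypothetical example: two disjoint repair groups, one parity check each (r = 2).
GROUPS = [(0, 1, 2), (3, 4, 5)]
N = 6


def code_words():
    """All x in {0,1}^6 with even parity on every repair group."""
    return [x for x in product((0, 1), repeat=N)
            if all(sum(x[j] for j in g) % 2 == 0 for g in GROUPS)]


def repair_group(i):
    """R_i: the repair group (of size r + 1) that contains coordinate i."""
    return next(g for g in GROUPS if i in g)


def recover(x, i):
    """phi_i: recompute x_i from the r other symbols of its repair group."""
    return sum(x[j] for j in repair_group(i) if j != i) % 2


def is_locally_recoverable(code):
    """Check x_i == phi_i(x_{j_1}, ..., x_{j_r}) for all codewords and coordinates."""
    return all(recover(x, i) == x[i] for x in code for i in range(N))


C = code_words()
print(len(C), is_locally_recoverable(C))  # 16 True, so k = log_2 16 = 4
```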
The definition of LRC codes was extended in several different ways. The following generalization is important for our purposes.
Definition 2
A code $C\subseteq\mathcal Q^n$ of cardinality $q^k$ is said to have the $(r,\delta)$ locality property (to be an $(n,k,r,\delta)$ LRC code), where $\delta\ge2$, if each coordinate $i\in[n]$ is contained in a subset $J_i\subseteq[n]$ of size at most $r+\delta-1$ such that the restriction $C_{J_i}$ of the code $C$ to the coordinates in $J_i$ forms a code of distance at least $\delta$. Notice that the values of any $\delta-1$ coordinates of $J_i$ are determined by the values of the remaining $|J_i|-(\delta-1)\le r$ coordinates, thus enabling local recovery.
This definition was first proposed in [2, 3] with the less demanding restriction of protecting only the information symbols of the codeword (see also [4] for a related but different notion). In the above definition we consider all-symbol locality, without differentiating between the information and parity symbols. The set $J_i$ is called the repair group, and the set $J_i\setminus\{i\}$ is called the recovery set for the coordinate $i$.
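Definition 2 can likewise be checked by brute force on small examples. The sketch below, with helper names of our own, takes candidate repair groups covering $[n]$ and verifies that each has size at most $r+\delta-1$ and that the restricted code $C_J$ has distance at least $\delta$. The example code, two concatenated binary repetition blocks of length 3, is a hypothetical choice: it has the $(1,3)$ locality property but not the $(1,4)$ one.

```python
from itertools import combinations


def hamming(u, v):
    """Hamming distance between two equal-length tuples."""
    return sum(a != b for a, b in zip(u, v))


def min_distance(words):
    """Minimum pairwise distance of a set of words (infinity if fewer than two)."""
    words = list(set(words))
    return min((hamming(u, v) for u, v in combinations(words, 2)), default=float("inf"))


def restrict(code, J):
    """The restriction C_J = {x_J : x in C}."""
    return {tuple(x[j] for j in J) for x in code}


def has_locality(code, groups, r, delta):
    """Definition 2 for candidate repair groups covering [n]: each group has size
    at most r + delta - 1 and C restricted to it has distance at least delta."""
    n = len(next(iter(code)))
    for J in groups:
        if len(J) > r + delta - 1 or min_distance(restrict(code, J)) < delta:
            return False
    return set().union(*groups) == set(range(n))


# Hypothetical example: two blocks, each a binary repetition code of length 3.
rep = [(0, 0, 0), (1, 1, 1)]
C = {a + b for a in rep for b in rep}
GROUPS = [(0, 1, 2), (3, 4, 5)]
print(has_locality(C, GROUPS, r=1, delta=3), has_locality(C, GROUPS, r=1, delta=4))
```

The second call fails because each local code has distance 3, not 4.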
Other extensions of the concept of LRC codes include codes with multiple disjoint repair groups for every coordinate, also called codes with the availability property [5], codes with sequential repair of several erasures [6], codes with cooperative repair [7], local repair on graphs [8], as well as other variations.
Problems of constructing LRC codes and bounding their parameters have been the subject of a considerable number of publications. Constructions of LRC codes obtained by combining some known code families without the locality property were suggested in [9, 10, 11]. A family of codes extending the construction of Reed-Solomon codes to codes with locality was proposed in [12] and further generalized to codes on algebraic curves in [13]. We refer to [14] for a survey of some aspects of the algebraic theory of LRC codes.
Research on bounds for LRC codes was initiated in [1], which showed that the distance of an $(n,k,r)$ LRC code is bounded as follows:
$$d\le n-k-\Big\lceil\frac{k}{r}\Big\rceil+2.\tag{2}$$
In [15] this bound was extended to the case of arbitrary $\delta$. Namely, the distance of an $(n,k,r,\delta)$ LRC code satisfies the inequality
$$d\le n-k+1-\Big(\Big\lceil\frac{k}{r}\Big\rceil-1\Big)(\delta-1).\tag{3}$$
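Both bounds are easy to evaluate. The helper below (a hypothetical name, for illustration) returns the largest dimension $k$ permitted by (3) for given $n,d,r,\delta$; with $\delta=2$ it reduces to (2). The right-hand side of (3) is non-increasing in $k$, so a linear scan suffices.

```python
def max_k_singleton_lrc(n, d, r, delta=2):
    """Largest k with d <= n - k + 1 - (ceil(k/r) - 1)(delta - 1), i.e. bound (3);
    for delta = 2 this is bound (2): d <= n - k - ceil(k/r) + 2. Returns 0 if none."""
    best = 0
    for k in range(1, n + 1):
        ceil_k_over_r = -(-k // r)  # exact integer ceiling of k / r
        if d <= n - k + 1 - (ceil_k_over_r - 1) * (delta - 1):
            best = k
    return best


print(max_k_singleton_lrc(6, 3, 2), max_k_singleton_lrc(8, 4, 2, delta=3))  # 3 3
```

With a large locality, e.g. `max_k_singleton_lrc(6, 3, 10)`, the classical Singleton value $n-d+1=4$ is recovered.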
Bounds for codes with availability were established in [5, 16, 17].
Note that the bounds (2), (3) do not depend on the size of the code alphabet $q$. A bound that accounts for the value of $q$ was derived in [18]. It has the following form: for any $q$-ary LRC code with the parameters $(n,k,r)$ and distance $d$,
$$k\le\min_{t\in\mathbb Z_+}\Big[tr+k^{(q)}_{\rm opt}\big(n-t(r+1),d\big)\Big],\tag{4}$$
where $k^{(q)}_{\rm opt}(n,d)$ is the largest possible dimension (i.e., $\log_q$ of the maximum cardinality) of a $q$-ary code of length $n$ and distance $d$. This bound can be used to derive asymptotic upper bounds on the rate of LRC codes with a given value of the distance (more on this below). Asymptotic lower bounds (achievability results) on the rate of LRC codes, namely Gilbert-Varshamov (GV) type asymptotic bounds, were also derived independently in [17, 18]; in particular, the former work derives a bound for the case of availability as well.
In this paper we focus on combinatorial upper bounds on the parameters of LRC codes, tightening prior results, and emphasizing the dependence between the parameters and the size of the code alphabet . We explore several general approaches to the derivation of the upper bounds, including recursive bounds, the linear programming approach, and the approach relying on the coset leader graph of the code.
Linear programming (LP) is a powerful technique that accounts for some of the best known upper bounds on the size of codes with a given distance. It was pioneered in [19] and used in [20] to derive the best currently known asymptotic upper bound on error-correcting codes. These results rely on the approach to codes via association schemes and their eigenvalues, combined with some analytic techniques. Incorporating the locality constraints into the LP problem in a way that yields closed-form bounds is a nontrivial problem. We suggest a way to address it under the additional assumption that the repair groups are disjoint, i.e., that the set of coordinates is a disjoint union of the repair groups. With this assumption, an association scheme that fits the locality constraints forms a Delsarte extension of the usual Hamming scheme. Relying on this observation, we derive an LP bound on LRC codes in a polynomial form and construct a polynomial that gives rise to a Singleton-like bound on such codes. We also compute numerical examples for $\delta=2$, which corresponds to the original definition of LRC codes, and show that the LP bound is sometimes better than the only other known alphabet-dependent bound (4).
We note that LP bounds on linear LRC codes were earlier studied in [21], which considered a standard LP problem [19] with the additional constraint that every coordinate is contained in a codeword of the dual code of weight at most $r+1$. However, [21] gave no closed-form solutions of the LP problem or any numerical examples. LP bounds for LRC codes with multiple repair groups were considered in [22], and LP bounds for cyclic LRC codes were considered in [23].
Finally, we study asymptotic upper bounds on linear LRC codes that satisfy Definition 1. The starting point of our study is an observation that a linear LRC code necessarily contains several low-weight parity checks. Another class of codes that has the same property is low-density parity check (LDPC) codes. A recent work [24] derived new improved asymptotic bounds on the rate of LDPC codes by analyzing the coset graph of the code. While LDPC codes by definition contain only low-weight parity checks, LRC codes combine such checks with a large number of unrestricted parity check equations. Nevertheless, it is possible to combine the approach of [24] with the recursive bound (4) to obtain an asymptotic bound on linear LRC codes that is better than the asymptotic bound obtained from (4). An even better bound can be obtained for linear LRC codes with disjoint repair groups.
The paper is organized as follows. In Section II we derive a general upper bound on the size of LRC codes that reduces the problem to bounds on codes with a given distance but without locality constraints. This result is conceptually similar to the bound (4) (from [18]) but relies on a different kind of recursion, and this reduction enables us to use known bounds on codes to derive new results for LRC codes. In conjunction with the asymptotic Gilbert-Varshamov (GV) bound on LRC codes of [13], this yields the exact value of the asymptotic code rate in the limit of vanishing relative distance. This result, proved earlier for $\delta=2$ in [18, 17], is extended here to any $\delta$. In Section III, we derive Delsarte's linear programming (LP) bounds for LRC codes with disjoint repair groups. The results include a Singleton bound for LRC codes. For the special case of usual LRC codes (Definition 1) with disjoint repair groups, our results improve the shortening bound (4). In Section IV, we consider linear LRC codes and, by using the theory of coset-leader graphs combined with the approach of [18], obtain better asymptotic bounds on the trade-off between the rate and the relative distance of LRC codes. The bound becomes stronger if we consider disjoint repair groups.
This paper is a result of merging and developing the papers by S. Hu, I. Tamo, and A. Barg [25] and by A. Agarwal and A. Mazumdar [26], both devoted to the problem of deriving alphabet-dependent bounds on LRC codes.
II New Bounds on LRC Codes
In the next theorem we introduce a method of using upper bounds on codes with a given distance (without the locality property) to derive upper bounds on LRC codes. Consider an upper bound on the cardinality of a $q$-ary code of length $n$ and distance $d$ which is a log-convex function of $n$ (a positive function $f$ of an integer argument is called log-convex if $f(n)^2\le f(n-1)f(n+1)$ whenever $n-1,n,n+1$ are in the support of $f$), and such that
Theorem 1
Let $C$ be a $q$-ary $(n,k,r,\delta)$ LRC code with distance $d$, and let
[TABLE]
where . Then, for any , we have
[TABLE]
Proof:
We begin with constructing a sequence of nonempty disjoint subsets whose union is of size at least Starting with assume that the sets are already constructed. If terminate the procedure. Otherwise let be an arbitrary element in (w.l.o.g. we can assume that ) and define
[TABLE]
Suppose that this procedure terminates after steps, and let be the sequence of subsets constructed above. For let and denote by the restriction of to the coordinates in . Note that by the construction, we have
[TABLE]
and
[TABLE]
Let us prove by induction on that
[TABLE]
For , and by definition, the code has distance at least Therefore, . Now assume that (8) holds for Let be an arbitrary codeword of , and let be the set of codewords in whose restriction to equals . These codewords can be different only in the coordinates in , and therefore the restriction of to the coordinates in forms a code of distance at least This implies that , and so This completes the induction step.
Since the code has distance and , it follows that
[TABLE]
Suppose that are such that then using log-convexity, we obtain
[TABLE]
This step can be repeated times till either the larger subset is of the maximum possible size or the smaller one becomes empty (in which case we put ). Use this argument in (9) and successively reduce the number of factors on the right-hand side as many times as possible. On account of (7) we will obtain at most factors, and in each of them the size of the coordinate subset will be or less. We conclude that
[TABLE]
Now (6) follows by combining (9) and (10) and taking logarithms on both sides of the resulting equation. ∎
Theorem 1 provides a general upper bound on the size of LRC codes. Explicit results are obtained once we substitute a specific log-convex upper bound on the size of codes. Fortunately, many known bounds on codes are in fact log-convex. For instance, let us prove that this is the case for the Hamming (sphere-packing) bound.
Lemma 2
The function $f(n)=q^n\Big/\sum_{i=0}^{t}\binom{n}{i}(q-1)^i$, where $t=\lfloor (d-1)/2\rfloor$, is log-convex in $n$.
Proof:
Let and let . In all the expressions below does not change, so to simplify the notation we write instead of . For any we have
[TABLE]
We find
[TABLE]
It is straightforward to check that with each term inside the brackets on the last line is nonnegative, and so
[TABLE]
∎
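Lemma 2 can be sanity-checked numerically (this is not a proof): the sketch below evaluates the sphere-packing bound $q^n/\sum_{i=0}^{t}\binom{n}{i}(q-1)^i$, $t=\lfloor (d-1)/2\rfloor$, in exact rational arithmetic and tests $f(n)^2\le f(n-1)f(n+1)$ over a range of lengths. The parameter pairs are arbitrary small examples, and the helper names are ours.

```python
from fractions import Fraction
from math import comb


def sphere_packing_bound(n, d, q=2):
    """Hamming bound q^n / V_q(n, t) with t = floor((d-1)/2), as an exact fraction."""
    t = (d - 1) // 2
    volume = sum(comb(n, i) * (q - 1) ** i for i in range(t + 1))
    return Fraction(q ** n, volume)


def is_log_convex(f, ns):
    """Check f(n)^2 <= f(n-1) * f(n+1) at every interior point of the
    consecutive integer range ns."""
    return all(f(n) ** 2 <= f(n - 1) * f(n + 1) for n in ns[1:-1])


for q, d in [(2, 3), (2, 5), (3, 3), (3, 5)]:
    print(q, d, is_log_convex(lambda n: sphere_packing_bound(n, d, q), list(range(1, 60))))
```

Since the numerator $q^n$ is log-linear, log-convexity of $f$ is equivalent to log-concavity of the ball volume, which is what the check effectively tests.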
Similar (but simpler) checks can be performed to verify the log-convexity of the Plotkin and Singleton bounds, and we obtain the following corollary.
Corollary 3
Let $C$ be a $q$-ary $(n,k,r,\delta)$ LRC code with distance $d$, with the notation of (5). The following bounds hold true:
1. Locality-dependent Hamming bound:
[TABLE]
2. Locality-dependent Plotkin bound: Let then
[TABLE]
3. Locality-dependent Singleton bound:
[TABLE]
The bound (13) is slightly weaker than the Singleton-type bounds in (2) and (3).
Remark 1
Not all bounds on codes are log-convex in the code length. For example, let be the maximum size of a binary code of length and distance We have and so violating the log-convexity condition (which stipulates that the geometric average be greater than the “middle value”).
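Remark 1 can be made concrete with a standard example. The sketch below uses the well-known values $A_2(n,3)$ for $n=3,\dots,12$ (taken from standard tables of optimal binary codes, not computed here and not taken from this paper) and lists the lengths $n$ at which $A_2(n,3)^2>A_2(n-1,3)\,A_2(n+1,3)$.

```python
# A_2(n, 3) for n = 3, ..., 12: well-known values from standard code tables.
A2_D3 = {3: 2, 4: 2, 5: 4, 6: 8, 7: 16, 8: 20, 9: 40, 10: 72, 11: 144, 12: 256}


def log_convexity_violations(f):
    """Return every interior n with f(n)^2 > f(n-1) * f(n+1) (keys must be consecutive)."""
    ns = sorted(f)
    return [n for n in ns[1:-1] if f[n] ** 2 > f[n - 1] * f[n + 1]]


print(log_convexity_violations(A2_D3))  # [7, 9, 11], e.g. 16^2 > 8 * 20
```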
Let
[TABLE]
where is the maximum cardinality of the LRC code with minimum distance . We finish this section by showing that the bound (13) can be combined with a known result to derive an exact value of for The following lower asymptotic bound for LRC codes was obtained in [13].
Theorem 4** ([13])**
Assume that there exists a $q$-ary MDS code of length $r+\delta-1$ and distance $\delta$. Then the following Gilbert-Varshamov type bound holds true:
[TABLE]
where
[TABLE]
This result implies the following corollary, which for $\delta=2$ was already established in [17].
Corollary 5
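Assume that there exists a $q$-ary MDS code of length $r+\delta-1$ and distance $\delta$. Then the asymptotic rate of $q$-ary LRC codes with the $(r,\delta)$ locality property tends to $\frac{r}{r+\delta-1}$ as the relative distance tends to $0$.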
Proof:
The bound (13) implies that this limit is at most $\frac{r}{r+\delta-1}$, while (14) gives the opposite inequality. ∎
III Algebraic Combinatorics of LRC Codes and LP Bounds
Delsarte’s linear programming bound is a powerful method of estimating the size of optimal codes in various metric spaces that satisfy a set of general assumptions [19]. In this section we develop an adaptation of the approach in [19] to LRC codes.
III-A Association Schemes and their Powers
III-A1 Metric Association Schemes
We begin with a brief reminder about metric association schemes [19, 27]. Let $X$ be a finite metric space with distance function $\rho$, and let $\mathcal R=\{R_0,R_1,\dots,R_n\}$ be a partition of $X\times X$ such that $R_i=\{(x,y)\in X\times X:\rho(x,y)=i\}$ for all $i$. The pair $\mathcal X=(X,\mathcal R)$ is called an association scheme if the intersection volume of two balls in $X$ depends only on the distance between their centers and the radii of the balls. For each $i$ denote by $A_i$ the adjacency matrix of $R_i$, where $A_i(x,y)=1$ if $(x,y)\in R_i$ and $0$ otherwise. The matrices $A_0,\dots,A_n$ span a complex semisimple algebra of dimension $n+1$, called the Bose-Mesner algebra of the scheme. Since each $A_i$ is symmetric, this algebra is commutative. It affords a dual basis of minimal idempotents $E_0,\dots,E_n$. We can represent each matrix $A_j$ as a linear combination of the idempotents. The coefficients of this expansion form the first eigenmatrix of the scheme $\mathcal X$, denoted by $P$. A similar transition can be performed in the other direction, and the corresponding coefficients form the second eigenmatrix of $\mathcal X$, denoted by $Q$. Namely, we have
$$A_j=\sum_{i=0}^{n}P_{ij}E_i,\qquad E_j=\frac{1}{|X|}\sum_{i=0}^{n}Q_{ij}A_i,\qquad j=0,\dots,n.$$
III-A2 Products of Association Schemes
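Let $\mathcal X=(X,\mathcal R)$ be a metric association scheme with eigenmatrices $P$ and $Q$, and let $X^m$ be a Cartesian power of $X$. We can define a product association scheme $\mathcal X^m$ by introducing the relations on $X^m$ in the following obvious way [19, p.17]:
$$R_{(i_1,\dots,i_m)}=\big\{(x,y)\in X^m\times X^m:\ (x_j,y_j)\in R_{i_j},\ j=1,\dots,m\big\}.$$
The adjacency matrices of $\mathcal X^m$ are formed of the Kronecker products $A_{i_1}\otimes\cdots\otimes A_{i_m}$, where $A_{i_j}$ is an adjacency matrix of the $j$th copy of $\mathcal X$ in the product. It is not hard to check that the first (second) eigenmatrix of the scheme $\mathcal X^m$ equals the $m$th Kronecker power of $P$ (resp., of $Q$).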
III-A3 The Linear Programming Bound
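Let $\mathcal X=(X,\mathcal R)$ be an association scheme with $n$ classes and let $C\subseteq X$ be a code (any subset of $X$). The distance distribution of $C$ is given by $(A_0,A_1,\dots,A_n)$, where $A_i=\frac{1}{|C|}\big|\{(x,y)\in C\times C:(x,y)\in R_i\}\big|$ is the average number of codewords at distance $i$ from a given codeword of $C$. Clearly, $A_0=1$ and $\sum_iA_i=|C|$. The vector $(B_0,\dots,B_n)=(A_0,\dots,A_n)Q$, called the MacWilliams transform of the distance distribution of $C$, satisfies the Delsarte inequalities $B_j\ge0$, $j=0,\dots,n$. This gives rise to Delsarte's linear programming bound on codes: let $C$ be a code with distance $d$; then
$$|C|\le\max\Big\{\sum_{i=0}^{n}A_i:\ A_0=1,\ A_1=\dots=A_{d-1}=0,\ A_i\ge0,\ \sum_{i=0}^{n}A_iQ_{ij}\ge0,\ j=0,\dots,n\Big\}$$
(e.g., [19, Ch.2,3], [28, Ch.17]). This bound also applies to product schemes. Indeed, let $\mathcal X^m$ be as above and let $C\subseteq X^m$ be a code. Let $(A_{\mathbf i})$, where $\mathbf i=(i_1,\dots,i_m)$ and $0\le i_j\le n$ for all $j$, be the distance distribution of $C$. Similarly, $A_{\mathbf 0}=1$ and $\sum_{\mathbf i}A_{\mathbf i}=|C|$. Suppose that $A_{\mathbf i}=0$ if $\mathbf i\in\mathcal D$, where $\mathcal D$ is some subset of $\{0,\dots,n\}^m\setminus\{\mathbf 0\}$. Then we have
$$|C|\le\max\Big\{\sum_{\mathbf i}A_{\mathbf i}:\ A_{\mathbf 0}=1,\ A_{\mathbf i}=0\ (\mathbf i\in\mathcal D),\ A_{\mathbf i}\ge0,\ \sum_{\mathbf i}A_{\mathbf i}Q^{\otimes m}_{\mathbf i\mathbf j}\ge0\ \text{for all }\mathbf j\Big\},\tag{15}$$
where $Q^{\otimes m}$ is the second eigenmatrix of $\mathcal X^m$. More details about product schemes are given in [19, Sec. 2.5] as well as in more recent works [29, 30].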
III-A4 The Hamming Scheme
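The following classic example will be useful below in the context of LRC codes. Let $\mathcal Q$ be a set of cardinality $q$ and let $X=\mathcal Q^n$ be a Cartesian power of $\mathcal Q$. We specialize the definition of the metric scheme by assuming that the distance on $X$ is the Hamming metric. Namely, let $R_i=\{(x,y)\in X\times X:d_H(x,y)=i\}$, $i=0,1,\dots,n$, where $d_H$ is the Hamming distance. We obtain a symmetric association scheme with $n$ classes, denoted by $H(n,q)$. The eigenvalues of $H(n,q)$ are given by $P_{ij}=K_j(i)$ [19], where
$$K_j(x)=\sum_{l=0}^{j}(-1)^l(q-1)^{j-l}\binom{x}{l}\binom{n-x}{j-l}$$
is the Krawtchouk polynomial. Also we have $Q=P$, i.e., the Hamming scheme is self-dual.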
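A small exact-arithmetic check of two facts used below, assuming the convention $P_{ij}=K_j(i)$: self-duality together with $PQ=|X|I$ gives $P^2=q^nI$, and the Krawtchouk polynomials are orthogonal with weights $\binom{n}{x}(q-1)^x$. The helper names are ours.

```python
from math import comb


def krawtchouk(n, q, k, x):
    """K_k(x) = sum_l (-1)^l (q-1)^(k-l) C(x, l) C(n-x, k-l)."""
    return sum((-1) ** l * (q - 1) ** (k - l) * comb(x, l) * comb(n - x, k - l)
               for l in range(k + 1))


def check_hamming_scheme(n, q):
    """Return (P^2 == q^n I, weighted orthogonality holds) for P[i][j] = K_j(i)."""
    P = [[krawtchouk(n, q, j, i) for j in range(n + 1)] for i in range(n + 1)]
    idx = range(n + 1)
    # Self-duality Q = P together with PQ = |X| I gives P^2 = q^n I.
    square_ok = all(sum(P[i][j] * P[j][l] for j in idx) == (q ** n if i == l else 0)
                    for i in idx for l in idx)
    # sum_x C(n,x)(q-1)^x K_k(x) K_l(x) = q^n C(n,k)(q-1)^k [k == l]
    w = [comb(n, x) * (q - 1) ** x for x in idx]
    orth_ok = all(sum(w[x] * P[x][k] * P[x][l] for x in idx) == (q ** n * w[k] if k == l else 0)
                  for k in idx for l in idx)
    return square_ok, orth_ok


print(check_hamming_scheme(5, 3), check_hamming_scheme(6, 2))  # (True, True) (True, True)
```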
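The Hamming scheme also carries the structure of a product scheme. For $n=n_1+n_2$, consider the Hamming scheme $H(n,q)$ as being obtained from the product of $H(n_1,q)$ and $H(n_2,q)$ by merging all relations $R_{(i_1,i_2)}$ with $i_1+i_2=i$ into one relation $R_i$; this is possible because $d_H(x,y)=d_H(x',y')+d_H(x'',y'')$ for any pair $x=(x',x'')$, $y=(y',y'')$ with $x',y'\in\mathcal Q^{n_1}$ and $x'',y''\in\mathcal Q^{n_2}$. Also we can view all three association schemes involved as merged versions of powers of $H(1,q)$.
The same reasoning applies to the product $H(n_1,q)\times\cdots\times H(n_m,q)$ for any $m$ and $n=n_1+\dots+n_m$. We conclude that the eigenvalues of this product scheme have the form
$$P_{\mathbf i\mathbf k}=Q_{\mathbf i\mathbf k}=\prod_{j=1}^{m}K^{(n_j)}_{k_j}(i_j),\tag{16}$$
where $K^{(n_j)}_{k}$ denotes the Krawtchouk polynomial for length $n_j$, and the multi-indices $\mathbf i=(i_1,\dots,i_m)$ and $\mathbf k=(k_1,\dots,k_m)$ are the indices of the relations of the scheme. As is the case with the original Hamming scheme, the obtained scheme is also self-dual. It is this setting that we apply to the analysis of LRC codes in the next section.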
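The statement that these schemes are merged powers of $H(1,q)$ can be checked directly. The sketch below (our own illustration, using the convention $P_{ij}=K_j(i)$) forms the $m$th Kronecker power of the $2\times2$ first eigenmatrix of $H(1,q)$, merges relations by total weight, and compares the result with the Krawtchouk values of $H(m,q)$.

```python
from itertools import product
from math import comb


def krawtchouk(n, q, k, x):
    """Krawtchouk polynomial K_k(x) for length n and alphabet size q."""
    return sum((-1) ** l * (q - 1) ** (k - l) * comb(x, l) * comb(n - x, k - l)
               for l in range(k + 1))


def merged_kronecker_power(m, q):
    """Eigenvalues of H(m, q) from the m-th Kronecker power of the first eigenmatrix
    of H(1, q). Rows of P1 index idempotents, columns index relations; merging the
    relations of total weight j sums the matching columns, and any row multi-index
    of weight i represents the merged idempotent."""
    P1 = [[1, q - 1], [1, -1]]
    table = [[0] * (m + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        row = (1,) * i + (0,) * (m - i)
        for col in product((0, 1), repeat=m):
            entry = 1
            for a, b in zip(row, col):
                entry *= P1[a][b]        # entry of the Kronecker power
            table[i][sum(col)] += entry  # merge relations by Hamming weight
    return table


m, q = 4, 3
T = merged_kronecker_power(m, q)
print(all(T[i][j] == krawtchouk(m, q, j, i) for i in range(m + 1) for j in range(m + 1)))
```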
III-B The Linear Programming Bound for LRC Codes with Disjoint Repair Groups
III-B1 General Bound
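We begin with stating a general LP bound for codes with locality. Let $C$ be a $q$-ary $(n,k,r,\delta)$ LRC code with minimum distance $d$. Suppose that $n=m\nu$, where $\nu=r+\delta-1$ and $m$ is a positive integer. For $i=1,\dots,m$, define the interval
$$J_i=\{(i-1)\nu+1,\dots,i\nu\},$$
so the coordinate set is a disjoint union of these intervals:
$$[n]=J_1\cup J_2\cup\dots\cup J_m.$$
For $x\in C$ denote by $x_{J_i}$ the projection of $x$ on the coordinates in $J_i$. Throughout this section we will assume that the code has the property that
$$d_H(x_{J_i},y_{J_i})\in\{0,\delta,\delta+1,\dots,\nu\}\quad\text{for all }x,y\in C\text{ and }i=1,\dots,m,$$
i.e., each $J_i$ is a repair group in the sense of Definition 2.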
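In accordance with (16), define the following polynomials of $m$ discrete variables $\mathbf x=(x_1,\dots,x_m)$:
$$K_{\mathbf k}(\mathbf x)=\prod_{i=1}^{m}K_{k_i}(x_i),$$
where $\mathbf k=(k_1,\dots,k_m)\in\{0,\dots,r+\delta-1\}^m$ and $K_k$ is the Krawtchouk polynomial for length $r+\delta-1$. The polynomials $K_{\mathbf k}$ are orthogonal on the set $\{0,1,\dots,r+\delta-1\}^m$:
$$\sum_{\mathbf x}\mu(\mathbf x)K_{\mathbf k}(\mathbf x)K_{\mathbf l}(\mathbf x)=q^{n}\mu(\mathbf k)\,\Delta_{\mathbf k,\mathbf l},$$
where
$$\mu(\mathbf x)=\prod_{i=1}^{m}\binom{r+\delta-1}{x_i}(q-1)^{x_i},$$
and $\Delta_{\mathbf k,\mathbf l}$ is the Kronecker delta function.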
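Because $K_{\mathbf k}$ and $\mu$ factor over the blocks, the orthogonality relation follows from the one-block relation. The sketch below checks it numerically with exact integer arithmetic for a few small, arbitrarily chosen values of $m$, $\nu=r+\delta-1$ and $q$; the names `K_multi` and `mu` are ours.

```python
from itertools import product
from math import comb, prod


def krawtchouk(n, q, k, x):
    """Krawtchouk polynomial K_k(x) for length n and alphabet size q."""
    return sum((-1) ** l * (q - 1) ** (k - l) * comb(x, l) * comb(n - x, k - l)
               for l in range(k + 1))


def K_multi(ks, xs, nu, q):
    """K_k(x) = prod_i K_{k_i}(x_i), each factor for block length nu = r + delta - 1."""
    return prod(krawtchouk(nu, q, k, x) for k, x in zip(ks, xs))


def mu(xs, nu, q):
    """Weight mu(x) = prod_i C(nu, x_i) (q - 1)^{x_i}."""
    return prod(comb(nu, x) * (q - 1) ** x for x in xs)


def check_orthogonality(m, nu, q):
    """Verify sum_x mu(x) K_k(x) K_l(x) = q^(m*nu) mu(k) [k == l] on {0..nu}^m."""
    grid = list(product(range(nu + 1), repeat=m))
    for ks in grid:
        for ls in grid:
            s = sum(mu(xs, nu, q) * K_multi(ks, xs, nu, q) * K_multi(ls, xs, nu, q)
                    for xs in grid)
            expected = q ** (m * nu) * mu(ks, nu, q) if ks == ls else 0
            if s != expected:
                return False
    return True


print(check_orthogonality(m=2, nu=3, q=2), check_orthogonality(m=3, nu=2, q=3))
```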
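Let $(A_{\mathbf i})$ be the distance distribution of $C$, where each $\mathbf i=(i_1,\dots,i_m)$ is an $m$-tuple. Here $A_{\mathbf i}$ is the number of pairs of codewords $x,y\in C$ such that $d_H(x_{J_j},y_{J_j})=i_j$ for all $j=1,\dots,m$, normalized by the cardinality of the code. Note that the vectors $x_{J_j}$ are codewords of the restricted code $C_{J_j}$. By definition, we have $A_{\mathbf 0}=1$, and $A_{\mathbf i}=0$ if $\mathbf i\in\mathcal D$, where $\mathcal D$ is the set of nonzero $m$-tuples $\mathbf i$ such that $\sum_j i_j\le d-1$ or $1\le i_j\le\delta-1$ for some $j$.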
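The block-wise distance distribution and its transform can be computed directly for a toy code. The sketch below (helper names and the example code are hypothetical) takes $m=2$ blocks of length $\nu=3$ with an even-parity check on each block, computes $A_{\mathbf i}$ as defined above, and applies $B_{\mathbf k}=\sum_{\mathbf i}A_{\mathbf i}K_{\mathbf k}(\mathbf i)$, the product-scheme form of the MacWilliams transform. Every $B_{\mathbf k}$ comes out nonnegative, as the Delsarte inequalities require; for this linear code $B/|C|$ is the block-wise weight distribution of the dual code.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb


def krawtchouk(n, q, k, x):
    """Krawtchouk polynomial K_k(x) for length n and alphabet size q."""
    return sum((-1) ** l * (q - 1) ** (k - l) * comb(x, l) * comb(n - x, k - l)
               for l in range(k + 1))


def block_distance_distribution(code, nu, m):
    """A_i for multi-indices i = (i_1, ..., i_m): the number of ordered pairs (x, y)
    with d(x_{J_j}, y_{J_j}) = i_j for every block J_j = {j*nu, ..., (j+1)*nu - 1},
    divided by |C|."""
    counts = Counter()
    for x in code:
        for y in code:
            i = tuple(sum(x[s] != y[s] for s in range(j * nu, (j + 1) * nu))
                      for j in range(m))
            counts[i] += 1
    return {i: Fraction(c, len(code)) for i, c in counts.items()}


def transform(A, nu, m, q):
    """B_k = sum_i A_i * prod_j K_{k_j}(i_j); the Delsarte inequalities say B_k >= 0."""
    B = {}
    for k in product(range(nu + 1), repeat=m):
        total = Fraction(0)
        for i, a in A.items():
            term = a
            for kj, ij in zip(k, i):
                term *= krawtchouk(nu, q, kj, ij)
            total += term
        B[k] = total
    return B


# Hypothetical example: m = 2 blocks of length nu = 3, even parity on each block.
nu, m, q = 3, 2, 2
C = [x for x in product((0, 1), repeat=nu * m)
     if all(sum(x[j * nu:(j + 1) * nu]) % 2 == 0 for j in range(m))]
A = block_distance_distribution(C, nu, m)
B = transform(A, nu, m, q)
print(A)
print(all(b >= 0 for b in B.values()))  # True
```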
Now it is direct to check that the general bound of (15) in our case takes the following form.
Theorem 6** (Primal LP bound)**
Let $C$ be a $q$-ary $(n,k,r,\delta)$ LRC code with distance $d$. Define
[TABLE]
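Then the cardinality of $C$ satisfies $|C|\le\sum_{\mathbf i}A_{\mathbf i}$, where the vector $(A_{\mathbf i})$ is a solution of the following LP problem:
$$\begin{aligned}\text{maximize}\quad&\sum_{\mathbf i}A_{\mathbf i}\\\text{subject to}\quad&A_{\mathbf 0}=1,\quad A_{\mathbf i}=0\ \ (\mathbf i\in\mathcal D),\quad A_{\mathbf i}\ge0,\\&\sum_{\mathbf i}A_{\mathbf i}K_{\mathbf k}(\mathbf i)\ge0\quad\text{for all }\mathbf k\in\{0,\dots,r+\delta-1\}^m.\end{aligned}$$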
The dual problem of the LP problem in Theorem 6 has the following form.
Theorem 7** (Dual LP bound)**
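Let $C$ and $\mathcal D$ be as defined in Theorem 6. The cardinality of $C$ satisfies
$$|C|\le\min\ \frac{1}{f_{\mathbf 0}}\sum_{\mathbf k}f_{\mathbf k}K_{\mathbf k}(\mathbf 0),\tag{18}$$
where the minimum is taken over all real vectors $(f_{\mathbf k})$ such that
$$f_{\mathbf 0}>0,\qquad f_{\mathbf k}\ge0\ \text{ for all }\mathbf k,\tag{19}$$
$$\sum_{\mathbf k}f_{\mathbf k}K_{\mathbf k}(\mathbf i)\le0\ \text{ for all }\mathbf i\notin\mathcal D\cup\{\mathbf 0\}.\tag{20}$$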
As in the classical case (cf. [19, p.53],[28]), instead of solving this LP problem, we construct feasible solutions which provide upper bounds for the minimum. We state the result in polynomial form, which is obvious from (18)-(20).
Corollary 8
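Let $C$ be a $q$-ary $(n,k,r,\delta)$ LRC code with distance $d$, and let $\mathcal D$ be as in Theorem 6. Let $f(\mathbf x)$ be a polynomial whose Krawtchouk expansion has the form
$$f(\mathbf x)=\sum_{\mathbf k}f_{\mathbf k}K_{\mathbf k}(\mathbf x),$$
where the coefficients satisfy (i) $f_{\mathbf 0}>0$ and $f_{\mathbf k}\ge0$ for $\mathbf k\ne\mathbf 0$, and (ii) $f(\mathbf i)\le0$ for $\mathbf i\notin\mathcal D\cup\{\mathbf 0\}$. Then $|C|\le f(\mathbf 0)/f_{\mathbf 0}$.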
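In the classical single-block case (one block, no locality constraint, forbidden distances $1,\dots,d-1$) the recipe of Corollary 8 is easy to carry out with exact arithmetic. The sketch below, with hypothetical helper names, computes the Krawtchouk coefficients of a polynomial from its values on $\{0,\dots,n\}$ through the orthogonality relation, checks the feasibility conditions, and returns $f(0)/f_0$. Applied to the classical Singleton polynomial $q^{n-d+1}\prod_{j=d}^{n}(1-x/j)$ it returns $q^{n-d+1}$; it does not implement the multi-block polynomial used for LRC codes.

```python
from fractions import Fraction
from math import comb


def krawtchouk(n, q, k, x):
    """Krawtchouk polynomial K_k(x) for length n and alphabet size q."""
    return sum((-1) ** l * (q - 1) ** (k - l) * comb(x, l) * comb(n - x, k - l)
               for l in range(k + 1))


def krawtchouk_coefficients(f_vals, n, q):
    """Coefficients f_k of f = sum_k f_k K_k, from the values f(0), ..., f(n),
    via sum_x C(n,x)(q-1)^x K_k(x) K_l(x) = q^n C(n,k)(q-1)^k [k == l]."""
    w = [comb(n, x) * (q - 1) ** x for x in range(n + 1)]
    return [Fraction(sum(w[x] * f_vals[x] * krawtchouk(n, q, k, x) for x in range(n + 1)))
            / (q ** n * w[k]) for k in range(n + 1)]


def dual_lp_bound(f_vals, n, q, d):
    """Return f(0)/f_0 if f is feasible (f_0 > 0, all f_k >= 0, f(i) <= 0 for
    d <= i <= n), otherwise None."""
    coeffs = krawtchouk_coefficients(f_vals, n, q)
    if coeffs[0] <= 0 or min(coeffs) < 0:
        return None
    if any(f_vals[i] > 0 for i in range(d, n + 1)):
        return None
    return Fraction(f_vals[0]) / coeffs[0]


def singleton_polynomial(n, q, d):
    """Values at x = 0..n of f(x) = q^(n-d+1) * prod_{j=d}^{n} (1 - x/j)."""
    vals = []
    for x in range(n + 1):
        v = Fraction(q ** (n - d + 1))
        for j in range(d, n + 1):
            v *= 1 - Fraction(x, j)
        vals.append(v)
    return vals


n, q, d = 6, 2, 3
f = singleton_polynomial(n, q, d)
print(dual_lp_bound(f, n, q, d))  # 16, i.e. the Singleton bound q^(n-d+1)
```

For this polynomial the coefficients come out as $f_k=\binom{n-k}{d-1}\big/\binom{n}{d-1}\ge0$, which is the classical argument in a form that can be inspected term by term.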
III-B2 The Singleton Bound
The bounds in Corollary 3 can be proved using the polynomial approach of Corollary 8. To exemplify this claim, we give another proof of the Singleton bound. The original form of this bound in [2] is as follows:
$$d\le n-k+1-\Big(\Big\lceil\frac{k}{r}\Big\rceil-1\Big)(\delta-1).\tag{21}$$
Recall that . Assume that for some , . Relaxing (21) by omitting the ceiling function, we obtain
$$k\le\frac{r}{r+\delta-1}\,(n-d+\delta).\tag{22}$$
Recall that in the classical case the Singleton bound is proved using the polynomial [19, p. 54], [28, p. 544]
$$f(x)=q^{n-d+1}\prod_{j=d}^{n}\Big(1-\frac{x}{j}\Big)$$
(the “annihilator” of the weight distribution). Following this approach, define the polynomial in the form
[TABLE]
We will prove that the polynomial is a feasible solution of the dual LP problem, i.e., that it satisfies the conditions in (19), (20). Consider the expansion
[TABLE]
On account of (23) we conclude that $f_{\mathbf k}\ge0$ for all $\mathbf k$, so (19) is indeed true.
To prove (20), we will show that for . First suppose that . Choose any , then . If , then we must have , implying that . If , then there must exist some nonzero , which implies that and again The case can be analyzed using similar arguments.
Therefore, by Theorem 7 we obtain the bound
[TABLE]
In other words,
[TABLE]
This estimate is an LP version of the Singleton bound, and it is slightly better than (22).
III-B3 Bounds for LRC Codes
It is interesting to apply the LP approach to bounds on LRC codes for $\delta=2$, i.e., the case of single-symbol locality (Definition 1). The known bounds that apply in this case include the Singleton bound (13), which does not depend on $q$, and a shortening bound of [18]. For ease of reading we reproduce the bound (4): for any $(n,k,r)$ LRC code with distance $d$, we have
$$k\le\min_{t\in\mathbb Z_+}\Big[tr+k^{(q)}_{\rm opt}\big(n-t(r+1),d\big)\Big],$$
*where $k^{(q)}_{\rm opt}(n,d)$ is the maximum dimension (i.e., $\log_q$ of the maximum cardinality) of a $q$-ary code of length $n$ and distance $d$.*
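The shortening bound is straightforward to evaluate once some upper bound on $k^{(q)}_{\rm opt}$ is available. The sketch below substitutes the sphere-packing bound for $k^{(q)}_{\rm opt}$; this is our own simplification, which gives a valid but generally weaker bound than one computed from tables of optimal codes. Dimensions are rounded down to integers, so the result applies to linear codes, and $t$ is restricted to $0\le t\le (n-d)/(r+1)$ as a conservative choice. The helper names are hypothetical.

```python
from math import comb


def sphere_packing_dim(n, d, q):
    """Largest integer k with q^k * V_q(n, t) <= q^n, t = floor((d-1)/2).
    A code of length n < d has at most one codeword, so k = 0 there."""
    if n < d:
        return 0
    t = (d - 1) // 2
    volume = sum(comb(n, i) * (q - 1) ** i for i in range(t + 1))
    k = 0
    while q ** (k + 1) * volume <= q ** n:
        k += 1
    return k


def shortening_bound(n, d, r, q=2):
    """Evaluate (4) with the sphere-packing bound in place of k_opt:
    min over 0 <= t <= (n - d) / (r + 1) of t*r + k_ub(n - t(r+1), d)."""
    if n < d:
        return 0
    return min(t * r + sphere_packing_dim(n - t * (r + 1), d, q)
               for t in range((n - d) // (r + 1) + 1))


print(shortening_bound(9, 3, 2), shortening_bound(12, 3, 1))  # 5 5
```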
We computed this bound and the bound of Theorem 7, using $\delta=2$ in the definition of the index set $\mathcal D$. The results are summarized in Tables I–IV. Note that the corresponding length of the code is $n=m(r+1)$, and each entry of the table is the upper bound on the dimension $k$. We performed the computations using the GAP package GUAVA and the package GLPK in the symbolic computation system SageMath [31]. Each result was verified using the package COIN-OR, also available in SageMath.
In all the above examples, the LP bound either matches the shortening bound or is tighter than it.
IV Asymptotic Bounds for Binary Linear LRC Codes
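In this section we study asymptotic bounds for binary linear LRC codes that can locally correct one erasure. Throughout the section we assume that the code satisfies Definition 1. For $r\ge1$ and $0<\theta<1$ define the functions
$$R(r,\theta)=\limsup_{n\to\infty}\frac{1}{n}\log_2M(n,\theta n,r),\qquad R_{\rm lin}(r,\theta)=\limsup_{n\to\infty}\frac{1}{n}\log_2M_{\rm lin}(n,\theta n,r),$$
where $M(n,d,r)$ (respectively, $M_{\rm lin}(n,d,r)$) is the maximum cardinality of a code (respectively, of a linear code) of length $n$, distance $d$ and locality $r$. Clearly, $R_{\rm lin}(r,\theta)\le R(r,\theta)$. The best currently known asymptotic bounds on binary LRC codes are described in the following theorem.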
Theorem 9** ([18, 17])**
We have
[TABLE]
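$$R(r,\theta)\le\inf_{0\le\tau<\frac{1}{r+1}}\Big\{\tau r+\big(1-\tau(r+1)\big)\,\mathcal R\Big(\frac{\theta}{1-\tau(r+1)}\Big)\Big\},\tag{26}$$
where $\mathcal R(\theta)$ is any asymptotic upper bound on the rate of binary codes with relative distance $\theta$. These bounds imply that
$$\lim_{\theta\to0}R(r,\theta)=\frac{r}{r+1}.$$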
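The lower bound (25) is of the Gilbert-Varshamov type and was derived in [18] and [17], while the bound (26) is obtained from (4) by passing to the limit of large block length (see [18]). To obtain the tightest possible bound in (26) we substitute the best known bound on $\mathcal R(\theta)$, i.e., the McEliece et al. bound [20]:
$$\mathcal R(\theta)\le\min_{0\le u\le1-2\theta}\Big\{1+g(u^2)-g(u^2+2\theta u+2\theta)\Big\},\tag{27}$$
where $g(x)=h\big(\tfrac{1-\sqrt{1-x}}{2}\big)$ and $h(x)=-x\log_2x-(1-x)\log_2(1-x)$ is the binary entropy function.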
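For a rough numerical feel for (26) and (27), the sketch below evaluates both by grid search; the grid sizes are arbitrary choices of ours. Minimizing over a finite grid can only increase the value of a minimum, so the computed numbers are still valid (slightly weaker) upper bounds. As a sanity check, the endpoint $u=1-2\theta$ in (27) reproduces the first MRRW bound $h\big(1/2-\sqrt{\theta(1-\theta)}\big)$.

```python
from math import log2, sqrt


def h(x):
    """Binary entropy function in bits."""
    if x <= 0 or x >= 1:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def g(x):
    """g(x) = h((1 - sqrt(1 - x)) / 2); x is clamped to [0, 1] against rounding."""
    x = min(max(x, 0.0), 1.0)
    return h((1 - sqrt(1 - x)) / 2)


def mrrw2(theta, steps=500):
    """Grid evaluation of (27): min over 0 <= u <= 1 - 2*theta of
    1 + g(u^2) - g(u^2 + 2*theta*u + 2*theta); zero for theta >= 1/2."""
    if theta >= 0.5:
        return 0.0
    umax = 1 - 2 * theta
    return min(1 + g(u * u) - g(u * u + 2 * theta * u + 2 * theta)
               for u in (umax * s / steps for s in range(steps + 1)))


def lrc_shortening_rate(theta, r, R=mrrw2, steps=200):
    """Grid evaluation of (26): tau*r + (1 - tau(r+1)) R(theta / (1 - tau(r+1)))
    over 0 <= tau < 1/(r+1); the limit tau -> 1/(r+1) contributes r/(r+1)."""
    best = r / (r + 1)
    for s in range(steps):
        tau = s / (steps * (r + 1))
        rest = 1 - tau * (r + 1)
        best = min(best, tau * r + rest * R(theta / rest))
    return best


for theta in (0.05, 0.1, 0.2):
    print(theta, round(mrrw2(theta), 4), round(lrc_shortening_rate(theta, r=2), 4))
```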
Remark 2
Even though in Sec. III-B3 we showed by example that the LP bound is better than the bound (4) for finite length, it is difficult to derive a closed-form asymptotic version of the LP bound. The difficulty is that an asymptotic analysis of the LP bound would be easier with a small number of local codes whose distance grows in proportion to $n$, whereas in reality we have to deal with a growing number of local codes of distance $2$.
In this part we prove a new bound on linear LRC codes which improves upon the (linear case of the) bound (26). We begin with the remark that for linear LRC codes the recovery functions defined in (1) are also linear.
Lemma 10
Let $C$ be a linear LRC code of length $n$, dimension $k$, and locality $r$ over a field $\mathbb F$. Then for every coordinate $i\in[n]$ the recovery function $\phi_i$ defined in (1) is also linear.
Proof:
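Choose a generator matrix $G$ of $C$ and let $g_1,\dots,g_n\in\mathbb F^k$ be the columns of $G$. Given $m\in\mathbb F^k$, the coordinates of the codeword $x=mG$ are equal to $x_j=\langle m,g_j\rangle$, $j\in[n]$, where $\langle\cdot,\cdot\rangle$ is the dot product. Since $C$ has locality $r$, the coordinate $x_i$ must be a function of the coordinates in its recovery set $R_i\setminus\{i\}$. Let $G_i=\{g_j:j\in R_i\setminus\{i\}\}$ be the corresponding subset of columns of $G$. Without loss of generality we may assume that the
If $g_i\in\operatorname{span}(G_i)$, then there exists a linear recovery function for the $i$th coordinate, so let us assume the opposite (thus, $g_i\notin\operatorname{span}(G_i)$). Let $m_1\in\mathbb F^k$ be such that $\langle m_1,g\rangle=0$ for all $g\in G_i$ and $\langle m_1,g_i\rangle\ne0$; such a vector exists because $g_i\notin\operatorname{span}(G_i)$. Further, let $m_2\in\mathbb F^k$ be an arbitrary vector and consider the codewords $m_2G$ and $(m_1+m_2)G$. Clearly, their entries in the coordinates of $R_i\setminus\{i\}$ are the same, so the LRC property implies that their $i$th coordinates are equal as well. This, however, is not the case, since these coordinates differ by $\langle m_1,g_i\rangle\ne0$, so we obtain a contradiction. ∎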
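The linear-algebra fact behind Lemma 10 can be sketched over $\mathbb F_2$: coordinate $i$ has a linear recovery function on a set $S$ exactly when the column $g_i$ lies in the span of $\{g_j: j\in S\}$, and the coefficients of that combination give the function. The Gaussian-elimination helper below is our own; the generator matrix is a hypothetical example with repair groups $\{0,1,2\}$ and $\{3,4,5\}$ (0-based).

```python
def solve_gf2(columns, target):
    """Return 0/1 coefficients c with sum_j c_j * columns[j] == target over GF(2),
    or None when target is not in the span of the columns."""
    k, s = len(target), len(columns)
    # Augmented matrix: one row per coordinate of the vectors, last entry = target.
    rows = [[columns[j][i] for j in range(s)] + [target[i]] for i in range(k)]
    pivots, rank = [], 0
    for c in range(s):
        p = next((i for i in range(rank, k) if rows[i][c]), None)
        if p is None:
            continue
        rows[rank], rows[p] = rows[p], rows[rank]
        for i in range(k):
            if i != rank and rows[i][c]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        pivots.append(c)
        rank += 1
    if any(row[-1] for row in rows[rank:]):
        return None  # inconsistent system: target is outside the span
    coeffs = [0] * s
    for row_index, c in enumerate(pivots):
        coeffs[c] = rows[row_index][-1]
    return coeffs


# Hypothetical generator matrix with repair groups {0,1,2} and {3,4,5}.
G = [[1, 0, 1, 0, 0, 0],
     [0, 1, 1, 0, 0, 0],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 0, 1, 1]]
COLS = [[row[j] for row in G] for j in range(len(G[0]))]

print(solve_gf2([COLS[0], COLS[1]], COLS[2]))  # [1, 1]: x_2 = x_0 + x_1
print(solve_gf2([COLS[3], COLS[4]], COLS[0]))  # None: x_0 is not recoverable from {3, 4}
```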
The main result of this subsection is given in the following theorem.
Theorem 11** (Linear LRC codes)**
The maximum rate of a linear LRC code satisfies the following inequality.
[TABLE]
where
[TABLE]
Bound (28) improves upon the bound (26) for all values of the relative distance; however, the improvement is rather mild and can barely be seen in a plot.
The proof of this theorem consists of two steps. First we observe that the approach of [24] applies to LRC codes, yielding a bound on their rate. In the second step we combine this approach with a recursive shortening bound approach of [18], obtaining Theorem 13 below.
To proceed with the first step, let us quote the main technical lemma of [24].
Lemma 12** ([24])**
Consider a sequence of LDPC codes with increasing length and parities of weight at most . Suppose that the distance of codes converges to the value as Then the maximum achievable rate of the codes is bounded above by given in (29).
Below in Sec. IV-A we provide some details of the argument of [24] because they are needed for our second result in this section, namely a bound on linear LRC codes with disjoint repair groups. In particular, it will be clear that Lemma 12 also applies to linear LRC codes, since the conditions required for it to hold are actually weaker than the assumptions in the proof in [24]. The weaker set of assumptions used below is as follows:
[TABLE]
For linear LRC codes, these conditions are satisfied for . The statement in [24] in addition to (31) assumes that form a basis for the code . It is true that this condition holds for LDPC codes, but this is an artifact of the LDPC setting rather than an essential element of the proof that we give in Sec. IV-A. From our discussion below (see the remarks after Eq. (37)) it will become clear that conditions (31) suffice to complete the argument. As a consequence, we obtain the following bound on the rate of linear LRC codes:
[TABLE]
This bound does not improve on (26), but it is possible to establish a recursion that will lead to an improvement. Namely, we combine (32) with code shortening to obtain (28). The following statement is a minor modification of the result of [18].
Theorem 13
Let $C$ be a binary linear LRC code of length $n$, locality $r$ and distance $d$; then the dimension $k$ of $C$ satisfies
[TABLE]
where is the maximum cardinality of a linear LRC code of length , distance , and locality $r$ (the statement in [18] does not include the LRC condition on the shortened code). Therefore,
[TABLE]
Proof:
Eq. (34) is an obvious consequence of (33), so let us prove (33). The proof in [18] relies on the fact that for each admissible $t$ there exists a subset of coordinates $I\subseteq[n]$ with $|I|=t(r+1)$ such that the restriction $C_I$ has dimension at most $tr$ ([18, Lemma 1]). That such a subset exists can be shown relying on the locality property of the code $C$. Let $I$ be such a subset. We shorten the code $C$ on $I$ to obtain a code $C'$ of length $n-t(r+1)$, dimension at least $k-tr$ and distance at least $d$ ([18, Lemma 2]). This code is obtained by taking all the codewords that contain zeros in the coordinates in $I$ and discarding these coordinates.
The only added element in our claim is that the shortened code $C'$ itself is LRC. Referring to Def. 1, we need to prove that for any coordinate $i\in[n]\setminus I$ there exists a function that depends on at most $r$ other coordinates and computes the value of the $i$th coordinate of the codeword $x\in C'$. There are two cases:
(i) The repair group $R_i$ of $i$ does not intersect the subset $I$. In this case there is nothing to prove.
(ii) Some number $s\ge1$ of the coordinates of $R_i$ are inside the subset $I$. Let $S=R_i\cap I$. In this case the value of the discarded coordinates for every codeword of $C'$ is equal to $0$. Suppose that $\phi_i$ is the recovery function of the original code $C$. We claim that the recovery group of the coordinate $i$ in the code $C'$ is the subset $R_i\setminus S$, and the recovery function is obtained from $\phi_i$ by substituting zeros for all the arguments in $S$. Note that this function essentially depends on $r-s$ coordinates of the codeword, conforming with the locality requirement. ∎
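The two cases of the proof can be sketched on a toy code: shortening keeps the codewords that vanish on the chosen set and deletes those coordinates, and each surviving coordinate is then recovered from the surviving part of its repair group with zeros substituted for the deleted symbols. The code and the shortening sets below are hypothetical examples, and the recovery functions are XORs because each repair group carries a single parity check.

```python
from itertools import product

# Hypothetical toy code: repair groups {0,1,2} and {3,4,5}, one parity check each.
GROUPS = [(0, 1, 2), (3, 4, 5)]
C = {x for x in product((0, 1), repeat=6)
     if all(sum(x[j] for j in g) % 2 == 0 for g in GROUPS)}


def shorten(code, S):
    """Keep the codewords that are zero on S, then delete the coordinates in S."""
    n = len(next(iter(code)))
    keep = [j for j in range(n) if j not in S]
    return {tuple(x[j] for j in keep) for x in code if all(x[j] == 0 for j in S)}


def shortened_code_is_lrc(code, S):
    """Recover each surviving coordinate i from R_i minus S (XOR, with zeros
    substituted for the deleted symbols), as in case (ii) of the proof."""
    n = len(next(iter(code)))
    keep = [j for j in range(n) if j not in S]
    new_index = {j: p for p, j in enumerate(keep)}
    for y in shorten(code, S):
        for i in keep:
            group = next(g for g in GROUPS if i in g)
            rest = [new_index[j] for j in group if j != i and j not in S]
            if sum(y[p] for p in rest) % 2 != y[new_index[i]]:
                return False
    return True


print(len(shorten(C, {0})), shortened_code_is_lrc(C, {0}))        # 8 True
print(len(shorten(C, {0, 3})), shortened_code_is_lrc(C, {0, 3}))  # 4 True
```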
Remark 3
While we need this result only for linear codes, the claims of Theorem 13 are still valid if we omit the linearity assumption (with obvious modifications to the statement).
Now Theorem 11 follows immediately by using (32) in the estimate (34).
In the next subsection we give a sketch of the approach in [24] for LDPC codes. In Sec. IV-B, we improve on Theorem 11 for the case of disjoint repair groups.
IV-A The Approach of Iceland and Samorodnitsky [24] to Bounds on LDPC Codes
Coset graphs of linear codes have been often used as a tool to study combinatorial properties of codes and to obtain bounds on their parameters [32, 33, 34]. Given a linear code define a graph , where , i.e., the vertices of correspond to the cosets of the code, and two cosets are connected by an edge if the Hamming distance between them is one.
Let $\mathcal{C}$ be a linear code. Throughout this section we use the coset graph of the dual code $\mathcal{C}^\perp$, so all the references to the coset graph below are with respect to $\mathcal{C}^\perp$. The length of the shortest path between a pair of vertices equals the Hamming distance between the corresponding cosets. Given a vertex $v$, denote by $B(v,s)$ the ball of radius $s$ around it in the graph. Since the graph is vertex-transitive, the volume of the ball does not depend on $v$, and we will use the notation $|B(s)|=|B(v,s)|$, where $v$ is an arbitrary vertex. Clearly, $|B(s)|$ equals the number of cosets whose leaders are of weight at most $s$.
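To make these definitions concrete, the Python sketch below (our own illustration, not code from [24] or [34]) labels each coset of $\mathcal{C}^\perp$ by its syndrome with respect to a generator matrix $G$ of $\mathcal{C}$, which is a parity-check matrix of $\mathcal{C}^\perp$. Flipping one coordinate changes the syndrome by one column of $G$, so breadth-first search from the zero coset returns each coset's graph distance to $\mathcal{C}^\perp$, that is, the weight of its leader. The [7,4] Hamming code and the names `coset_graph_distances` and `ball_volume` are hypothetical choices for the example.

```python
from collections import deque

def coset_graph_distances(G):
    """BFS in the coset graph of C^perp, where G generates C.

    A coset x + C^perp is labeled by its syndrome x G^T, packed into an int
    (bit i is the inner product of x with row i of G). Adjacent cosets
    differ in one coordinate, i.e. their syndromes differ by a column of G.
    """
    k, n = len(G), len(G[0])
    cols = [sum(G[i][j] << i for i in range(k)) for j in range(n)]
    dist = {0: 0}  # the zero syndrome is the coset C^perp itself
    queue = deque([0])
    while queue:
        v = queue.popleft()
        for c in cols:
            u = v ^ c
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

def ball_volume(dist, s):
    """Number of cosets of C^perp whose leaders have weight at most s."""
    return sum(1 for d in dist.values() if d <= s)

# Hypothetical example: a [7,4] Hamming code C, so C^perp is the [7,3] simplex code.
G = [(1, 0, 0, 0, 1, 1, 0),
     (0, 1, 0, 0, 0, 1, 1),
     (0, 0, 1, 0, 1, 1, 1),
     (0, 0, 0, 1, 1, 0, 1)]
dist = coset_graph_distances(G)
print([ball_volume(dist, s) for s in range(4)])  # [1, 8, 15, 16]
```

Since the graph is a Cayley graph on the syndrome space, starting the search from any other vertex gives the same ball volumes, which is the vertex-transitivity used above.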
The starting point of the argument in [24] is the following result from [34].
Theorem 14 ([34])
Consider a linear code of length and let be the number of cosets of weight at most in Then
[TABLE]
where is defined in (30) and denotes the relative distance of the code .
Using the obvious estimate one obtains a bound valid for any code The main idea in [24] is that it is possible to obtain a tighter estimate for in the case when is an LDPC code, leading to an improved bound on the rate of such codes compared to the universal bounds of [20].
Proposition 15 ([24])
Let be a random vector with independent Bernoulli coordinates such that and let be the probability that is a coset leader of Then
[TABLE]
Proof:
We include a very short proof. Limiting ourselves to the vectors with at most ones, we have
[TABLE]
∎
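The displayed inequality of Proposition 15 is not legible in this version. Under the reading $p=s/n\le 1/2$ (our assumption), the argument of the proof gives $P\ge |B(s)|\,p^{s}(1-p)^{n-s}=|B(s)|\,2^{-nH(p)}$. This holds because each of the $|B(s)|$ cosets of weight at most $s$ has a leader with at most $s$ ones, and $p^w(1-p)^{n-w}$ is nonincreasing in $w$ when $p\le 1/2$. The brute-force Python sketch below checks $|B(s)|\le P\,2^{nH(p)}$ on the hypothetical [7,4] Hamming example. The helper names are ours.

```python
from itertools import product
from math import log2

# Hypothetical example: G generates a [7,4] Hamming code C, so G is a
# parity-check matrix of C^perp and cosets of C^perp are labeled by syndromes.
G = [(1, 0, 0, 0, 1, 1, 0),
     (0, 1, 0, 0, 0, 1, 1),
     (0, 0, 1, 0, 1, 1, 1),
     (0, 0, 0, 1, 1, 0, 1)]
n = len(G[0])
vectors = list(product((0, 1), repeat=n))

def syndrome(x):
    return tuple(sum(g[j] * x[j] for j in range(n)) % 2 for g in G)

# Weight of each coset = minimum weight of the vectors in it.
coset_weight = {}
for x in vectors:
    s = syndrome(x)
    coset_weight[s] = min(coset_weight.get(s, n), sum(x))

def ball(s):
    """Number of cosets of C^perp of weight at most s."""
    return sum(1 for w in coset_weight.values() if w <= s)

def leader_probability(p):
    """Pr[X is a coset leader] for X with i.i.d. Bernoulli(p) coordinates."""
    return sum(p ** sum(x) * (1 - p) ** (n - sum(x))
               for x in vectors if sum(x) == coset_weight[syndrome(x)])

def h2(p):
    """Binary entropy function in bits."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

for s in (1, 2, 3):
    p = s / n  # p <= 1/2 for these values of s
    print(s, ball(s), leader_probability(p) * 2 ** (n * h2(p)))
```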
The next step, which is the main technical ingredient of the result in [24], is to show that is an exponentially declining function of . Let us assume that the dual code contains a set of vectors such that for all and that .
Construct a partition of the coordinate set into disjoint sets as follows. Suppose that are already defined. Set . For each of the vectors consider the set of coordinates , and if the size of is exactly put
It is easy to observe that each block in the partition is a disjoint union of subsets such that for all and each is contained in the support of a different vector Moreover, the set is a subset of An important property of the partition is as follows.
Lemma 16 ([24, Lemma 2])
Let There exists an index such that
[TABLE]
Now let $k$ be the index whose existence is guaranteed by this lemma. We have $|I_k|\geq\max\big\{A\sum_{j>k}|I_j|,\ \frac{n}{2wA^{w}}\big\}$, and the set is a disjoint union of sets Now consider a coset leader . An easy argument shows that contains no more than subsets Indeed, let and consider a vector from the same coset given by
[TABLE]
If then we would obtain a contradiction.
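The contradiction rests on a basic fact about coset leaders. If $x$ is a leader of its coset of $\mathcal{C}^\perp$ and $h\in\mathcal{C}^\perp$, then $x+h$ lies in the same coset and has weight $|x|+|h|-2|\mathrm{supp}(x)\cap\mathrm{supp}(h)|$, so $x$ cannot cover more than half of $\mathrm{supp}(h)$. The specific vector in the display above is not recoverable here. The Python sketch below therefore checks only this basic fact by brute force, on a hypothetical example in which $\mathcal{C}^\perp$ is the [7,3] simplex code.

```python
from itertools import product

# Hypothetical example: rows of H generate the [7,3] simplex code C^perp.
H = [(1, 0, 1, 1, 1, 0, 0),
     (1, 1, 1, 0, 0, 1, 0),
     (0, 1, 1, 1, 0, 0, 1)]
n = len(H[0])
dual = {tuple(sum(a * h[j] for a, h in zip(coeffs, H)) % 2 for j in range(n))
        for coeffs in product((0, 1), repeat=len(H))}

def weight(v):
    return sum(v)

def add(u, v):
    return tuple(a ^ b for a, b in zip(u, v))

# x is a coset leader of C^perp if no translate x + h is lighter.
leaders = [x for x in product((0, 1), repeat=n)
           if all(weight(add(x, h)) >= weight(x) for h in dual)]

# Since weight(x + h) = weight(x) + weight(h) - 2|x ∩ supp h|, a leader
# covers at most half of the support of every dual codeword.
ok = all(2 * sum(a & b for a, b in zip(x, h)) <= weight(h)
         for x in leaders for h in dual)
print(len(leaders), ok)  # 36 True
```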
The final step is to take a random vector as in Proposition 15 and to estimate the probability that it is a coset leader of the code First observe that for given by Lemma 16 we have
[TABLE]
Now let then the Chernoff bound gives
[TABLE]
where is defined in Theorem 11.
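The estimate in the display above is not legible in this version, so we only recall the tool it invokes. For a binomial variable $Z\sim\mathrm{Bin}(m,q)$ and $0\le a\le q$, the Chernoff-Hoeffding bound gives $\Pr[Z\le am]\le e^{-mD(a\|q)}$, where $D(a\|q)=a\ln\frac{a}{q}+(1-a)\ln\frac{1-a}{1-q}$. The Python sketch below compares this bound with the exact tail. The parameters are arbitrary, the function names are hypothetical, and we do not claim that this is the precise form of the estimate used in the proof.

```python
from math import comb, exp, floor, log

def binom_lower_tail(m, q, a):
    """Exact Pr[Bin(m, q) <= a*m]."""
    return sum(comb(m, i) * q**i * (1 - q)**(m - i)
               for i in range(floor(a * m) + 1))

def kl(a, q):
    """Binary KL divergence D(a || q) in nats, with 0 log 0 = 0."""
    d = 0.0
    if a > 0:
        d += a * log(a / q)
    if a < 1:
        d += (1 - a) * log((1 - a) / (1 - q))
    return d

def chernoff_lower_tail(m, q, a):
    """Chernoff-Hoeffding: Pr[Bin(m, q) <= a*m] <= exp(-m D(a || q)) for a <= q."""
    assert 0 <= a <= q
    return exp(-m * kl(a, q))

for m in (10, 50, 200):
    print(m, binom_lower_tail(m, 0.3, 0.1), chernoff_lower_tail(m, 0.3, 0.1))
```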
Before finishing the proof of Theorem 11, observe that the requirement for to form a basis of was not used in the above argument, including the omitted proof of Lemma 16 (this proof, due to [24], applies here verbatim). The only assumptions used are those listed in (31)(a)-(b), namely that there exists a set of low-weight dual codewords whose supports jointly cover all the coordinates. The last assumption is used in the construction of the partition in Lemma 16 and in (37).
This enables us to use inequalities (36) and (38) in (35) (taking and noting that for all ). This substitution proves Lemma 12, and the estimate (28) in Theorem 11 follows immediately upon taking
IV-B Upper Bound on LRC Codes with Disjoint Repair Groups
Upper bounds for LRC codes with disjoint repair groups were already considered in Sec. III. Here we consider the asymptotic version of this problem, noting that the general result of Theorem 11 in this case admits a significant improvement.
Assume that and consider a binary linear LRC code of length , minimum distance , and locality with disjoint repair groups For every the repair group corresponds to a vector of weight in , and these vectors have pairwise disjoint supports and trivially satisfy the conditions in (31) (even if some repair groups are of smaller size, we can add redundant coordinates to them to bring their size to ).
Recall that is the set of coset leaders of of weight at most and let The vector satisfies the following constraints:
[TABLE]
Eqns. (39a)-(39c) can be used to derive an upper bound on which is given in the following theorem.
Theorem 17 (Linear LRC codes with disjoint repair groups)
Let be the largest possible asymptotic rate of linear LRC codes with disjoint repair groups. Let . We have
[TABLE]
where
[TABLE]
where if $\left.\sum_{i=0}^{t} i\,\frac{\beta_{i}(x)}{\sum_{\ell}\beta_{\ell}(x)}\right|_{x=1}\leq (r+1)\tau$ and
[TABLE]
otherwise. Here denotes the unique positive zero of the polynomial
Proof:
We again rely on inequality (35). To bound above the let us use the constraints in (39). In particular, from (39a) and (39b) we obtain
[TABLE]
The number of vectors that satisfy (39a)-(39b) is therefore given by the coefficient of in the following expression:
[TABLE]
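The coefficient extraction can be sanity-checked by brute force. The Python sketch below rests on our reading of the constraints, since (39a)-(39c) are not legible here. Each repair group is a block of $r+1$ coordinates in which a coset leader has weight at most $t=\lfloor (r+1)/2\rfloor$, so the vectors of total weight $w$ are counted by the coefficient of $z^w$ in $\big(\sum_{i\le t}\binom{r+1}{i}z^i\big)^{n/(r+1)}$. The optional halving of the middle coefficient mirrors the tie-breaking for even $r+1$ that the text applies next. All function names are hypothetical.

```python
from itertools import product
from math import comb

def block_enumerator(r, t, halve_middle=False):
    """Coefficients of sum_{i<=t} C(r+1, i) z^i for one block of size r+1.

    With halve_middle=True (meaningful when r+1 = 2t), the i = t term is
    halved: one vector out of each complementary pair inside the block.
    """
    coeffs = [comb(r + 1, i) for i in range(t + 1)]
    if halve_middle:
        coeffs[t] //= 2
    return coeffs

def poly_pow(coeffs, m):
    """Coefficient list of the m-th power of a polynomial."""
    result = [1]
    for _ in range(m):
        nxt = [0] * (len(result) + len(coeffs) - 1)
        for i, x in enumerate(result):
            for j, y in enumerate(coeffs):
                nxt[i + j] += x * y
        result = nxt
    return result

def count_gf(m, r, t, w, halve_middle=False):
    """Coefficient of z^w in the block enumerator raised to the power m."""
    poly = poly_pow(block_enumerator(r, t, halve_middle), m)
    return poly[w] if w < len(poly) else 0

def count_brute(m, r, t, w, halve_middle=False):
    """Vectors of length m(r+1), weight w, every block of weight <= t."""
    size, total = r + 1, 0
    for x in product((0, 1), repeat=m * size):
        blocks = [x[b * size:(b + 1) * size] for b in range(m)]
        if sum(x) != w or any(sum(blk) > t for blk in blocks):
            continue
        # Tie-break: keep a half-weight block only if it contains its first coordinate.
        if halve_middle and any(sum(blk) == t and blk[0] == 0 for blk in blocks):
            continue
        total += 1
    return total

print([count_gf(2, 3, 2, w) for w in range(5)])  # [1, 8, 28, 48, 36]
```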
Now let us in addition use (39c). Since for all and they are in the same coset, it suffices to count only one of them on the right-hand side of (43). Therefore if is even, we obtain
[TABLE]
where
[TABLE]
One can see that asymptotically for the dominating term in this expression is given by the largest coefficient, i.e.,
[TABLE]
Thus we have
[TABLE]
which translates into the following convex optimization problem:
[TABLE]
where the distribution satisfies the constraints
[TABLE]
The maximum is found by differentiation (setting up a Lagrange function) and we obtain
[TABLE]
for and as defined in (41) and (42). Substituting these values of we find the bound in the form given in (40). ∎
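The Lagrangian computation has a simple numerical counterpart. This rests on our reading, since (40)-(42) are not legible here: the bound maximizes the growth rate of the number of vectors with block weights at most $t$ and total weight at most $\tau n$. A Chernoff-type argument then shows that, for every $0<x\le 1$, this number is at most $\beta(x)^{n/(r+1)}x^{-\tau n}$, where $\beta(z)=\sum_{i\le t}\binom{r+1}{i}z^i$. The best choice is $x=1$ when the mean block weight at $x=1$ is at most $(r+1)\tau$. Otherwise $x$ solves $x\beta'(x)/\beta(x)=(r+1)\tau$, which parallels the case split in Theorem 17. The Python sketch below finds this $x$ by bisection and compares the resulting exponent with the exact count. All names are hypothetical.

```python
import math

def block_coeffs(r, t):
    """Assumed block enumerator beta(z) = sum_{i<=t} C(r+1, i) z^i."""
    return [math.comb(r + 1, i) for i in range(t + 1)]

def evaluate(coeffs, x):
    return sum(c * x**i for i, c in enumerate(coeffs))

def tilted_mean(coeffs, x):
    """x beta'(x) / beta(x): mean block weight when pi_i is proportional to C(r+1,i) x^i."""
    return sum(i * c * x**i for i, c in enumerate(coeffs)) / evaluate(coeffs, x)

def optimal_x(coeffs, target):
    """Minimizer over 0 < x <= 1 of beta(x) / x**target, for target > 0."""
    if tilted_mean(coeffs, 1.0) <= target:
        return 1.0                 # the weight constraint is inactive
    lo, hi = 0.0, 1.0              # tilted_mean is increasing in x
    for _ in range(100):
        mid = (lo + hi) / 2
        if tilted_mean(coeffs, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exponent(r, t, tau):
    """(1/(r+1)) log2 beta(x) - tau log2 x at the optimal x."""
    coeffs = block_coeffs(r, t)
    x = optimal_x(coeffs, (r + 1) * tau)
    return math.log2(evaluate(coeffs, x)) / (r + 1) - tau * math.log2(x)

def exact_exponent(m, r, t, tau):
    """(1/n) log2 #{weight <= tau*n, all block weights <= t}, with n = m(r+1)."""
    coeffs = block_coeffs(r, t)
    poly = [1]
    for _ in range(m):
        nxt = [0] * (len(poly) + t)
        for i, a in enumerate(poly):
            for j, c in enumerate(coeffs):
                nxt[i + j] += a * c
        poly = nxt
    n = m * (r + 1)
    return math.log2(sum(poly[: math.floor(tau * n) + 1])) / n

print(exponent(3, 2, 0.2), exact_exponent(200, 3, 2, 0.2))
```

Any $x\in(0,1]$ yields a valid upper bound, so the bisection tolerance affects only the tightness of the exponent, not its validity.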
A numerical evaluation of the improvement in the asymptotic rate offered by this result over the previously known bounds is shown in Fig. 1. The improvement is visible for larger values of the relative distance. For instance, for the improvement is obtained for and this range increases for larger values of .
Acknowledgment. We are grateful to the reviewers for insightful remarks that helped us to improve the presentation of the paper. In particular, Lemma 10 was suggested by a reviewer.
- [1] P. Gopalan, C. Huang, H. Simitci, and S. Yekhanin, “On the locality of codeword symbols,” IEEE Trans. Inform. Theory, vol. 58, no. 11, pp. 6925–6934, 2012.
- [2] N. Prakash, G. M. Kamath, V. Lalitha, and P. V. Kumar, “Optimal linear codes with a local-error-correction property,” in Proc. 2012 IEEE Int. Sympos. Inform. Theory, 2012, pp. 2776–2780.
- [3] G. M. Kamath, N. Prakash, V. Lalitha, and P. V. Kumar, “Codes with local regeneration,” in Proc. 2013 IEEE Int. Sympos. Inform. Theory, Istanbul, Turkey, Jul. 2013, pp. 1606–1610.
- [4] L. Pamies-Juarez, H. D. L. Hollmann, and F. E. Oggier, “Locally repairable codes with multiple repair alternatives,” in Proc. 2013 IEEE Int. Sympos. Inform. Theory, 2013, pp. 892–896.
- [5] A. Wang and Z. Zhang, “Repair locality with multiple erasure tolerance,” IEEE Trans. Inform. Theory, vol. 60, no. 11, pp. 6979–6987, Nov. 2014.
- [6] N. Prakash, V. Lalitha, and P. Kumar, “Codes with locality for two erasures,” in Proc. 2014 IEEE Int. Sympos. Inform. Theory, Honolulu, HI, 2014, pp. 1962–1966.
- [7] A. S. Rawat, A. Mazumdar, and S. Vishwanath, “Cooperative local repair in distributed storage,” EURASIP Journal on Advances in Signal Processing, 2015, 17 pp.
- [8] A. Mazumdar, “Storage capacity of repairable networks,” IEEE Trans. Inform. Theory, vol. 61, no. 11, pp. 5810–5821, 2015.
