Combinatorial Alphabet-Dependent Bounds for Locally Recoverable Codes

Abhishek Agarwal; Alexander Barg; Sihuang Hu; Arya Mazumdar; Itzhak Tamo

arXiv:1702.02685·cs.IT·May 16, 2018


TL;DR

This paper introduces new combinatorial and linear programming bounds for locally recoverable codes, improving the understanding of their rate-distance trade-offs especially over small alphabets.

Contribution

It presents novel combinatorial bounds and an LP-based approach that yield tighter estimates on the rate of LRC codes with specified distance.

Findings

01 New combinatorial bounds, including locality-aware sphere-packing and Plotkin bounds.

02 An LP bound that outperforms existing bounds in examples.

03 The tightest known upper bound on the rate of linear LRC codes with a given relative distance.

Abstract

Locally recoverable (LRC) codes have recently been a focus point of research in coding theory due to their theoretical appeal and applications in distributed storage systems. In an LRC code, any erased symbol of a codeword can be recovered by accessing only a small number of other symbols. For LRC codes over a small alphabet (such as binary), the optimal rate-distance trade-off is unknown. We present several new combinatorial bounds on LRC codes including the locality-aware sphere packing and Plotkin bounds. We also develop an approach to linear programming (LP) bounds on LRC codes. The resulting LP bound gives better estimates in examples than the other upper bounds known in the literature. Further, we provide the tightest known upper bound on the rate of linear LRC codes with a given relative distance, an improvement over the previous best known bounds.

Tables

Table 1. $q=2,d=3,s=2$

r	2	3	4	5	6	7	8	9	10
SH	3	4	6	8	10	11	13	15	17
LP	2	4	5	7	9	11	12	14	16

Equations

$$c_{i}=\phi_{i}(c_{j_{1}},\dots,c_{j_{r}}),$$

$$d({\mathcal{C}})\leq n-k-\Big\lceil\frac{k}{r}\Big\rceil+2.$$

$$d\leq n-k+1-\Big(\Big\lceil\frac{k}{r}\Big\rceil-1\Big)(\rho-1).$$

$$k\leq\min_{1\leq s\leq n/(r+1)}\big\{sr+\log_{q}M_{q}(n-s(r+1),d)\big\},$$

$$\mu=\mu(n,d,r,\rho):=\Big\lceil\frac{n-(d-1)}{N}\Big\rceil+1,$$

$$k\leq\mu\log_{q}B(N,\rho).$$

$$X_{i+1}={\mathcal{R}}_{i+1}\backslash\cup_{l=0}^{i}X_{l}.$$

$$|X_{[m-1]}|<n-(d-1),$$

$$|X_{[m]}|\leq|X_{[m-1]}|+N\leq\mu N.$$

$$|{\mathcal{C}}_{i}|\leq\prod_{j=1}^{i}B(|X_{j}|,\rho)\quad\text{for all }i=1,\dots,m.$$

$$|{\mathcal{C}}|=|{\mathcal{C}}_{m}|\leq\prod_{j=1}^{m}B(|X_{j}|,\rho).$$

$$B(|X_{i}|,\rho)B(|X_{j}|,\rho)\leq B(|X_{i}|-1,\rho)B(|X_{j}|+1,\rho).$$

$$\prod_{j=1}^{m}B(|X_{j}|,\rho)\leq B(N,\rho)^{\mu}.$$

$$f(n)=(1+Q)f(n-1)-\binom{n-1}{e}Q^{e+1}.$$

$$\begin{aligned}f(n_{1})f(n_{2})-f(n_{1}-1)f(n_{2}+1)&=f(n_{2})\Big[(1+Q)f(n_{1}-1)-\binom{n_{1}-1}{e}Q^{e+1}\Big]-f(n_{1}-1)\Big[(1+Q)f(n_{2})-\binom{n_{2}}{e}Q^{e+1}\Big]\\&=Q^{e+1}\Big[\binom{n_{2}}{e}f(n_{1}-1)-\binom{n_{1}-1}{e}f(n_{2})\Big]\\&=Q^{e+1}\sum_{i=0}^{e}Q^{i}\Big[\binom{n_{2}}{e}\binom{n_{1}-1}{i}-\binom{n_{1}-1}{e}\binom{n_{2}}{i}\Big].\end{aligned}$$

$$f(n_{1}-1)f(n_{2}+1)\leq f(n_{1})f(n_{2}),$$

$$\frac{q^{n_{1}}}{f(n_{1})}\cdot\frac{q^{n_{2}}}{f(n_{2})}\leq\frac{q^{n_{1}-1}}{f(n_{1}-1)}\cdot\frac{q^{n_{2}+1}}{f(n_{2}+1)}.$$

$$k\leq\mu\Big(N-\log_{q}\Big(\sum_{e=0}^{\lfloor\frac{\rho-1}{2}\rfloor}\binom{N}{e}(q-1)^{e}\Big)\Big).$$

$$k\leq\mu\log_{q}\frac{\rho}{\rho-\frac{q-1}{q}N}.$$

$$k\leq\mu r.$$

$$R_{q}(r,\rho,\delta)=\limsup_{n\to\infty}\frac{1}{n}\log_{q}M_{q}(n,r,\rho,\delta n),$$

$$R_{q}(r,\rho,\delta)\geq\frac{r}{N}-\min_{0<s\leq 1}\Big\{\frac{\log_{q}b_{\rho}(s)}{N}-\delta\log_{q}s\Big\}$$

$$b_{\rho}(s)=1+(q-1)\sum_{w=\rho}^{N}\binom{N}{w}s^{w}q^{w-\rho}\times\sum_{j=0}^{w-\rho}\binom{w-1}{j}(-q)^{-j}.$$

$$A_{i}=\sum_{j=0}^{n}P_{ji}E_{j},\qquad E_{j}=\frac{1}{|X|}\sum_{i=0}^{n}Q_{ij}A_{i}.$$

$$R_{\underline{i}}=\{((y_{1},\dots,y_{s}),(y_{1}^{\prime},\dots,y_{s}^{\prime})):(y_{j},y_{j}^{\prime})\in R_{i_{j}},\ j=1,\dots,s\}.$$

$$|{\mathcal{C}}|\leq\max\Big\{\sum_{i=0}^{n}a_{i}\;:\;a_{0}=1,\ ({\mathbf{a}}Q)_{j}\geq 0\text{ for }1\leq j\leq n,\ a_{i}=0\text{ for }1\leq i\leq d-1,\ a_{i}\geq 0\text{ for }d\leq i\leq n\Big\}.$$


Full text

Combinatorial Alphabet-Dependent Bounds for Locally Recoverable Codes

Abhishek Agarwal (1), Alexander Barg (2), Sihuang Hu (3), Arya Mazumdar (1), Itzhak Tamo (4)

(1) College of Information and Computer Sciences, University of Massachusetts–Amherst, Amherst, MA 01003. Emails: {abhiag,arya}@cs.umass.edu. Research supported by NSF grants CCF 1642658, CCF 1318093 and CCF 1618512.

(2) Dept. of ECE and ISR, University of Maryland, College Park, MD 20742, and IITP, Russian Academy of Sciences, 127051 Moscow, Russia. Email: [email protected]. Research supported in part by NSF grants CCF 1422955 and CCF 1618603.

(3) Lehrstuhl D für Mathematik, RWTH Aachen, Germany. Email: [email protected]. This work was done while this author was a postdoc at the Department of Electrical Engineering - Systems, Tel Aviv University, Israel. Research supported by ERC grant no. 639573, ISF grant no. 1367/14, and the Alexander von Humboldt Foundation.

(4) Department of Electrical Engineering - Systems, Tel Aviv University, Israel. Email: [email protected]. Research supported by ISF grant no. 1030/15 and NSF-BSF grant no. 2015814.


I Introduction

The authors' names appear in alphabetical order. This paper was presented in part at the 2016 IEEE International Symposium on Information Theory, Barcelona, Spain, July 2016, and at the 54th Annual Allerton Conference on Communication, Control and Computing, Monticello, IL, September 2016.

We consider codes over a finite alphabet that have the usual property of error correction and the additional property of being able to recover one or more erased symbols of the codeword by accessing only a small number of other symbols. Codes of this kind are said to be locally recoverable (LRC), and they have applications in large-scale distributed storage systems. LRC codes were first defined in [1] and were studied in a number of subsequent papers in recent years.

A $q$-ary code ${\mathcal{C}}$ of length $n$, cardinality $M$, and distance $d$ is a set of $M$ vectors over an alphabet $Q$, $|Q|=q$, with minimum pairwise Hamming distance $d$. The quantity $k=\log_{q}M$ is called the dimension of ${\mathcal{C}}$. If $Q$ is a finite field and ${\mathcal{C}}$ is a linear subspace of $Q^{n}$, then $k$ is the dimension of ${\mathcal{C}}$ as a vector space. Below, $[n]\equiv\{1,\dots,n\}$, and for any $x\in Q^{n}$, $x_{i}$ is the projection of $x$ onto the $i$th coordinate. By extension, for any $I\subseteq[n]$, $x_{I}$ is the projection of $x$ onto the coordinates of $I$.

Definition 1

A code ${\mathcal{C}}\subset Q^{n}$ is locally recoverable with locality $r$ if every coordinate $i\in\{1,2,\dots,n\}$ is contained in a subset ${\mathcal{R}}_{i}\subseteq[n]$ of size $r+1$ for which there is a function $\phi_{i}:Q^{r}\to Q$ such that every codeword $c\in{\mathcal{C}}$ satisfies

$$c_{i}=\phi_{i}(c_{j_{1}},\dots,c_{j_{r}}), \tag{1}$$

where $j_{1}<j_{2}<\cdots<j_{r}$ are the elements of ${\mathcal{R}}_{i}\backslash\{i\}.$ We use the notation $(n,k,r)$ to refer to a code of length $n$ , dimension $k$ and locality $r.$
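Definition 1 can be checked by brute force on small codes. The following minimal sketch (the function name `has_locality` and the toy code are illustrative assumptions, not taken from the paper) tests whether, for every coordinate $i$, the symbols on ${\mathcal{R}}_{i}\backslash\{i\}$ determine $c_{i}$ across all codewords, which is exactly the condition that a recovery function $\phi_{i}$ exists.

```python
from itertools import product

def has_locality(code, repair_groups):
    """Return True if, for every coordinate i, the symbols on R_i minus {i}
    determine c_i across all codewords (so a recovery function phi_i exists)."""
    n = len(code[0])
    for i in range(n):
        rest = [j for j in sorted(repair_groups[i]) if j != i]
        seen = {}  # maps the values on the recovery set to the value of c_i
        for c in code:
            key = tuple(c[j] for j in rest)
            if seen.setdefault(key, c[i]) != c[i]:
                return False  # same recovery-set values, different c_i
    return True

# Toy binary code (assumed example): two blocks (a, b, a+b), so n = 6, k = 4, r = 2.
code = [(a, b, a ^ b, c, d, c ^ d) for a, b, c, d in product((0, 1), repeat=4)]
groups = {i: ({0, 1, 2} if i < 3 else {3, 4, 5}) for i in range(6)}
print(has_locality(code, groups))
```

Coordinates are indexed from 0 in the sketch, whereas the text uses $[n]=\{1,\dots,n\}$.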

The definition of LRC codes was extended in several different ways. The following generalization is important for our purposes.

Definition 2

A code ${\mathcal{C}}\subset Q^{n}$ of cardinality $q^{k}$ is said to have the $(\rho,r)$ locality property (to be an $(n,k,r,\rho)$ LRC code) where $\rho\geq 2$ , if each coordinate $i\in[n]$ is contained in a subset ${\mathcal{R}}_{i}\subset[n]$ of size at most $r+\rho-1$ such that the restriction ${\mathcal{C}}_{{\mathcal{R}}_{i}}$ of the code ${\mathcal{C}}$ to the coordinates in ${\mathcal{R}}_{i}$ forms a code of distance at least $\rho$ . Notice that the values of any $\rho-1$ coordinates of ${\mathcal{R}}_{i}$ are determined by the values of the remaining $|{\mathcal{R}}_{i}|-(\rho-1)\leq r$ coordinates, thus enabling local recovery.

This definition was first proposed in [2, 3] with the less demanding restriction of protecting only the information symbols of the codeword (see also [4] for a related but different notion). In the above definition we consider all-symbol locality, without differentiating between the information and parity symbols. The set ${\mathcal{R}}_{i}$ is called the repair group, and the set ${\mathcal{R}}_{i}\backslash\{i\}$ is called the recovery set for the coordinate $i$ .

Other extensions of the concept of LRC codes include codes with multiple disjoint repair groups for every coordinate, also called codes with the availability property [5], codes with sequential repair of several erasures [6], codes with cooperative repair [7], local repair on graphs [8], as well as other variations.

Problems of constructing LRC codes and bounding their parameters have been the subject of a considerable number of publications. Constructions of LRC codes obtained by combining some known code families without the locality property were suggested in [9, 10, 11]. A family of codes extending the construction of Reed-Solomon codes to codes with locality was proposed in [12] and further generalized to codes on algebraic curves in [13]. We refer to [14] for a survey of some aspects of the algebraic theory of LRC codes.

Research on bounds for LRC codes was initiated in [1] which showed that the distance $d({\mathcal{C}})$ of an $(n,k,r)$ LRC code ${\mathcal{C}}$ is bounded as follows:

$$d({\mathcal{C}})\leq n-k-\Big\lceil\frac{k}{r}\Big\rceil+2. \tag{2}$$

In [15] this bound was extended to the case of arbitrary $\rho\geq 2$ . Namely, the distance of an $(n,k,r,\rho)$ LRC code satisfies the inequality

$$d\leq n-k+1-\Big(\Big\lceil\frac{k}{r}\Big\rceil-1\Big)(\rho-1). \tag{3}$$

Bounds for codes with availability were established in [5, 16, 17].

Note that the bounds (2), (3) do not depend on the size of the code alphabet $q$ . A bound that accounts for the value of $q$ was derived in [18]. It has the following form: For any $q$ -ary LRC code with the parameters $(n,k,r)$ and distance $d,$

$$k\leq\min_{1\leq s\leq n/(r+1)}\big\{sr+\log_{q}M_{q}(n-s(r+1),d)\big\}, \tag{4}$$

where $M_{q}(n,d)$ is the maximum cardinality of a $q$ -ary length $n$ code with distance $d$ . This bound can be used to derive asymptotic upper bounds on the rate of LRC codes with a given value of the distance (more on this below). Asymptotic lower bounds (achievability results) on the rate of LRC codes, namely Gilbert-Varshamov (GV) type asymptotic bounds, were also derived independently in [17, 18]; in particular the former work derives a bound for the case of availability $2$ as well.
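To illustrate how the bound of [18], $k\leq\min_{1\leq s\leq n/(r+1)}\{sr+\log_{q}M_{q}(n-s(r+1),d)\}$, is evaluated, here is a small sketch for binary codes with $d=3$ and length $n=2(r+1)$. The table `M2_D3` of optimal code sizes is an assumption taken from standard tables of binary codes, and the function names are hypothetical. After rounding down, the values coincide with the SH row of Table 1, which suggests (the excerpt does not state it) that this row lists the bound (4).

```python
import math

# Assumed values of M_2(m, 3), the largest size of a binary code of length m and distance 3.
M2_D3 = {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 4, 6: 8, 7: 16, 8: 20, 9: 40, 10: 72, 11: 144}

def shortening_bound(n, r, log_M):
    """min over 1 <= s <= n/(r+1) of s*r + log_q M_q(n - s*(r+1), d)."""
    return min(s * r + log_M(n - s * (r + 1)) for s in range(1, n // (r + 1) + 1))

def binary_d3_bound(r, groups=2):
    """Bound on k for q = 2, d = 3 and length n = groups * (r + 1)."""
    n = groups * (r + 1)
    return shortening_bound(n, r, lambda m: math.log2(M2_D3[m]))

print([math.floor(binary_d3_bound(r)) for r in range(2, 11)])
```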

In this paper we focus on combinatorial upper bounds on the parameters of LRC codes, tightening prior results, and emphasizing the dependence between the parameters and the size of the code alphabet $q$ . We explore several general approaches to the derivation of the upper bounds, including recursive bounds, the linear programming approach, and the approach relying on the coset leader graph of the code.

Linear programming (LP) is a powerful technique that accounts for some of the best known upper bounds on the size of codes with a given distance. It was pioneered in [19] and used in [20] to derive the best currently known asymptotic upper bound on error correcting codes. These results rely on the approach to codes via association schemes and their eigenvalues, combined with some analytic techniques. Incorporating the locality constraints into the LP problem in a way that yields closed-form bounds is a nontrivial problem. We suggest a way to address it under the additional assumption that ${\mathcal{R}}_{i}\cap{\mathcal{R}}_{j}=\emptyset,i\neq j,$ i.e., that different repair groups are disjoint, and the set of coordinates $[n]$ is a disjoint union of the repair groups. With this assumption, an association scheme that fits the locality constraints forms a Delsarte extension of the usual Hamming scheme. Relying on this observation, we derive an LP bound on $(n,k,r,\rho)$ LRC codes in a polynomial form and construct a polynomial that gives rise to a Singleton-like bound on such codes. We also compute numerical examples for $\rho=2$, which corresponds to the original definition of LRC codes, and show that the LP bound is sometimes better than the only other known alphabet-dependent bound (4).

We note that LP bounds on linear LRC codes were earlier studied in [21] which considered a standard LP problem [19] with the additional constraint that every coordinate is contained in a codeword of the dual code of weight $\leq r+1.$ At the same time [21] gave no closed-form solutions of the LP problem or any numerical examples. LP bounds for LRC codes with multiple repair groups were considered in [22] and LP bounds for cyclic LRC codes were considered in [23].

Finally, we study asymptotic upper bounds on linear LRC codes that satisfy Definition 1. The starting point of our study is an observation that a linear LRC code necessarily contains several low-weight parity checks. Another class of codes that has the same property is low-density parity check (LDPC) codes. A recent work [24] derived new improved asymptotic bounds on the rate of LDPC codes by analyzing the coset graph of the code. While LDPC codes by definition contain only low-weight parity checks, LRC codes combine such checks with a large number of unrestricted parity check equations. Nevertheless, it is possible to combine the approach of [24] with the recursive bound (4) to obtain an asymptotic bound on linear LRC codes that is better than the asymptotic bound obtained from (4). An even better bound can be obtained for linear LRC codes with disjoint repair groups.

The paper is organized as follows. In Section II we derive a general upper bound on the size of LRC codes that reduces the problem to bounds on codes with a given distance but without locality constraints. This result is conceptually similar to the bound (4) (from [18]) but relies on a different kind of recursion, and this reduction enables us to use known bounds on codes to derive new results for LRC codes. In conjunction with the asymptotic Gilbert-Varshamov (GV) bound on $(n,k,r,\rho)$ LRC codes of [13], this yields an exact value of the asymptotic code rate when $d/n\to 0$. This result, proved earlier for $\rho=2$ in [18, 17], is extended here to any $\rho\geq 2$. In Section III, we derive Delsarte's linear programming (LP) bounds for LRC codes with disjoint repair groups. The results include a Singleton bound for LRC codes. For the special case of usual LRC codes (Definition 1) with disjoint repair groups, our results improve the shortening bound of (4). In Section IV, we consider linear LRC codes and, by using a theory of coset-leader graphs combined with the approach of [18], we are able to provide better asymptotic bounds on the rate-relative distance trade-off of LRC codes. The bound becomes stronger if we consider disjoint repair groups.

This paper is a result of merging and developing the papers by S. Hu, I. Tamo, and A. Barg [25] and by A. Agarwal and A. Mazumdar [26], both devoted to the problem of deriving alphabet-dependent bounds on LRC codes.

II New Bounds on LRC Codes

In the next theorem we introduce a method of using upper bounds on codes with a given distance (without the locality property) to derive upper bounds on LRC codes. Let $B(l,\rho)$ be an upper bound on the cardinality of a code of length $l$ and distance $\rho$, which is a log-convex function of $l$ and such that $B(0,\rho)=1$. (A positive function $f(j)$ of the integer argument is called log-convex if $f(j_{1})f(j_{2})\leq f(j_{1}-1)f(j_{2}+1)$ for any $j_{1}\leq j_{2}$ in the support of $f$.)

Theorem 1

Let ${\mathcal{C}}$ be an $(n,k,r,\rho)$ $q$ -ary LRC code with distance $d$ , and let

$$\mu=\mu(n,d,r,\rho):=\Big\lceil\frac{n-(d-1)}{N}\Big\rceil+1, \tag{5}$$

where $N=r+\rho-1$ . Then, for any $\rho\geq 2$ , we have

$$k\leq\mu\log_{q}B(N,\rho). \tag{6}$$

Proof:

We begin with constructing a sequence of nonempty disjoint subsets $X_{i}\subset[n]$ whose union is of size at least $n-(d-1).$ Starting with $X_{0}=\emptyset,$ assume that the sets $X_{0},...,X_{i},i\geq 0$ are already constructed. If $|\cup_{l=0}^{i}X_{l}|\geq n-(d-1),$ terminate the procedure. Otherwise let $j$ be an arbitrary element in $[n]\backslash\cup_{l=0}^{i}X_{l}$ (w.l.o.g. we can assume that $j=i+1$ ) and define

$$X_{i+1}={\mathcal{R}}_{i+1}\backslash\cup_{l=0}^{i}X_{l}.$$

Suppose that this procedure terminates after $m$ steps, and let $X_{1},...,X_{m}$ be the sequence of subsets constructed above. For $i=1,...,m$ let $X_{[i]}:=\cup_{l=1}^{i}X_{l}$ and denote by ${\mathcal{C}}_{i}$ the restriction of ${\mathcal{C}}$ to the coordinates in $X_{[i]}$ . Note that by the construction, we have

$$|X_{[m-1]}|<n-(d-1),$$

and

$$|X_{[m]}|\leq|X_{[m-1]}|+N\leq\mu N. \tag{7}$$

Let us prove by induction on $i$ that

$$|{\mathcal{C}}_{i}|\leq\prod_{j=1}^{i}B(|X_{j}|,\rho)\quad\text{for all }i=1,\dots,m. \tag{8}$$

For $i=1$ , $X_{1}={\mathcal{R}}_{1}$ and by definition, the code ${\mathcal{C}}_{1}={\mathcal{C}}_{{\mathcal{R}}_{1}}$ has distance at least $\rho.$ Therefore, $|{\mathcal{C}}_{1}|\leq B(|X_{1}|,\rho)$ . Now assume that (8) holds for ${\mathcal{C}}_{i-1}.$ Let $c$ be an arbitrary codeword of ${\mathcal{C}}_{i-1}$ , and let $S(c)$ be the set of codewords in ${\mathcal{C}}_{i}$ whose restriction to $X_{[i-1]}$ equals $c$ . These codewords can be different only in the coordinates in $X_{i}\ (\subseteq{\mathcal{R}}_{i})$ , and therefore the restriction of $S(c)$ to the coordinates in $X_{i}$ forms a code of distance at least $\rho.$ This implies that $|S(c)|\leq B(|X_{i}|,\rho)$ , and so $|{\mathcal{C}}_{i}|\leq|{\mathcal{C}}_{i-1}|B(|X_{i}|,\rho).$ This completes the induction step.

Since the code ${\mathcal{C}}$ has distance $d$ and $|X_{[m]}|\geq n-(d-1)$ , it follows that

$$|{\mathcal{C}}|=|{\mathcal{C}}_{m}|\leq\prod_{j=1}^{m}B(|X_{j}|,\rho). \tag{9}$$

Suppose that $i,j$ are such that $1\leq|X_{i}|\leq|X_{j}|\leq{N},$ then using log-convexity, we obtain

$$B(|X_{i}|,\rho)B(|X_{j}|,\rho)\leq B(|X_{i}|-1,\rho)B(|X_{j}|+1,\rho).$$

This step can be repeated $\min(|X_{i}|,{N}-|X_{j}|)$ times till either the larger subset is of the maximum possible size ${N}$ or the smaller one becomes empty (in which case we put $B(0,\rho)=1$ ). Use this argument in (9) and successively reduce the number of factors on the right-hand side as many times as possible. On account of (7) we will obtain at most $\mu$ factors, and in each of them the size of the coordinate subset $X_{\{\cdot\}}$ will be ${N}$ or less. We conclude that

$$\prod_{j=1}^{m}B(|X_{j}|,\rho)\leq B(N,\rho)^{\mu}. \tag{10}$$

Now (6) follows by combining (9) and (10) and taking logarithms on both sides of the resulting inequality. ∎
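The set construction at the start of the proof can be sketched as follows. This is an illustration under stated assumptions: the function name `build_subsets`, the choice of the smallest uncovered coordinate, and the example with overlapping groups ${\mathcal{R}}_{j}=\{j-1,j,j+1\}$ (indices modulo $n$, counted from 0) are all hypothetical. The proof allows any uncovered coordinate at each step.

```python
import math

def build_subsets(n, d, repair_group):
    """Greedy construction from the proof: returns disjoint nonempty sets
    X_1, ..., X_m whose union has size at least n - (d - 1)."""
    covered, subsets = set(), []
    while len(covered) < n - (d - 1):
        j = min(set(range(n)) - covered)      # any uncovered coordinate would do
        new = set(repair_group(j)) - covered  # X_{i+1}: repair group minus earlier sets
        subsets.append(new)                   # nonempty, since j is in its own group
        covered |= new
    return subsets

# Assumed example: n = 10, d = 2, groups of size N = 3.
n, d, N = 10, 2, 3
subsets = build_subsets(n, d, lambda j: {(j - 1) % n, j, (j + 1) % n})
mu = math.ceil((n - (d - 1)) / N) + 1
print(subsets, mu)
```

In this example the union has 9 coordinates, which is at most $\mu N=12$, in agreement with the size estimate used in the proof.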

Theorem 1 provides a general upper bound on the size of LRC codes. Explicit results are obtained once we substitute a log-convex upper bound $B(\cdot)$ . Fortunately, many known bounds on codes are in fact log-convex. For instance, let us prove that this is the case for the Hamming (sphere-packing) bound.

Lemma 2

The function $B_{H}(n,e)={q^{n}}/{(\sum_{i=0}^{e}\binom{n}{i}(q-1)^{i})}$ is log-convex in $n$ .

Proof:

Let ${Q}=q-1$ and let $f(n,e)=\sum_{i=0}^{e}{n\choose i}{Q}^{i}$ . In all the expressions below $e$ does not change, so to simplify the notation we write $f(n)$ instead of $f(n,e)$ . For any $n\geq 1$ we have

$$f(n)=(1+Q)f(n-1)-\binom{n-1}{e}Q^{e+1}.$$

We find

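$$\begin{aligned}f(n_{1})f(n_{2})-f(n_{1}-1)f(n_{2}+1)&=f(n_{2})\Big[(1+Q)f(n_{1}-1)-\binom{n_{1}-1}{e}Q^{e+1}\Big]-f(n_{1}-1)\Big[(1+Q)f(n_{2})-\binom{n_{2}}{e}Q^{e+1}\Big]\\&=Q^{e+1}\Big[\binom{n_{2}}{e}f(n_{1}-1)-\binom{n_{1}-1}{e}f(n_{2})\Big]\\&=Q^{e+1}\sum_{i=0}^{e}Q^{i}\Big[\binom{n_{2}}{e}\binom{n_{1}-1}{i}-\binom{n_{1}-1}{e}\binom{n_{2}}{i}\Big].\end{aligned}$$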

It is straightforward to check that with $n_{1}\leq n_{2}$ each term inside the brackets on the last line is nonnegative, and so

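$$f(n_{1}-1)f(n_{2}+1)\leq f(n_{1})f(n_{2}),$$

or, equivalently,

$$\frac{q^{n_{1}}}{f(n_{1})}\cdot\frac{q^{n_{2}}}{f(n_{2})}\leq\frac{q^{n_{1}-1}}{f(n_{1}-1)}\cdot\frac{q^{n_{2}+1}}{f(n_{2}+1)}.$$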

∎
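Lemma 2 can be sanity-checked numerically. The sketch below (function names are illustrative assumptions) evaluates $B_{H}(n,e)$ in exact rational arithmetic and tests the log-convexity condition $B(n_{1})B(n_{2})\leq B(n_{1}-1)B(n_{2}+1)$ for all $1\leq n_{1}\leq n_{2}<n_{\max}$. It is a finite check, not a substitute for the proof.

```python
from fractions import Fraction
from math import comb

def hamming_bound(n, e, q):
    """B_H(n, e) = q^n / sum_{i=0}^{e} C(n, i) (q-1)^i, as an exact fraction."""
    return Fraction(q**n, sum(comb(n, i) * (q - 1)**i for i in range(e + 1)))

def is_log_convex(B, n_max):
    """Check B(n1) B(n2) <= B(n1 - 1) B(n2 + 1) for all 1 <= n1 <= n2 < n_max."""
    return all(B(n1) * B(n2) <= B(n1 - 1) * B(n2 + 1)
               for n2 in range(1, n_max) for n1 in range(1, n2 + 1))

print(all(is_log_convex(lambda n, e=e, q=q: hamming_bound(n, e, q), 20)
          for q in (2, 3, 4) for e in range(4)))
```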

Similar (but simpler) checks can be performed to verify the log-convexity of the Plotkin and Singleton bounds, and we obtain the following corollary.

Corollary 3

Let ${\mathcal{C}}$ be an $(n,k,r,\rho)$ $q$ -ary LRC code with distance $d$ , and let $\mu$ be defined in (5). The following bounds hold true:

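1. Locality-dependent Hamming bound:

$$k\leq\mu\Big(N-\log_{q}\Big(\sum_{e=0}^{\lfloor\frac{\rho-1}{2}\rfloor}\binom{N}{e}(q-1)^{e}\Big)\Big). \tag{11}$$

2. Locality-dependent Plotkin bound: Let $\rho>\frac{q-1}{q}N$; then

$$k\leq\mu\log_{q}\frac{\rho}{\rho-\frac{q-1}{q}N}. \tag{12}$$

3. Locality-dependent Singleton bound:

$$k\leq\mu r. \tag{13}$$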

The bound (13) is slightly weaker than the Singleton-type bounds in (2) and (3).
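The three bounds of Corollary 3 are straightforward to evaluate. The sketch below uses $N=r+\rho-1$, $\mu=\lceil(n-(d-1))/N\rceil+1$, the Hamming form $k\leq\mu(N-\log_{q}\sum_{e=0}^{\lfloor(\rho-1)/2\rfloor}\binom{N}{e}(q-1)^{e})$, the Plotkin form $k\leq\mu\log_{q}\frac{\rho}{\rho-\frac{q-1}{q}N}$ (valid for $\rho>\frac{q-1}{q}N$), and the Singleton form $k\leq\mu r$. Function names and the numerical example are assumptions made for illustration.

```python
import math

def mu(n, d, r, rho):
    """mu = ceil((n - (d - 1)) / N) + 1 with N = r + rho - 1."""
    return math.ceil((n - (d - 1)) / (r + rho - 1)) + 1

def hamming_bound_k(n, d, r, rho, q):
    N = r + rho - 1
    vol = sum(math.comb(N, e) * (q - 1)**e for e in range((rho - 1) // 2 + 1))
    return mu(n, d, r, rho) * (N - math.log(vol, q))

def plotkin_bound_k(n, d, r, rho, q):
    N = r + rho - 1
    gap = rho - (q - 1) * N / q
    if gap <= 0:
        return None  # the bound applies only when rho > (q-1)N/q
    return mu(n, d, r, rho) * math.log(rho / gap, q)

def singleton_bound_k(n, d, r, rho):
    return mu(n, d, r, rho) * r

# Assumed example: binary code with n = 16, d = 5, r = 2, rho = 3 (so N = 4, mu = 4).
params = (16, 5, 2, 3)
print(hamming_bound_k(*params, 2), plotkin_bound_k(*params, 2), singleton_bound_k(*params))
```

For these parameters the Plotkin form is the smallest of the three and the Singleton form the largest.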

Remark 1

Not all bounds on codes are log-convex in the code length. For example, let $M_{2}(l,\rho)$ be the maximum size of a binary code of length $l$ and distance $\rho.$ We have $M_{2}(7,4)=8,M_{2}(8,4)=16,M_{2}(9,4)=20,$ and so $M_{2}(8,4)^{2}>M_{2}(7,4)M_{2}(9,4),$ violating the log-convexity condition (which stipulates that the geometric average be greater than the “middle value”).
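As a quick arithmetic check of the values quoted in the remark, log-convexity with $j_{1}=j_{2}=8$ would require $M_{2}(8,4)^{2}\leq M_{2}(7,4)M_{2}(9,4)$, whereas

$$M_{2}(8,4)^{2}=16^{2}=256>160=8\cdot 20=M_{2}(7,4)\,M_{2}(9,4).$$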

Let

$$R_{q}(r,\rho,\delta)=\limsup_{n\to\infty}\frac{1}{n}\log_{q}M_{q}(n,r,\rho,\delta n),$$

where ${M_{q}(n,r,\rho,\delta n)}$ is the maximum cardinality of an $(n,k,r,\rho)$ LRC code with minimum distance $\delta n$. We finish this section by showing that the bound (13) can be combined with a known result to derive an exact value of $R_{q}$ for $\delta=0$. The following lower asymptotic bound for LRC codes was obtained in [13].

Theorem 4 ([13])

Assume that there exists a $q$ -ary MDS code of length ${N}=\rho+r-1$ and distance $\rho.$ Then the following Gilbert-Varshamov type bound holds true:

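$$R_{q}(r,\rho,\delta)\geq\frac{r}{N}-\min_{0<s\leq 1}\Big\{\frac{\log_{q}b_{\rho}(s)}{N}-\delta\log_{q}s\Big\} \tag{14}$$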

where

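$$b_{\rho}(s)=1+(q-1)\sum_{w=\rho}^{N}\binom{N}{w}s^{w}q^{w-\rho}\times\sum_{j=0}^{w-\rho}\binom{w-1}{j}(-q)^{-j}.$$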

This result implies the following corollary, which for $q=2$ was already established in [17].

Corollary 5

Assume that there exists a $q$ -ary MDS code of length ${N}$ and distance $\rho,$ then $R_{q}(r,\rho,0)=\frac{r}{{N}}.$

Proof:

The bound (13) implies the estimate $R_{q}(r,\rho,0)\leq\frac{r}{{N}}$ while (14) gives the opposite inequality. ∎
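The upper estimate used in this proof can be spelled out as follows; this is a short worked step under the definitions $\mu=\lceil(n-(d-1))/N\rceil+1$ and $d=\delta n$, added for illustration. Since $\mu\leq\frac{n-d+1}{N}+2$, the Singleton-type bound $k\leq\mu r$ gives

$$\frac{k}{n}\leq\frac{r}{N}\Big(1-\frac{d-1}{n}\Big)+\frac{2r}{n}\;\xrightarrow[n\to\infty]{}\;\frac{r}{N}(1-\delta),$$

which equals $\frac{r}{N}$ at $\delta=0$.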

III Algebraic Combinatorics of LRC Codes and LP Bounds

Delsarte’s linear programming bound is a powerful method of estimating the size of optimal codes in various metric spaces that satisfy a set of general assumptions [19]. In this section we develop an adaptation of the approach in [19] to $(n,k,r,\rho)$ LRC codes.

III-A Association Schemes and their Powers

III-A1 Metric Association Schemes

We begin with a brief reminder about metric association schemes [19, 27]. Let $X$ be a finite metric space with distance function $d$, and let $\mathbf{R}=\{R_{0},R_{1},\ldots,R_{n}\}$ be a partition of $X\times X$ such that $R_{i}:=\{(x,y)\in X^{2}\mid d(x,y)=i\}$ for all $i$. The pair ${\mathcal{A}}=(X,\mathbf{R})$ is called an association scheme if the intersection volume of two balls in $X$ depends only on the distance between their centers and the radii of the balls. For each $i$ denote by $A_{i}$ the $|X|\times|X|$ adjacency matrix of $R_{i}$, where $(A_{i})_{x,y}=1$ if $(x,y)\in R_{i}$ and $0$ otherwise. The matrices $A_{0},A_{1},\ldots,A_{n}$ span a complex semisimple algebra of dimension $n+1$, called the Bose-Mesner algebra of the scheme. Since each $A_{i}$ is symmetric, this algebra is commutative. It affords a dual basis of minimal idempotents $E_{0},E_{1},\ldots,E_{n}$. We can represent the matrix $A_{i}$ as a linear combination of the idempotents. The coefficients of this expansion form the first eigenmatrix of the scheme ${\mathcal{A}}$, denoted by $P$. A similar transition can be performed in the other direction, and the corresponding coefficients form the second eigenmatrix of ${\mathcal{A}}$, denoted by $Q$. Namely, we have

$$A_{i}=\sum_{j=0}^{n}P_{ji}E_{j},\qquad E_{j}=\frac{1}{|X|}\sum_{i=0}^{n}Q_{ij}A_{i}.$$

III-A2 Products of Association Schemes

Let ${\mathcal{A}}=(X,\mathbf{R})$ be a metric association scheme with eigenmatrices $P$ and $Q$ , and let $Y:=X^{s}$ be a Cartesian power of $X$ . We can define a product association scheme ${\mathcal{A}}^{\otimes s}=(X,\mathbf{R})^{\otimes s}$ by introducing the relations $R_{{{\underline{i}}}},{{\underline{i}}}=(i_{1},\dots,i_{s})$ on $Y\times Y$ in the following obvious way [19, p.17]:

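$$R_{\underline{i}}=\{((y_{1},\dots,y_{s}),(y_{1}^{\prime},\dots,y_{s}^{\prime})):(y_{j},y_{j}^{\prime})\in R_{i_{j}},\ j=1,\dots,s\}.$$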

The adjacency matrices of ${\mathcal{A}}^{\otimes s}$ are formed of the Kronecker products $\otimes_{i=1}^{s}A_{i}(=A^{\otimes s}),$ where $A_{i}(=A)$ is an adjacency matrix of the $i$ th copy of $X$ in the product. It is not hard to check that the first (second) eigenmatrix of the scheme ${\mathcal{A}}^{\otimes s}$ equals the $s$ th Kronecker power of $P$ (resp., of $Q$ ).

III-A3 The Linear Programming Bound

Let $(X,{\mathcal{A}})$ be an association scheme with $n$ classes and let ${\mathcal{C}}$ be a code (any subset ${\mathcal{C}}\subset X$ ). The distance distribution of ${\mathcal{C}}$ is given by ${\mathbf{a}}=(a_{0},a_{1},\dots,a_{n}),$ where $a_{i}=|({\mathcal{C}}\times{\mathcal{C}})\cap R_{i}|/|{\mathcal{C}}|$ is the average number of codewords at distance $i$ from a given codeword of ${\mathcal{C}}$ . Clearly, $a_{0}=1$ and $\sum a_{i}=|{\mathcal{C}}|$ . The vector ${\mathbf{a}}Q,$ called the MacWilliams transform of the distance distribution of ${\mathcal{C}}$ , satisfies the Delsarte inequalities $({\mathbf{a}}Q)_{i}\geq 0,i=1,\dots,n$ . This gives rise to Delsarte’s linear programming bound on codes: let ${\mathcal{C}}\subset X$ be a code with distance $d$ , then

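$$|{\mathcal{C}}|\leq\max\Big\{\sum_{i=0}^{n}a_{i}\;:\;a_{0}=1,\ ({\mathbf{a}}Q)_{j}\geq 0\text{ for }1\leq j\leq n,\ a_{i}=0\text{ for }1\leq i\leq d-1,\ a_{i}\geq 0\text{ for }d\leq i\leq n\Big\} \tag{15}$$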

(e.g., [19, Ch.2,3], [28, Ch.17]). This bound also applies to product schemes. Indeed, let ${\mathcal{A}}^{\otimes s}=(X,{\mathcal{A}})^{\otimes s}$ and let ${\mathcal{C}}\subset X^{s}$ be a code. Let ${\mathbf{a}}=(a_{\underline{i}}),$ where ${{\underline{i}}}=(i_{1},\dots,i_{s})$ and $i_{j}=0,\dots,n$ for all $j,$ be the distance distribution of ${\mathcal{C}}$ . Similarly, $a_{\underline{0}}=1$ and $\sum a_{{\underline{i}}}=|{\mathcal{C}}|$ . Suppose that $a_{{\underline{i}}}=0$ if ${\underline{i}}\not\in\{\underline{0}\}\cup T$ where $T$ is some subset of $\{0,\dots,n\}^{s}\backslash\{\underline{0}\}$ . Then we have

[TABLE]

where $Q$ is the second eigenmatrix of ${\mathcal{A}}^{\otimes s}.$ More details about product schemes are given in [19, Sec. 2.5] as well as in more recent works [29, 30].
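For a concrete feel of the distance distribution and the Delsarte inequalities, the sketch below computes ${\mathbf{a}}$ and ${\mathbf{a}}Q$ for a small binary code in the Hamming scheme, where $Q_{ij}=K^{(n)}_{j}(i)$. The explicit Krawtchouk formula is the standard one and is assumed to match the paper's normalization. The $[7,4]$ Hamming code and all function names are illustrative choices, not taken from the paper.

```python
from fractions import Fraction
from itertools import product
from math import comb

def krawtchouk(n, q, j, x):
    """Standard Krawtchouk polynomial K_j^{(n)}(x) (assumed normalization)."""
    return sum((-1)**l * (q - 1)**(j - l) * comb(x, l) * comb(n - x, j - l)
               for l in range(j + 1))

def distance_distribution(code):
    """a_i = number of ordered codeword pairs at distance i, divided by |C|."""
    n = len(code[0])
    counts = [0] * (n + 1)
    for c in code:
        for c2 in code:
            counts[sum(u != v for u, v in zip(c, c2))] += 1
    return [Fraction(x, len(code)) for x in counts]

def macwilliams_transform(a, q):
    """(aQ)_j = sum_i a_i K_j(i) for j = 0, ..., n."""
    n = len(a) - 1
    return [sum(a[i] * krawtchouk(n, q, j, i) for i in range(n + 1)) for j in range(n + 1)]

# Binary [7,4] Hamming code in systematic form (assumed example).
G = [(1, 0, 0, 0, 1, 1, 0), (0, 1, 0, 0, 1, 0, 1), (0, 0, 1, 0, 0, 1, 1), (0, 0, 0, 1, 1, 1, 1)]
code = [tuple(sum(m[i] * G[i][t] for i in range(4)) % 2 for t in range(7))
        for m in product((0, 1), repeat=4)]
a = distance_distribution(code)
b = macwilliams_transform(a, 2)
print(a, b)
```

All entries of `b` are nonnegative, as the Delsarte inequalities require. For a linear code they equal $|{\mathcal{C}}|$ times the weight distribution of the dual code.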

III-A4 The Hamming Scheme

The following classic example will be useful below in the context of LRC codes. Let $F$ be a set of cardinality $q\ (q\geq 2)$ and let $X=F^{n}$ be a Cartesian power of $F$ . We specialize the definition of the metric scheme by assuming that $d$ is a Hamming metric on $X$ . Namely, let $R_{i}:=\{(x,y)\in X^{2}\mid d(x,y)=i\}$ where $d$ is the Hamming distance. We obtain a symmetric association scheme with $n$ classes, denoted by $H(n,q)$ . The eigenvalues of $H(n,q)$ are given by $Q_{ij}=K^{(n)}_{j}(i)$ [19], where

$$K^{(n)}_{k}(x)=\sum_{j=0}^{k}(-1)^{j}\binom{x}{j}\binom{n-x}{k-j}(q-1)^{k-j}$$

is the Krawtchouk polynomial. Also we have $P=Q$ .
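The self-duality $P=Q$ can be checked numerically on small cases. The sketch below assumes the standard explicit expression $K^{(n)}_{j}(x)=\sum_{l=0}^{j}(-1)^{l}(q-1)^{j-l}\binom{x}{l}\binom{n-x}{j-l}$ and the general identity $PQ=|X|\,I$ for association schemes, so that $P=Q$ amounts to $Q^{2}=q^{n}I$. Helper names are hypothetical.

```python
from math import comb

def krawtchouk(n, q, j, x):
    """Standard Krawtchouk polynomial K_j^{(n)}(x) (assumed normalization)."""
    return sum((-1)**l * (q - 1)**(j - l) * comb(x, l) * comb(n - x, j - l)
               for l in range(j + 1))

def second_eigenmatrix(n, q):
    """Matrix Q of H(n, q) with entries Q[i][j] = K_j^{(n)}(i)."""
    return [[krawtchouk(n, q, j, i) for j in range(n + 1)] for i in range(n + 1)]

def matmul(A, B):
    """Plain integer matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

Q = second_eigenmatrix(2, 2)
print(Q, matmul(Q, Q))
```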

The Hamming scheme $H(n,q)$ also carries the structure of a product scheme for $n\geq 2$ . Consider the Hamming scheme $H(m+n,q)$ as being obtained from the product of $H(m,q)$ and $H(n,q)$ by merging all relations $R_{i^{\prime},i^{\prime\prime}}$ with $i^{\prime}+i^{\prime\prime}=i$ into one relation $R_{i}$ . We have $A_{i}=\sum_{i^{\prime}+i^{\prime\prime}=i}A_{i^{\prime}i^{\prime\prime}},E_{i}=\sum_{i^{\prime}+i^{\prime\prime}=i}E_{i^{\prime}i^{\prime\prime}}$ and $Q_{ij}=\sum_{j^{\prime}+j^{\prime\prime}=j}Q_{i^{\prime}j^{\prime}}Q_{i^{\prime\prime}j^{\prime\prime}}$ for any pair $(i^{\prime},i^{\prime\prime})$ with $i^{\prime}+i^{\prime\prime}=i.$ Also we can view all three association schemes involved as merged versions of powers of $H(1,q)$ .
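The merging relation $Q_{ij}=\sum_{j^{\prime}+j^{\prime\prime}=j}Q_{i^{\prime}j^{\prime}}Q_{i^{\prime\prime}j^{\prime\prime}}$ is a convolution identity for Krawtchouk polynomials, and it can be verified directly on small parameters. The sketch below assumes the standard explicit Krawtchouk formula; the function names are hypothetical.

```python
from math import comb

def krawtchouk(n, q, j, x):
    """Standard Krawtchouk polynomial K_j^{(n)}(x) (assumed normalization)."""
    return sum((-1)**l * (q - 1)**(j - l) * comb(x, l) * comb(n - x, j - l)
               for l in range(j + 1))

def merged_eigenvalue(m, n, q, i1, i2, j):
    """Sum of K_{j'}^{(m)}(i') K_{j''}^{(n)}(i'') over all j' + j'' = j."""
    return sum(krawtchouk(m, q, j1, i1) * krawtchouk(n, q, j - j1, i2)
               for j1 in range(max(0, j - n), min(m, j) + 1))

def merging_holds(m, n, q):
    """Check that the merged value equals K_j^{(m+n)}(i' + i'') for all indices."""
    return all(merged_eigenvalue(m, n, q, i1, i2, j) == krawtchouk(m + n, q, j, i1 + i2)
               for i1 in range(m + 1) for i2 in range(n + 1) for j in range(m + n + 1))

print(merging_holds(2, 3, 3))
```

The right-hand side depends on the pair $(i^{\prime},i^{\prime\prime})$ only through the sum $i^{\prime}+i^{\prime\prime}$, which is what makes the merged relations well defined.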

Clearly, $H(n,q)=(H(1,q))^{\otimes n},$ and similarly $H(st,q)=(H(t,q))^{\otimes s}$ for any $s\geq 2.$ We conclude that the eigenvalues of the scheme $H(st,q)$ have the form

$$Q_{\underline{i},\underline{j}}=\prod_{l=1}^{s}K^{(t)}_{j_{l}}(i_{l}), \tag{16}$$

where the multi-indices ${\underline{i}},{\underline{j}}$ are the indices of the relations of the scheme. As is the case with the original Hamming scheme, the obtained scheme $H(st,q)$ is also self-dual. It is this setting that we apply to the analysis of LRC codes in the next section.

III-B The Linear Programming Bound for LRC Codes with Disjoint Repair Groups

III-B1 General Bound

We begin with stating a general LP bound for codes with locality. Let ${\mathcal{C}}$ be an $(n,k,r,\rho)$ -LRC code with minimum distance $d$ . Suppose that $n=s{N}$ where $N=r+\rho-1$ . For $0\leq t\leq s-1$ , define the interval

[TABLE]

so the coordinate set is a disjoint union of these intervals:

[TABLE]

For $I\subset[n]$ denote by ${\mathcal{C}}|_{I}$ the projection of ${\mathcal{C}}$ on the coordinates in $I$ . Throughout this section we will assume that the code ${\mathcal{C}}$ has the property that

[TABLE]

In accordance with (16), define the following polynomials of $s$ discrete variables ${\mathbf{x}}=(x_{1},\ldots,x_{s})$

$$K^{(N)}_{\underline{j}}({\mathbf{x}})=\prod_{l=1}^{s}K^{(N)}_{j_{l}}(x_{l}),$$

where ${\underline{j}}=(j_{1},\ldots,j_{s}).$ The polynomials $K^{({N})}_{{\underline{j}}}$ are orthogonal on the set $\{0,\dots,N\}^{s}$ :

[TABLE]

where

[TABLE]

and $\delta_{{\underline{j}},{\underline{j}}^{\prime}}$ is the Kronecker delta function.

Let ${\mathbf{a}}=(a_{{\underline{i}}},{\underline{i}}\in\{0,\dots,N\}^{s})$ be the distance distribution of ${\mathcal{C}}$ , where each ${{\underline{i}}}=({i_{1}},{i_{2}},\dots,{i_{s}})$ is an $s$ -tuple. Here $a_{\underline{i}}$ is the number of pairs of codewords $c=(c_{1},c_{2},\ldots,c_{s}),c^{\prime}=(c^{\prime}_{1},c^{\prime}_{2},\ldots,c^{\prime}_{s})\in{\mathcal{C}}$ such that the Hamming distance $d(c_{j},c^{\prime}_{j})={i_{j}},j=1,\dots,s,$ normalized by the cardinality of the code $q^{k}.$ Note that the codewords $c_{j},c^{\prime}_{j}$ are contained in the code ${\mathcal{C}}|_{{\mathcal{R}}_{j}}.$ By definition, we have $a_{{\underline{0}}}=1$ , and $a_{{\underline{i}}}=0$ if ${\underline{i}}\not\in\{\underline{0}\}\cup T$ where $T=\{{\underline{i}}=(i_{1},\ldots,i_{s})\mid i_{1}+\cdots+i_{s}\geq d,i_{j}\in\{0,\rho,\rho+1,\dots,{N}\}\text{ for all }j=1,\dots,s\}.$
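The index set $T$ is easy to enumerate for small parameters, which helps when setting up the LP numerically. A minimal sketch (the function name `admissible_set` and the example parameters are assumptions):

```python
from itertools import product

def admissible_set(s, r, rho, d):
    """All s-tuples with entries in {0, rho, rho+1, ..., N}, N = r + rho - 1,
    whose entries sum to at least d (this excludes the all-zero tuple when d >= 1)."""
    N = r + rho - 1
    local = [0] + list(range(rho, N + 1))
    return [i for i in product(local, repeat=s) if sum(i) >= d]

# Assumed example: s = 2 repair groups, r = 2, rho = 2 (so N = 3), distance d = 3.
T = admissible_set(2, 2, 2, 3)
print(T)
```

Each entry is either 0 or at least $\rho$ because the restriction of the code to a repair group has distance at least $\rho$.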

It is now straightforward to check that the general bound of (15) in our case takes the following form.

Theorem 6 (Primal LP bound)

Let ${\mathcal{C}}$ be a $q$ -ary $(n,k,r,\rho)$ LRC code with distance $d$ . Define

[TABLE]

Then the cardinality of ${\mathcal{C}}$ satisfies $|{\mathcal{C}}|\leq 1+\sum_{{\underline{i}}\in T}a_{{\underline{i}}},$ where the vector $(a_{{\underline{i}}},{\underline{i}}\in T)$ is a solution of the following LP problem

[TABLE]

The dual problem of the LP problem in Theorem 6 has the following form.

Theorem 7 (Dual LP bound)

Let ${\mathcal{C}}$ and $T$ be as defined in Theorem 6. The cardinality of ${\mathcal{C}}$ satisfies

[TABLE]

As in the classical case (cf. [19, p.53],[28]), instead of solving this LP problem, we construct feasible solutions which provide upper bounds for the minimum. We state the result in polynomial form, which is obvious from (18)-(20).

Corollary 8

Let ${\mathcal{C}}$ be a $q$ -ary $(n,k,r,\rho)$ LRC code with distance $d$ . Let $f({\mathbf{x}})=f(x_{1},\dots,x_{s})$ be a polynomial whose Krawtchouk expansion has the form

[TABLE]

where the coefficients $f_{\underline{j}}$ satisfy (i) $f_{\underline{j}}\geq 0$ for ${\underline{j}}\in\{0,\dots,N\}^{s}\backslash\{{\underline{0}}\}$ , and (ii) $f({\underline{i}})\leq 0$ for ${\underline{i}}\in T.$ Then $|{\mathcal{C}}|\leq f({{\underline{0}}}).$
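Conditions (i) and (ii) can be checked mechanically for a candidate polynomial. The sketch below assumes that the expansion is taken in products of standard Krawtchouk polynomials of the Hamming scheme, $f({\mathbf{x}})=\sum_{\underline{j}}f_{\underline{j}}\prod_{l}K_{j_{l}}(x_{l})$, with the normalization $f_{\underline{0}}=1$ as in the classical Delsarte bound; the displayed expansion is not reproduced here, so this form, as well as the helper names, are assumptions.

```python
from fractions import Fraction
from math import comb, prod

def krawtchouk(k, x, n, q=2):
    """Standard Krawtchouk polynomial K_k(x) for the Hamming scheme H(n, q)."""
    return sum((-1) ** j * (q - 1) ** (k - j) * comb(x, j) * comb(n - x, k - j)
               for j in range(k + 1))

def evaluate(coeffs, point, N, q=2):
    """Evaluate f(point) = sum_j f_j * prod_l K_{j_l}(point_l) (assumed form)."""
    return sum(fj * prod(krawtchouk(jl, xl, N, q) for jl, xl in zip(j, point))
               for j, fj in coeffs.items())

def is_feasible(coeffs, T, N, q=2):
    """Check (i) f_j >= 0 for j != 0 and (ii) f(i) <= 0 for all i in T."""
    zero = (0,) * len(next(iter(coeffs)))
    nonneg = all(fj >= 0 for j, fj in coeffs.items() if j != zero)
    nonpos = all(evaluate(coeffs, i, N, q) <= 0 for i in T)
    return nonneg and nonpos

# Toy classical case (s = 1): binary codes of length 3 and distance 3.
coeffs = {(0,): Fraction(1), (1,): Fraction(1, 3)}   # f = 1 + K_1 / 3
T = [(3,)]
print(is_feasible(coeffs, T, N=3), evaluate(coeffs, (0,), N=3))
```

In this toy case the polynomial is feasible and $f(0)=2$, which agrees with the size of the binary repetition code of length 3.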

III-B2 The Singleton Bound

The bounds in Corollary 3 can be proved using the polynomial approach of Corollary 8. To exemplify this claim, we give another proof of the Singleton bound. The original form of this bound in [2] is as follows:

[TABLE]

Recall that $n=sN=s(r+\rho-1)$ . Assume that $d=t{N}+\partial$ for some $\partial$ , $1\leq\partial\leq{N}$ . Relaxing (21) by omitting the ceiling function, we obtain

[TABLE]
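The decomposition $d=t{N}+\partial$ with $1\leq\partial\leq{N}$ used above is unique, and it differs from ordinary division with remainder because $\partial$ is never zero. A small sketch (the function name is hypothetical):

```python
def split_distance(d, N):
    """Return (t, partial) with d = t * N + partial and 1 <= partial <= N."""
    t = (d - 1) // N          # equals ceil(d / N) - 1
    return t, d - t * N

print(split_distance(7, 3))   # d = 2 * 3 + 1
print(split_distance(6, 3))   # d = 1 * 3 + 3, not 2 * 3 + 0
```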

Recall that in the classical case the Singleton bound is proved using the polynomial [19, p. 54], [28, p. 544]

[TABLE]

(the “annihilator” of the weight distribution). Following this approach, define the polynomial $f({\mathbf{x}})$ in the form

[TABLE]

We will prove that the polynomial $f$ is a feasible solution of the dual LP problem, i.e., that it satisfies the conditions in (19), (20). Consider the expansion

[TABLE]

On account of (23) we conclude that $f_{{\underline{j}}}\geq 0$ for all ${\underline{j}}$ , so (19) is indeed true.

To prove (20), we will show that $f({\underline{i}})\leq 0$ for ${\underline{i}}\in T$ . First suppose that $\rho\leq\partial$ . Choose any ${\underline{i}}=(i_{1},\dots,i_{s})\in T$ , then $i_{1}+\cdots+i_{s}\geq d$ . If $i_{s-t}+\cdots+i_{s}\geq d$ , then we must have $i_{s-t}\geq\partial$ , implying that $f({\underline{i}})=0$ . If $i_{s-t}+\cdots+i_{s}<d$ , then there must exist some nonzero $i_{l},1\leq l\leq s-t-1$ , which implies that $i_{l}\geq\rho$ and again $f({\underline{i}})=0.$ The case $\rho>\partial$ can be analyzed using similar arguments.

Therefore, by Theorem 7 we obtain the bound

[TABLE]

In other words,

[TABLE]

This estimate is an LP version of the Singleton bound, and it is slightly better than (22).

III-B3 Bounds for $(n,k,r,2)$ LRC Codes

It is interesting to apply the LP approach to bounds on LRC codes for $\rho=2$ , i.e., the case of single-symbol locality (Definition 1). The known bounds that apply in this case include the Singleton bound (13), which does not depend on $q$ , and a shortening bound of [18]. For ease of reading we reproduce the bound (4): For any $(n,k,r,2)$ LRC code with distance $d$ , we have

[TABLE]

where $M_{q}(n,d)$ is the maximum cardinality of a $q$ -ary code of length $n$ and distance $d.$

We computed this bound and the bound of Theorem 7, using $\rho=2$ in the definition of the index set $T$ . The results are summarized in Tables I–IV. Note that the corresponding length of the code is $n=s(r+\rho-1)=s(r+1)$ , and the entry of the table is the upper bound on the dimension $k$ . We perform the computations using the GAP package GUAVA and the package GLPK in the symbolic computations system SageMath [31]. Each result was verified using the package COIN-OR, also available in SageMath.

In all the above examples, the LP bound either matches the shortening bound or is tighter than it.

IV Asymptotic Bounds for Binary Linear LRC Codes

In this section we study asymptotic bounds for binary linear LRC codes that can locally correct one erasure. Throughout the section we assume that the code satisfies Definition 1. For $\delta\in[0,1/2],$ define the functions

[TABLE]

where $M(n,r,d)$ (respectively, $M^{(\text{\rm lin})}(n,r,d)$ ) is the maximum cardinality of a code (respectively, of a linear code) of length $n$ , distance $d$ and locality $r$ . Clearly, $R(r,\delta)\geq R^{(\text{\rm lin})}(r,\delta).$ The best currently known asymptotic bounds on binary LRC codes are described in the following theorem.

Theorem 9 ([18, 17])

We have

[TABLE]

where $R_{\text{\rm opt}}(\delta)$ is any asymptotic upper bound on the rate of codes with relative distance $\delta$ . These bounds imply that

[TABLE]

The lower bound (25) is of the Gilbert-Varshamov type and was derived in [18] and [17], while the bound (26) is obtained from (4) by passing to the limit of large block length $n$ (see [18]). To obtain the tightest possible bound in (26) we substitute the best known bound on $R_{\text{\rm opt}}(\delta)$ , i.e., the McEliece et al. bound [20]:

[TABLE]

where $g(x):=h(\frac{1}{2}-\frac{1}{2}\sqrt{1-x})$ and $h(x):=-x\log_{2}x-(1-x)\log_{2}(1-x)$ is the binary entropy function.
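The functions $h$ and $g$ are easy to evaluate numerically. The sketch below implements them as defined above and, as an assumption, evaluates the second linear programming bound of [20] in its standard form $R\leq\min_{0\leq u\leq 1-2\delta}\{1+g(u^{2})-g(u^{2}+2\delta u+2\delta)\}$ by a grid search; the displayed equation is not reproduced here, so this form and the name `mrrw2` are not taken from the text.

```python
from math import log2, sqrt

def h(x):
    """Binary entropy function, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)

def g(x):
    """g(x) = h(1/2 - sqrt(1 - x)/2); the argument is clamped against rounding."""
    return h(0.5 - 0.5 * sqrt(max(0.0, 1.0 - x)))

def mrrw2(delta, steps=2000):
    """Grid-search evaluation of the assumed standard form of the bound of [20]."""
    u_max = 1.0 - 2.0 * delta
    best = 1.0
    for i in range(steps + 1):
        u = u_max * i / steps
        best = min(best, 1.0 + g(u * u) - g(u * u + 2 * delta * u + 2 * delta))
    return best

for delta in (0.1, 0.2, 0.3, 0.4):
    print(delta, round(mrrw2(delta), 4))
```

Taking $u=1-2\delta$ in the minimization gives $h(1/2-\sqrt{\delta(1-\delta)})$, so the grid search never exceeds that simpler estimate.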

Remark 2

Even though in Sec. III-B3 we showed by example that the LP bound is better than the bound (4) for finite length, it is difficult to derive a closed-form asymptotic version of the LP bound. The problem occurs because to derive the asymptotic version of the LP bound it would be easier to have a small number of local codes whose distance $\rho$ grows in proportion to $n$ . In reality we have to deal with a growing number of local codes with distance $\rho=2$ .

In this part we prove a new bound on linear LRC codes which improves upon the (linear case of the) bound (26). We begin with a remark that for linear LRC codes the recovery functions defined in (1) are also linear.

Lemma 10

Let ${\mathcal{C}}$ be a linear LRC code of length $n$ , dimension $k$ , and locality $r$ over a field ${\mathbb{F}}.$ Then for every coordinate $i\in[n]$ the recovery function $\phi_{i}$ defined in (1) is also linear.

Proof:

Choose a generator matrix $G$ of ${\mathcal{C}}$ and let $V\subset{\mathbb{F}}^{k}$ be the set of columns of $G$ . Given $x\in{\mathbb{F}}^{k}$ , the coordinates of the codeword $c=xG$ are equal to $(x,v)$ , where $v\in V$ and $(\cdot,\cdot)$ is the dot product. Since ${\mathcal{C}}$ has locality $r$ , the coordinate $(x,v_{i})$ must be a function of the coordinates in its recovery set ${\mathcal{R}}$ . Let $\{v_{j},j\in{\mathcal{R}}\}$ be the corresponding subset of columns of $G$ . Without loss of generality we may assume that $\operatorname{rk}(\{v_{j},j\in{\mathcal{R}}\})\leq k-1.$

If $v_{i}\in(\operatorname{span}\{v_{j},j\in{\mathcal{R}}\}),$ then there exists a linear recovery function for the $i$ th coordinate, so let us assume the opposite (thus, $v_{i}\neq 0$ ). Let $x\in{\mathbb{F}}^{k}$ be such that $(x,v_{i})\neq 0$ and $(x,v_{j})=0,j\in{\mathcal{R}}.$ Further, let $y\in{\mathbb{F}}^{k}$ be an arbitrary vector and consider the codewords $yG$ and $(x+y)G.$ Clearly, their entries in the coordinates of ${\mathcal{R}}$ are the same, so the LRC property implies that their $i$ th coordinates are equal as well. This however is clearly not the case, so we obtain a contradiction. ∎
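For binary codes, the linear recovery function of Lemma 10 can be found by writing the column $v_{i}$ as a sum of the columns indexed by the recovery set. The following sketch does this by Gaussian elimination over ${\mathbb{F}}_{2}$, with columns encoded as Python integers (bit $b$ is the $b$-th entry); the encoding and the name `express_in_span` are assumptions made for illustration.

```python
def express_in_span(target, vectors):
    """Return indices of `vectors` whose XOR equals `target`, or None.

    All vectors are over GF(2) and are encoded as non-negative integers.
    """
    pivots = {}  # leading bit -> (reduced vector, bitmask of original indices)

    def reduce(v, combo):
        # Cancel leading bits using the pivots collected so far.
        while v:
            lead = v.bit_length() - 1
            if lead not in pivots:
                break
            pv, pc = pivots[lead]
            v ^= pv
            combo ^= pc
        return v, combo

    for idx, v in enumerate(vectors):
        v, combo = reduce(v, 1 << idx)
        if v:
            pivots[v.bit_length() - 1] = (v, combo)

    rest, combo = reduce(target, 0)
    if rest:
        return None  # target is not in the span
    return [i for i in range(len(vectors)) if (combo >> i) & 1]

# Columns e1, e2 and the parity column e1 + e2 of a [3, 2] code.
print(express_in_span(0b11, [0b01, 0b10]))
```

If the returned list is `[j1, j2, ...]`, the erased symbol is recovered as the sum $c_{j_1}+c_{j_2}+\cdots$ over ${\mathbb{F}}_{2}$.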

The main result of this subsection is given in the following theorem.

Theorem 11 (Linear LRC codes)

The maximum rate of a linear LRC code satisfies the following inequality.

[TABLE]

where

[TABLE]

Bound (28) improves upon the bound in (26) for all values of relative distance. However, the improvement is rather mild and can barely be seen in a plot.

The proof of this theorem consists of two steps. First we observe that the approach of [24] applies to LRC codes, yielding a bound on their rate. In the second step we combine this approach with a recursive shortening bound approach of [18], obtaining Theorem 13 below.

To proceed with the first step, let us quote the main technical lemma of [24].

Lemma 12 ([24])

Consider a sequence of LDPC codes with increasing length $n$ and parities of weight at most $w$ . Suppose that the distance of codes converges to the value $\delta$ as $n\to\infty.$ Then the maximum achievable rate of the codes is bounded above by $R_{0}(w-1,\delta)$ given in (29).

Below in Sec. IV-A we provide some details of the argument of [24] because it is needed for our second result in this section, namely a bound on linear LRC codes with disjoint repair groups. In particular, it will be clear that Lemma 12 also applies to linear LRC codes since the conditions required for it to hold are actually weaker than the assumptions in the proof in [24]. The weaker set of assumptions used below is as follows:

[TABLE]

For linear LRC codes, these conditions are satisfied for $w=r+1$ . The statement in [24] in addition to (31) assumes that $\{v_{1},\dots,v_{m}\}$ form a basis for the code ${\mathcal{C}}^{\bot}$ . It is true that this condition holds for LDPC codes, but this is an artifact of the LDPC setting rather than an essential element of the proof that we give in Sec. IV-A. From our discussion below (see the remarks after Eq. (37)) it will become clear that conditions (31) suffice to complete the argument. As a consequence, we obtain the following bound on the rate of linear LRC codes:

[TABLE]

This bound does not improve on (26), but it is possible to establish a recursion that will lead to an improvement. Namely, we combine (32) with code shortening to obtain (28). The following statement is a minor modification of the result of [18].

Theorem 13

Let ${\mathcal{C}}$ be a binary LRC code of length $n$ , locality $r$ and distance $d$ , then the dimension of ${\mathcal{C}}$ , $\dim({\mathcal{C}})$ , satisfies,

[TABLE]

where $M(m,d,r)$ is the maximum cardinality of a linear LRC code of length $m$ , distance $d$ , and locality $r$ (the statement in [18] does not include the LRC condition on the shortened code). Therefore,

[TABLE]

Proof:

Eq. (34) is an obvious consequence of (33), so let us prove (33). The proof in [18] relies on the fact that for any $s,1\leq s\leq k/r$ there exists a subset of coordinates $I\subset[n],|I|=s(r+1)$ such that $\log|C_{I}|\leq sr$ ([18, Lemma 1]). That such a subset exists can be shown relying on the locality property of the code ${\mathcal{C}}.$ Let $I^{c}\coloneqq[n]\setminus I$ . We shorten the code ${\mathcal{C}}$ to obtain a code ${\mathcal{C}}_{I^{c}}$ of length $n-s(r+1),$ dimension at least $k-sr,$ and distance $d$ ([18, Lemma 2]). This code is obtained by taking all the codewords that contain zeros in the coordinates in $I$ and discarding these coordinates.

The only added element in our claim is that the shortened code ${\mathcal{C}}_{I^{c}}$ itself is LRC. Referring to Def. 1, we need to prove that for any coordinate $i\in I^{c}=[n]\backslash I$ there exists a function $\phi_{i}$ that depends on at most $r$ other coordinates and computes the value of the $i$ th coordinate of the codeword $c\in{\mathcal{C}}_{I^{c}}.$ There are two cases:

(i) The repair group ${\mathcal{R}}_{i}$ of $i$ does not intersect the subset $I$ . In this case there is nothing to prove.

(ii) Some of the coordinates of ${\mathcal{R}}_{i}$ are inside the subset $I$ . Let $J_{i}:={\mathcal{R}}_{i}\cap I.$ In this case the value of the discarded coordinates for every codeword of ${\mathcal{C}}_{I^{c}}$ is equal to $0$ . Suppose that $\phi_{i}(\{c_{j},j\in{\mathcal{R}}_{i}\backslash i\})$ is the recovery function of the original code ${\mathcal{C}}.$ We claim that the recovery group of the coordinate $i\in I^{c}$ in the code ${\mathcal{C}}_{I^{c}}$ is the subset ${\mathcal{R}}_{i}\backslash J_{i}$ , and the recovery function is obtained from $\phi_{i}$ by substituting zeros for all the arguments in $J_{i}.$ Note that the resulting function essentially depends on $|{\mathcal{R}}_{i}|-|J_{i}|-1\leq r$ coordinates of the codeword $c$ , conforming with the locality requirement. ∎
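The shortening operation used in this proof can be illustrated on a code given as an explicit list of codewords. This is a minimal sketch, assuming codewords are tuples of integers; the function name `shorten` is hypothetical.

```python
def shorten(code, I):
    """Shorten `code` on the coordinate set I.

    Keeps the codewords that are zero on all coordinates in I and then
    discards those coordinates.
    """
    I = set(I)
    n = len(next(iter(code)))
    keep = [j for j in range(n) if j not in I]
    return {tuple(c[j] for j in keep)
            for c in code if all(c[j] == 0 for j in I)}

# Binary even-weight code of length 3, shortened on the first coordinate.
code = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)}
print(sorted(shorten(code, {0})))
```

Every surviving codeword is zero on $I$, so substituting zeros for the arguments in $J_{i}$ of the original recovery function gives a valid recovery function for the shortened code, as argued in case (ii).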

Remark 3

While we need this result only for linear codes, the claims of Theorem 13 are still valid if we omit the linearity assumption (with obvious modifications to the statement).

Now Theorem 11 follows immediately by using (32) in the estimate (34).

In the next subsection we give a sketch of the approach in [24] for LDPC codes. In Sec. IV-B, we improve on Theorem 11 for the case of disjoint repair groups.

IV-A The Approach of Iceland and Samorodnitsky [24] to Bounds on LDPC Codes

Coset graphs of linear codes have been often used as a tool to study combinatorial properties of codes and to obtain bounds on their parameters [32, 33, 34]. Given a linear code ${\mathcal{C}},$ define a graph $\Gamma(V,E)$ , where $V={{\mathbb{F}}}_{q}^{n}/{\mathcal{C}}$ , i.e., the vertices of $\Gamma$ correspond to the cosets of the code, and two cosets are connected by an edge if the Hamming distance between them is one.

Let ${\mathcal{C}}$ be a linear code. Throughout this section we use the coset graph of the dual code ${\mathcal{C}}^{\bot},$ so all the references to the coset graph below are with respect to ${\mathcal{C}}^{\bot}.$ The length of the shortest path between a pair of vertices equals the Hamming distance between the corresponding cosets. Given a vertex $v\in V,$ denote by ${\mathcal{B}}_{\Gamma}(v,t)$ the ball of radius $t$ around it in the graph. Since the graph is vertex-transitive, the volume of the ball does not depend on $v$ , and we will use the notation $B(t):=|{\mathcal{B}}_{\Gamma}(v,t)|,$ where $v$ is an arbitrary vertex. Clearly, $B(t)$ equals the number of cosets whose leaders are of weight at most $t.$
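For very short binary codes the quantity $B(t)$ can be computed by brute force. In the sketch below a coset of ${\mathcal{C}}^{\bot}$ is identified with the syndrome of any of its vectors with respect to a generator matrix of ${\mathcal{C}}$ (which is a parity-check matrix of ${\mathcal{C}}^{\bot}$); the function name and the input format are assumptions.

```python
from itertools import combinations

def coset_ball_sizes(G):
    """Return [B(0), B(1), ..., B(n)] for the cosets of the dual code.

    G is a list of rows (lists of 0/1) generating a binary code C.
    Vectors are scanned by increasing weight, so a syndrome is first seen
    at the weight of its coset leader.
    """
    n = len(G[0])
    seen = set()
    sizes = []
    for w in range(n + 1):
        for supp in combinations(range(n), w):
            syndrome = tuple(sum(row[j] for j in supp) % 2 for row in G)
            seen.add(syndrome)
        sizes.append(len(seen))
    return sizes

# C is the even-weight code of length 3, so its dual is the repetition code.
print(coset_ball_sizes([[1, 1, 0], [0, 1, 1]]))
```

The running time is exponential in $n$, so this is only meant for sanity checks on toy examples.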

The starting point of the argument in [24] is the following result from [34].

Theorem 14 ([34])

Consider a linear code ${\mathcal{C}}$ of length $n$ and let $B(t)$ be the number of cosets of weight at most $t$ in ${\mathcal{C}}^{\bot}.$ Then

[TABLE]

where $\tau$ is defined in (30) and $\delta$ denotes the relative distance of the code ${\mathcal{C}}$ .

Using the obvious estimate $B(t)\leq\sum_{i=0}^{t}\binom{n}{i},t=\tau n,$ one obtains a bound valid for any code ${\mathcal{C}}.$ The main idea in [24] is that it is possible to obtain a tighter estimate for $B(t)$ in the case when ${\mathcal{C}}$ is an LDPC code, leading to an improved bound on the rate of such codes compared to the universal bounds of [20].

Proposition 15 ([24])

Let $x\in\{0,1\}^{n}$ be a random vector with independent Bernoulli coordinates $x_{i}$ such that $P(x_{i}=1)=p,P(x_{i}=0)=1-p,p<1/2,$ and let $\pi_{p}$ be the probability that $x$ is a coset leader of ${\mathcal{C}}^{\bot}.$ Then

[TABLE]

Proof:

We include a very short proof. Limiting ourselves to the vectors $x$ with at most $pn$ ones, we have

[TABLE]

∎

The next step, which is the main technical ingredient of the result in [24], is to show that $\pi_{p}$ is an exponentially declining function of $n$ . Let us assume that the dual code ${\mathcal{C}}^{\bot}$ contains a set of vectors $v_{1},\dots,v_{m}$ such that $\operatorname{wt}(v_{i})=w$ for all $i$ and that $\cup_{i}\operatorname{supp}(v_{i})=[n]$ .

Construct a partition ${\mathcal{I}}_{w}$ of the coordinate set $[n]$ into $w$ disjoint sets $I_{w},I_{w-1},\dots,I_{1}$ as follows. Suppose that $I_{w},I_{w-1},\dots,I_{k+1}$ are already defined. Set $I_{k}=\emptyset$ . For each of the vectors $v_{i},i=1,\dots,m$ consider the set of coordinates $L_{k,i}:=\operatorname{supp}(v_{i})\backslash\cup_{\ell=k}^{w}I_{\ell}$ , and if the size of $L_{k,i}$ is exactly $k,$ put $I_{k}\leftarrow I_{k}\cup L_{k,i}.$
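The greedy construction of the partition can be written out directly. The sketch below follows the description above, taking the supports of $v_{1},\dots,v_{m}$ as Python sets; the function name `build_partition` and the input format are assumptions.

```python
def build_partition(supports, w):
    """Greedy partition of the coordinates into blocks I_w, ..., I_1.

    `supports` lists the supports of the dual vectors (each of size at most
    w, jointly covering all coordinates). Returns a dict k -> I_k.
    """
    blocks = {}
    covered = set()                      # union of the blocks built so far
    for k in range(w, 0, -1):
        blocks[k] = set()
        for supp in supports:
            L = set(supp) - covered      # part of the support not yet assigned
            if len(L) == k:
                blocks[k] |= L
                covered |= L
    return blocks

print(build_partition([{0, 1, 2}, {2, 3, 4}, {4, 5}], w=3))
```

After step $k$ every support has fewer than $k$ unassigned coordinates, so after step $1$ all coordinates covered by the supports are assigned to some block.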

It is easy to observe that each block $I_{i}$ in the partition ${\mathcal{I}}_{w}$ is a disjoint union of $t=|I_{i}|/i$ subsets $L_{i,j}$ such that $\lvert L_{i,j}\rvert=i$ for all $j,$ and each $L_{i,j}$ is contained in the support of a different vector $v_{i,j}\in\{v_{l}\}_{l\in[m]}.$ Moreover, the set $\operatorname{supp}(v_{i,j})\setminus L_{i,j}$ is a subset of $\cup_{\ell=i+1}^{w}I_{\ell}.$ An important property of the partition ${\mathcal{I}}_{w}$ is as follows.

Lemma 16 ([24, Lemma 2])

Let $A=2/p^{w}.$ There exists an index $k\in\{1,2,\dots,w\}$ such that

[TABLE]

Now let $k$ be the index whose existence is guaranteed by this lemma. We have $|I_{k}|\geq\max\big{\{}A\sum_{j>k}|I_{j}|,\frac{n}{2wA^{w}}\big{\}}$ and the set $I_{k}$ is a disjoint union of $t=|I_{k}|/k$ sets $L_{k,j}.$ Now consider a coset leader $x$ . An easy argument shows that $\operatorname{supp}(x)$ contains no more than $tp^{k}/2$ subsets $L_{k,j}.$ Indeed, let $S:=\{j|L_{k,j}\subset\operatorname{supp}(x)\}$ and consider a vector $y$ from the same coset given by

[TABLE]

If $|S|\geq tp^{k}/2$ then we would obtain $\operatorname{wt}(y)<\operatorname{wt}(x),$ a contradiction.

The final step is to take a random vector $y$ as in Proposition 15 and to estimate the probability that it is a coset leader of the code ${\mathcal{C}}^{\bot}.$ First observe that for $k$ given by Lemma 16 we have

[TABLE]

Now let $Y\sim\text{Binom}(t,p^{k}),$ then the Chernoff bound gives

[TABLE]

where $c(\cdot,\cdot)$ is defined in Theorem 11.
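To get a feeling for how fast this probability decays, one can compute the exact lower tail of $Y\sim\text{Binom}(t,p^{k})$ at the threshold $tp^{k}/2$ and compare it with a generic multiplicative Chernoff estimate $P(Y\leq\mu/2)\leq e^{-\mu/8}$, $\mu=tp^{k}$. This generic estimate is used here only for illustration and is not the form with $c(\cdot,\cdot)$ from the text; the function name is hypothetical.

```python
from math import comb, exp, floor

def binom_lower_tail(t, prob, threshold):
    """Exact P(Y <= threshold) for Y ~ Binom(t, prob)."""
    m = min(t, floor(threshold))
    return sum(comb(t, i) * prob ** i * (1 - prob) ** (t - i)
               for i in range(m + 1))

p, k = 0.5, 2                       # each block L_{k,j} is covered w.p. p**k
for t in (20, 40, 80):
    mu = t * p ** k
    exact = binom_lower_tail(t, p ** k, mu / 2)
    print(t, exact, exp(-mu / 8))   # exact tail vs. generic Chernoff estimate
```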

Before finishing the proof of Theorem 11, observe that the requirement for $\{v_{i}\}_{i\in[m]}$ to form a basis of ${\mathcal{C}}^{\bot}$ was not used in the above argument, including the omitted proof of Lemma 16 (this proof, due to [24], applies here verbatim). The only assumptions used are those listed in (31)(a)-(b), namely that there exists a set of low-weight dual codewords whose supports jointly cover all the coordinates. The last assumption is used in the construction of the partition ${\mathcal{I}}_{w}$ in Lemma 16 and in (37).

This enables us to use inequalities (36) and (38) in (35) (taking $p=\tau$ and noting that $\tau<1/2$ for all $\delta\in(0,1/2)$ ). This substitution proves Lemma 12, and the estimate (28) in Theorem 11 follows immediately upon taking $w=r+1.$

IV-B Upper Bound on LRC Codes with Disjoint Repair Groups

Upper bounds for LRC codes with disjoint repair groups were already considered in Sec. III. Here we consider the asymptotic version of this problem, noting that the general result of Theorem 11 in this case admits a significant improvement.

Assume that $n=(r+1)m$ and consider a binary linear LRC code ${\mathcal{C}}$ of length $n$ , minimum distance $\delta n$ , and locality $r$ with disjoint repair groups ${\mathcal{R}}_{j},j=1,\dots,m.$ For every $j$ the repair group ${\mathcal{R}}_{j}$ corresponds to a vector $v_{j}$ of weight $r+1$ in ${\mathcal{C}}^{\bot}$ , and these vectors have pairwise disjoint supports and trivially satisfy the conditions in (31) (even if some repair groups are of smaller size, we can add redundant coordinates to them to bring their size to $r+1$ ).

Recall that ${\mathcal{B}}(\tau n)={\mathcal{B}}_{\Gamma}(0,\tau n)$ is the set of coset leaders of ${\mathcal{C}}^{\bot}$ of weight at most $\tau n$ and let $x\in{\mathcal{B}}(\tau n).$ The vector $x$ satisfies the following constraints:

[TABLE]

Eqns. (39a)-(39c) can be used to derive an upper bound on $R({\mathcal{C}})$ which is given in the following theorem.

Theorem 17 (Linear LRC codes with disjoint repair groups)

Let $R^{\text{\rm(lin,dis)}}(r,\delta)$ be the largest possible asymptotic rate of linear LRC codes with disjoint repair groups. Let $t\coloneqq\left\lfloor\frac{r+1}{2}\right\rfloor$ . We have

[TABLE]

where

[TABLE]

where $\mu=1$ if $\sum_{i=0}^{t}i\frac{\beta_{i}(x)}{\sum_{\ell}\beta_{\ell}(x)}\Big{|}_{x=1}\leq(r+1)\tau$ and

[TABLE]

otherwise. Here ${\text{\rm Root}}^{+}(f(x))$ denotes the unique positive zero of the polynomial $f(x).$

Proof:

We again rely on inequality (35). To bound $B(\tau n)$ from above, let us use the constraints in (39). In particular, from (39a) and (39b) we obtain

[TABLE]

The number of vectors $x$ that satisfy (39a)-(39b) is therefore given by the coefficient of $x^{k}$ in the following expression:

[TABLE]

Now let us in addition use (39c). Since $\operatorname{wt}(x)\leq\operatorname{wt}(x+v_{i})$ for all $i$ and they are in the same coset, it suffices to count only one of them on the right-hand side of (43). Therefore if $r+1$ is even, we obtain

[TABLE]

where

[TABLE]

One can see that asymptotically for $n\to\infty$ the dominating term in this expression is given by the largest coefficient, i.e.,

[TABLE]

Thus we have

[TABLE]

which translates into the following convex maximization problem:

[TABLE]

where the distribution $P=(r+1)(\alpha_{0},\dots,\alpha_{t})$ satisfies the constraints

[TABLE]

The maximum is found by differentiation (setting up a Lagrange function) and we obtain

[TABLE]

for $\beta_{i}(x)$ and $\mu$ as defined in (41),(42). Substituting these values of $\alpha_{i},$ we find the bound in the form given in (40). ∎

A numerical evaluation of the improvement in the asymptotic rate given by this result over the previously known results is shown in Fig. 1. The improvement can be seen for larger values of the relative distance. For instance, for $r=2$ the improvement is obtained for $\delta\geq 0.38,$ and this range increases for larger values of $r$ .

Acknowledgment. We are grateful to the reviewers for insightful remarks that helped us to improve the presentation of the paper. In particular, Lemma 10 was suggested by a reviewer.

Bibliography


[1] P. Gopalan, C. Huang, H. Simitci, and S. Yekhanin, “On the locality of codeword symbols,” IEEE Trans. Inform. Theory, vol. 58, no. 11, pp. 6925–6934, 2011.
[2] N. Prakash, G. M. Kamath, V. Lalitha, and P. V. Kumar, “Optimal linear codes with a local-error-correction property,” in Proc. 2012 IEEE Internat. Sympos. Inform. Theory, 2012, pp. 2776–2780.
[3] G. M. Kamath, N. Prakash, V. Lalitha, and P. V. Kumar, “Codes with local regeneration,” in Proc. IEEE Int. Symp. Inform. Theory, Istanbul, Turkey, Jul. 2013, pp. 1606–1610.
[4] L. Pamies-Juarez, H. D. L. Hollmann, and F. E. Oggier, “Locally repairable codes with multiple repair alternatives,” in Proc. 2013 IEEE Int. Sympos. Inform. Theory, pp. 892–896.
[5] A. Wang and Z. Zhang, “Repair locality with multiple erasure tolerance,” IEEE Trans. Inform. Theory, vol. 60, no. 11, pp. 6979–6987, Nov 2014.
[6] N. Prakash, V. Lalitha, and P. Kumar, “Codes with locality for two erasures,” in Proc. 2014 IEEE Int. Sympos. Inform. Theory, Honolulu, HI, pp. 1962–1966.
[7] A. S. Rawat, A. Mazumdar, and S. Vishwanath, “Cooperative local repair in distributed storage,” EURASIP Journal on Advances in Signal Processing, 2015, 17pp.
[8] A. Mazumdar, “Storage capacity of repairable networks,” IEEE Transactions on Information Theory, vol. 61, no. 11, pp. 5810–5821, 2015.
