Lifted multiplicity codes and the disjoint repair group property
Ray Li, Mary Wootters

TL;DR
This paper introduces lifted multiplicity codes, a generalization of lifted Reed-Solomon codes, and shows that they achieve a better trade-off between redundancy and the disjoint repair group property than previously known constructions.
Contribution
It presents lifted multiplicity codes with an improved trade-off between redundancy and locality, and it gives a new analysis of lifted Reed-Solomon codes via dual codes.
Findings
Lifted multiplicity codes achieve redundancy $O(t^{0.585} \, \sqrt{N})$ with disjoint repair groups.
They offer the best known trade-off for redundancy and locality for super-constant $t < \sqrt{N}$.
Alternative analysis of lifted Reed Solomon codes using dual codes is provided.
Lifted multiplicity codes and the disjoint repair group property
A conference version of this paper appeared at RANDOM ’19.
Ray Li (Department of Computer Science, Stanford University; research supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-1656518) and Mary Wootters (Departments of Computer Science and Electrical Engineering, Stanford University; partially supported by NSF grants CCF-1657049 and CCF-1844628).
Abstract
Lifted Reed-Solomon Codes (Guo, Kopparty, Sudan 2013) were introduced in the context of locally correctable and testable codes. They are multivariate polynomials whose restriction to any line is a codeword of a Reed-Solomon code. We consider a generalization of their construction, which we call lifted multiplicity codes. These are multivariate polynomial codes whose restriction to any line is a codeword of a multiplicity code (Kopparty, Saraf, Yekhanin 2014). We show that lifted multiplicity codes have a better trade-off between redundancy and a notion of locality called the $t$-disjoint-repair-group property than previously known constructions. As a corollary, they also give better tradeoffs for PIR codes in the same parameter regimes. More precisely, we show that, for an appropriate range of $t$, lifted multiplicity codes with length $N$ and redundancy $O(t^{0.585}\sqrt{N})$ have the property that any symbol of a codeword can be reconstructed in $t$ different ways, each using a disjoint subset of the other coordinates. This gives the best known trade-off for this problem for any super-constant $t$ in that range. We also give an alternative analysis of lifted Reed-Solomon codes using dual codes, which may be of independent interest.
1 Introduction
In this work we study lifted multiplicity codes and show how they provide improved constructions of codes with the $t$-disjoint repair group property ($t$-DRGP), a notion of locality in error correcting codes.
An *error correcting code* of length $N$ over an alphabet $\Sigma$ is a set $\mathcal{C} \subseteq \Sigma^N$. There are several desirable properties of error correcting codes, and in this paper we study the trade-off between two of them. The first is the size of $\mathcal{C}$, which we would like to be as large as possible given $N$. The second desirable property is *locality*. Informally, a code exhibits locality if, given (noisy) access to a codeword $c \in \mathcal{C}$, one can learn the $i$'th symbol of $c$ in sublinear time. As we discuss below, locality arises in a number of areas, from distributed storage to complexity theory.
Two constructions of codes with locality are lifted codes [GKS13] and multiplicity codes [KSY14]; in fact, both of these constructions were among the first known high-rate Locally Correctable Codes. In this work, we consider a combination of the two ideas in *lifted multiplicity codes, *and we show that these codes exhibit locality beyond what’s known for either lifted codes or for multiplicity codes.
More precisely, we study a particular notion of locality called the *$t$-disjoint-repair-group property* ($t$-DRGP). Informally, we say that $\mathcal{C}$ has the $t$-DRGP if any symbol of a codeword $c \in \mathcal{C}$ can be obtained in $t$ different ways, each of which involves a disjoint set of coordinates of $c$. Formally, we have the following definition.
Definition 1.1**.**
A code $\mathcal{C} \subseteq \Sigma^N$ has the $t$-disjoint repair group property if for every $i \in [N]$, there is a collection of $t$ disjoint subsets $S_1, \ldots, S_t \subseteq [N] \setminus \{i\}$ and functions $g_1, \ldots, g_t$ so that for all $c \in \mathcal{C}$ and for all $j \in [t]$, $g_j(c|_{S_j}) = c_i$. The sets $S_j$ are called repair groups.
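To make Definition 1.1 concrete, here is a small brute-force checker. It is a sketch of our own, not taken from the paper. It relies on the observation that a function $g$ with $g(c|_S) = c_i$ for all codewords exists exactly when no two codewords agree on $S$ but differ in position $i$. The example code is our choice: the binary $[7,3]$ simplex code, whose coordinates are indexed by the nonzero vectors $v \in \mathbb{F}_2^3$. Since $c_v = c_u + c_{u+v}$, the three pairs $\{u, u+v\}$ are disjoint repair groups for $c_v$, so this code has the 3-DRGP. All function names are hypothetical.

```python
from itertools import product


def simplex_code(k=3):
    """Binary simplex code: coordinate v (a nonzero vector in F_2^k) holds <m, v> mod 2."""
    coords = [v for v in product((0, 1), repeat=k) if any(v)]
    codewords = [
        tuple(sum(mi * vi for mi, vi in zip(m, v)) % 2 for v in coords)
        for m in product((0, 1), repeat=k)
    ]
    return coords, codewords


def is_repair_group(codewords, i, group):
    """True iff c_i is a function of c restricted to `group`, for every codeword c."""
    seen = {}
    for c in codewords:
        key = tuple(c[j] for j in group)
        if seen.setdefault(key, c[i]) != c[i]:
            return False  # two codewords agree on the group but differ at i
    return True


def has_t_drgp_at(codewords, i, groups):
    """Check the conditions of Definition 1.1 for coordinate i and candidate groups."""
    used = set()
    for group in groups:
        if i in group or used & set(group):
            return False  # groups must avoid i and be pairwise disjoint
        used |= set(group)
    return all(is_repair_group(codewords, i, g) for g in groups)


def simplex_groups(coords, i):
    """Pair up u with u + v (v = coords[i]); the two symbols in a pair sum to c_v."""
    v = coords[i]
    index = {u: j for j, u in enumerate(coords)}
    pairs = set()
    for u in coords:
        if u != v:
            w = tuple((x + y) % 2 for x, y in zip(u, v))
            pairs.add(frozenset((index[u], index[w])))
    return [sorted(p) for p in pairs]


coords, codewords = simplex_code()
for i in range(len(coords)):
    groups = simplex_groups(coords, i)
    assert len(groups) == 3 and has_t_drgp_at(codewords, i, groups)
print("simplex code has the 3-DRGP")
```

The checker is exponential in the dimension, so it is only meant for toy codes like this one.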
As discussed more in Section 1.1 below, the $t$-DRGP naturally interpolates between many different notions of locality. The $t$-DRGP is well studied both when $t$ is small (where it is related to Locally Repairable Codes and, nearly equivalently, to Private Information Retrieval Codes) and when $t$ is large (where it is equivalent to Locally Correctable Codes). For this reason, it is natural to study the $t$-DRGP when $t$ is intermediate; for example, when for . In this case, it is possible for the size of the code to be quite large: more precisely, it is possible for the rate to approach $1$ (notice that we always have , hence we always have ). Thus, the goal is to understand exactly how quickly the rate can approach $1$. That is, given $t$, how small can the redundancy be?
Several works have tackled this question, and we illustrate previous results in Figure 1. Our main result is that lifted multiplicity codes improve on the best-known trade-offs for all super-constant .
Contributions.
We summarize the main contributions of this work below.
1. For , we construct codes with the $t$-DRGP and redundancy at most
[TABLE]
This gives the best known construction for all with and ; the only previous result that held non-trivially for a range of $t$ was redundancy [FVY15, BE16, AY19], and our result also surpasses the specialized bound for of [FGW17].
We note, however, that our construction has a large alphabet size, . In contrast, the works [GKS13, FVY15, FGW17] have alphabet size at most polynomial in $N$. However, we can follow the approach of [AY19] and make our code binary by replacing each symbol with an (uncoded) binary string. This yields *binary* codes with the $t$-DRGP that have the best known trade-offs between $t$ and the redundancy when among all known codes with alphabet size .
2. We give a new analysis of bivariate lifts of multiplicity codes. Both multiplicity codes and lifted codes have been studied before (even in the context of the $t$-DRGP), but to the best of our knowledge the only work to consider lifted multiplicity codes is [Wu15]. That work studies multivariate lifts of multiplicity codes in which the number of variables is large; its goal is to obtain new constructions of high-rate locally correctable codes. In the context of our discussion, this corresponds to the $t$-DRGP when . In contrast, for bivariate lifts, we are able to obtain more refined bounds, which lead to improved results for the $t$-DRGP when .
Organization.
In the remainder of the introduction, we survey related work and give an overview of our approach. In Section 2, we give the formal definitions about polynomials and derivatives that we need. In Section 3, we formally define lifted multiplicity codes. In Section 4, we prove that lifted multiplicity codes have high rate, and in Section 5, we prove that they have the $t$-DRGP, which gives rise to our main theorem, Theorem 1.2.
1.1 Background and Related Work
1.1.1 Disjoint Repair Groups
The $t$-DRGP and related notions have been studied both implicitly and explicitly across several communities. When $t$ is small, several notions related to the $t$-DRGP have been studied, motivated primarily by distributed storage. These include Locally Repairable Codes (LRCs) with availability [WZ14, RPDV14, TB14, TBF16], codes for Private Information Retrieval (PIR) [FVY15, BE16, AY19] (all codes with the $t$-DRGP are -PIR codes), and batch codes [IKOS04, RSDG16, AY19]; we refer the reader to [Ska18] for a survey of these notions. [Footnote 1: In many (but not all) of these notions, we also care about the size of the repair groups, but in this work we focus on the simpler problem of the $t$-DRGP.]
To see why the $t$-DRGP might be relevant for distributed storage, consider a setting where some data is encoded as a codeword $c \in \mathcal{C}$, and then each symbol $c_i$ is sent to a separate server. If server $i$ is later unavailable, we might want to reconstruct $c_i$ without contacting too many other servers. This can be done if each symbol has one small repair group; this is the defining property of LRCs. Now suppose that several servers, say at most $t - 1$ of them, are unavailable. If $\mathcal{C}$ has the $t$-DRGP, then all unavailable symbols can be locally reconstructed: each node has at least $t$ disjoint repair groups, and at most $t - 1$ of them have been compromised.
On the other hand, when $t$ is large, the $t$-DRGP has been studied in the context of Locally Decodable Codes and Locally Correctable Codes (LDCs/LCCs). In fact, the -DRGP is equivalent to a constant-query LCC, and the notion has been used to prove impossibility results for such codes [KT00, Woo10].
Because of these motivations, there are several constructions of $t$-DRGP codes for a wide range of $t$; we illustrate the relevant ones in Figure 1. In the context of coded PIR, [FVY15, BE16, AY19] give constructions of $t$-DRGP codes with redundancy . This is known to be tight for [RV16, Woo16], but no better lower bound is known. [Footnote 2: When the size of the repair groups is bounded, it is known that the redundancy must be at least [TBF16].] When $t$ is very large, constructing codes with the $t$-DRGP is equivalent to constructing constant-query LCCs, and it is known that the rate of the code must tend to zero [Woo10]. On the other hand, for any , when $t$ is just slightly smaller, work on high-rate LCCs [KSY14, GKS13, HOW15, KMRS16] (see also [AY19]) implies that there are codes with rate (or any constant less than $1$) with the $t$-DRGP. [Footnote 3: In fact we may even take slightly sub-constant using the construction of [KMRS16].]
When , there are a few constructions known that beat the bound mentioned above, including difference-set codes (see, e.g., [LC01]) and, relevant for us, lifted parity-check codes [GKS13]. These constructions achieve redundancy when . In Appendix B, we include a new proof of the fact that the lifted codes of [GKS13] have this redundancy using a dual view of lifted codes.
When , there is only one construction known which beats the bound, due to [FGW17]. For the special case of , they give a construction based on “partially lifted codes” which has redundancy .
1.1.2 Lifting and multiplicity codes
Lifted multiplicity codes are based on lifted codes and multiplicity codes, both of which have a long history in the study of locality in error correcting codes.
Lifted Codes.
Lifting was introduced by Guo, Kopparty and Sudan in [GKS13]. The basic idea can be illustrated by Reed-Solomon (RS) codes. An RS code of degree over is the code
[TABLE]
where are the elements of . There is a natural multi-variate version of RS codes, known as Reed-Muller codes:
[TABLE]
where are the elements of . Reed-Muller codes have a very nice locality property, which is that the restriction of an RM codeword to a line in yields an RS codeword. This fact has been exploited extensively in applications like local decoding, local list-decoding and property testing. However, RM codes have a downside, which is that if (required for the above property to kick in), they have very low rate. With this inspiration, we could ask for the set which contains evaluations of all -variate polynomials which restrict to low-degree univariate polynomials on every line. Surprisingly, [GKS13] showed that this set can be much larger than the corresponding RM code! This code is called a *lifted* Reed-Solomon code, and the main structural result of [GKS13] is that is the span of the monomials whose restrictions to lines are low-degree. This property is key when analyzing the rate of these codes. Moreover, [GKS13] showed that this is the case when we begin with *any* affine-invariant code, not just RS codes.
The original motivation for lifted codes was to construct LCCs, but [GKS13] actually also give a code with the -DRGP, mentioned above; we give an alternate proof that this construction has the -DRGP in Appendix B. A variant of lifting was also used in [FGW17] to construct -DRGP codes; however, the analysis of this construction is quite brittle and seems difficult to extend to non-trivial constructions for .
Multiplicity Codes.
Multiplicity codes were introduced by Kopparty, Saraf and Yekhanin [KSY14] with the goal of constructing high-rate LCCs. The basic idea of multiplicity codes is to get around the low rate of RM codes discussed above in a different way, by appending derivative information to allow for higher-degree polynomials. That is, it is not useful to have an RS code with degree at least $q$, since $x^q = x$ for every $x \in \mathbb{F}_q$. However, if we replace the single evaluation $f(x)$ with the vector of evaluations $(f(x), f^{(1)}(x), \ldots, f^{(r-1)}(x))$, where $f^{(i)}$ denotes the $i$'th (Hasse) derivative, then it does make sense to take the degree to be at least $q$. The -variate multiplicity code of degree $d$ and order $r$ over $\mathbb{F}_q$ is then defined similarly to :
[TABLE]
where is a vector containing all of the partial derivatives of of order less than $r$, evaluated at . Since their introduction, multiplicity codes have found several uses beyond LCCs, including list-decoding [Kop15a, GW13], and have even been used to explicitly construct codes with the $t$-DRGP [AY19].
Lifted Multiplicity Codes.
To the best of our knowledge, the only work to study lifted multiplicity codes is the work of Wu [Wu15]. The goal of that work is to obtain versions of multiplicity codes which are still high-rate LCCs but which require lower-order derivatives than the construction of [KSY14]. The main result in [Wu15] is that lifted multiplicity codes of rate are LCCs with locality (this corresponds roughly to having the -DRGP with ). However, since the number of variables in the lift is large, it is hard to get a very precise handle on the codimension.
In comparison, in our work we focus on the $t$-DRGP for , but our goal is to get a much tighter bound on the codimension of the code. We address the quantitative comparison between our bound on the rate and that obtainable by the techniques of [Wu15] in Remark 4.5.
We note that the construction in [Wu15] is similar to the construction presented here. Since this construction is somewhat non-trivial (for reasons discussed below), we include the details.
Why only bivariate lifts?
In contrast to [Wu15], we study *bivariate* lifts of multiplicity codes. By focusing only on bivariate lifts (as was also done in [FGW17]), we obtain a more precise handle on the codimension of lifted multiplicity codes, which gives results for the $t$-DRGP for . (See Remark 4.6 for more on why bivariate lifts make it much easier to analyze the codimension.) We believe that this wide range of $t$ is interesting, and thus we think that bivariate lifts are worth focusing on.
We expect that lifted multiplicity codes can be analyzed over more variables. However, we expect that this will not improve the tradeoff between the redundancy and $t$ (the number of repair groups) for the setting . Indeed, this tradeoff becomes worse for ordinary multiplicity codes [AY19]: for these codes, a larger number of variables yields better bounds only for larger values of $t$. In general, -variable lifted multiplicity codes can have up to disjoint repair groups, so variables are needed for repair groups. For , we expect that the number of variables that gives the best rate for lifted multiplicity codes is . We leave the analysis of lifts in more variables to future work (see Section 6).
1.2 Our approach
We study lifted multiplicity codes to obtain improved constructions of codes with the $t$-DRGP. We focus on bivariate lifts in this paper in order to obtain codes with the $t$-DRGP for . We expect that lifted multiplicity codes in more than two variables also give better codes for the $t$-DRGP when .
1.2.1 Definition of lifted multiplicity codes
It is not immediately obvious how to apply lifting (and in particular, the nice characterization of it developed in [GKS13] as the span of “good” monomials) to univariate multiplicity codes. We first note that the univariate multiplicity code does not fit the affine-invariant framework of [GKS13], so their results do not immediately apply. Instead, we might try to define the bivariate lift of as the set of vectors for all polynomials so that every restriction of to a line agrees with some polynomial of degree less than $d$ on its first $r$ derivatives; that is, the restriction of is *equivalent up to order $r$* to a polynomial of degree less than $d$. This works, but there are two non-trivial issues to deal with.
1. First, in order to get a handle on the rate of the code, as in [GKS13] we show that the set of valid polynomials includes the span of a large set of “good” monomials. In contrast to [GKS13], the good monomials in this work do not span the entire code. However, lower bounding the number of good monomials, which in turn gives a lower bound on the rate of the code, turns out to be enough for our results.
2. Second, we need to take some care about which monomials we allow. With lifted RS codes, one only allows monomials with individual degrees less than $q$; otherwise, we could have multiple monomials that correspond to the same codeword, which leads to problems if we are counting monomials in order to understand the dimension of the code. As we show in Lemma 3.5, it turns out that with multiplicity codes, we should only allow monomials with ; otherwise, we would have multiple monomials that correspond to the same codeword, and this would create similar problems.
Dealing with these issues leads us to the final code and rate analysis, where we define the lifted multiplicity code to be all polynomials spanned by monomials with , such that the restriction of the polynomial to every line is equivalent up to order $r$ to some univariate polynomial of degree less than $d$. We then lower bound the number of good monomials whose evaluations lie in this code, which gives a lower bound on the rate. We note that the work [Wu15] considers a similar construction.
1.2.2 Lifted multiplicity codes have the $t$-DRGP
In Corollary 4.3 we give a lower bound on the number of $(q,r,d)$-good monomials, and this leads to a lower bound on the dimension of the lifted multiplicity code; crucially, this can be quite a bit larger than the dimension of the corresponding multivariate multiplicity code.
Finally, we observe that lifted multiplicity codes have the $t$-DRGP for a range of values of $t$. Similarly to previous constructions based on multivariate polynomial codes, the disjoint repair groups to recover the symbol are given by disjoint collections of lines through . More precisely, the values for the set of that lie on $r$ distinct lines through can be used to recover . Thus, the number of disjoint repair groups is . By adjusting , we obtain the trade-off shown in Figure 1. Our main theorem is as follows.
Theorem 1.2**.**
For and with , there exists a code over with the following properties.
- The length of the code is .
- The rate of the code is at least
[TABLE]
so that the redundancy is at most
[TABLE]
- The code has the -disjoint repair group property.
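As described in Section 1.2.2, the repair groups come from disjoint collections of lines through the point being repaired. The sketch below is ours and checks only the combinatorial part. For simplicity it works over a prime field $\mathbb{F}_p$ rather than the paper's characteristic-2 field, and it assumes that lines through $(a, b)$ are parametrized as $\{(a + s, b + \beta s) : s \in \mathbb{F}_p\}$ for a slope $\beta \in \mathbb{F}_p$. Grouping the $p$ slopes into $\lfloor p / r \rfloor$ blocks of $r$ lines gives pairwise disjoint groups, because two distinct lines through a point meet only at that point. The helper names are hypothetical.

```python
def line_through(point, beta, p):
    """Points of the (assumed) line {(a + s, b + beta*s) : s in F_p}."""
    a, b = point
    return {((a + s) % p, (b + beta * s) % p) for s in range(p)}


def line_repair_groups(point, r, p):
    """Split the p slopes into floor(p / r) blocks of r lines; drop the point itself."""
    groups = []
    for g in range(p // r):
        pts = set()
        for beta in range(g * r, (g + 1) * r):
            pts |= line_through(point, beta, p)
        pts.discard(point)
        groups.append(pts)
    return groups


p, r, point = 7, 2, (3, 5)
groups = line_repair_groups(point, r, p)
print(len(groups), [len(g) for g in groups])  # 3 [12, 12, 12]
```

Each group contains $r(p-1)$ points, and the groups never overlap, which is what makes them usable as disjoint repair groups.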
As a remark, our techniques can also recover any symbol from any one of its repair groups in polynomial time. For any , choosing and with gives a code with length and redundancy at most
[TABLE]
with the -DRGP. This is made formal in the following corollary.
Corollary 1.3**.**
For any , there are infinitely many so that, for , there exists a code of length which has the -DRGP and redundancy at most
We note that Theorem 1.2 also yields results for constant $t$, not just for as presented in Corollary 1.3. For example, by setting we obtain a code with the -DRGP and redundancy at most . The constant is not optimal here (the optimal constant for is known to be [RV16]), but to the best of our knowledge, Theorem 1.2 does yield the best known bounds for any super-constant .
The codes in Theorem 1.2 and Corollary 1.3 have the disadvantage of having a large alphabet size. Indeed, we have , and so the alphabet size is , which is very large. It is an interesting question to obtain the results of Corollary 1.3 with a code over a smaller alphabet (see open questions in Section 6). Among the existing work in Figure 1, [GKS13, FVY15, FGW17] all have or smaller sized alphabets.
For now, we observe as in [AY19] that if $\mathcal{C} \subseteq \Sigma^N$ is a code with the $t$-DRGP, then replacing $\mathcal{C}$ with a binary code, in which each symbol of each codeword is replaced with $\lceil \log_2 |\Sigma| \rceil$ binary bits, yields a code that also has the $t$-DRGP. As a result, applying this to the code in Corollary 1.3 yields a code with length and redundancy .
Corollary 1.4**.**
For any , there are infinitely many so that, for , there exists a binary code of length which has the -DRGP and redundancy at most .
Among codes with alphabet size or smaller, our binary codes give the best known tradeoff between $t$ and redundancy when (at [FGW17] gives a better redundancy).
2 Preliminaries
In this section, we introduce the background we need on polynomials and derivatives over finite fields. Throughout this paper, we assume that $q$ is a power of 2. Let $\mathbb{F}_q$ denote the finite field of order $q$, and let $\mathbb{F}_q^*$ denote its multiplicative group.
If and are nonnegative integers with binary representations and , then we write if for . If is an integer, let denote the element of congruent to mod . We write if .
As in [GKS13], we use Lucas’s theorem.
Proposition 2.1** (Lucas’s theorem).**
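Let $p$ be a prime, and let nonnegative integers $a = \sum_i a_i p^i$ and $b = \sum_i b_i p^i$ be written in base $p$, with $0 \le a_i, b_i < p$. Then
$$\binom{a}{b} \equiv \prod_i \binom{a_i}{b_i} \pmod{p}.$$
In particular, if $p = 2$, then $\binom{a}{b} \equiv 1 \pmod 2$ if and only if $b_i \le a_i$ for every $i$, that is, every binary digit of $b$ is at most the corresponding binary digit of $a$.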
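A quick numerical check of Lucas's theorem, written as a sketch of our own rather than anything from the paper: the function computes $\binom{a}{b} \bmod p$ digit by digit in base $p$, and the final assertion is the $p = 2$ case used throughout, namely that $\binom{a}{b}$ is odd exactly when the bitwise AND of $a$ and $b$ equals $b$.

```python
from math import comb


def binom_mod_p_lucas(a, b, p):
    """C(a, b) mod p via Lucas's theorem: multiply C(a_i, b_i) over base-p digits."""
    result = 1
    while a or b:
        result = result * comb(a % p, b % p) % p  # comb(x, y) == 0 when y > x
        a //= p
        b //= p
    return result


# Lucas agrees with direct computation for small inputs and a few primes.
for p in (2, 3, 5):
    for a in range(64):
        for b in range(64):
            assert binom_mod_p_lucas(a, b, p) == comb(a, b) % p

# The p = 2 special case: C(a, b) is odd iff every 1-bit of b is also a 1-bit of a.
assert all((comb(a, b) % 2 == 1) == (a & b == b) for a in range(64) for b in range(64))
```

The bitmask form `a & b == b` is what makes parity computations in characteristic 2 cheap, which the later sketches reuse.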
2.1 Polynomials and derivatives
For a vector of nonnegative integers, its weight, denoted , equals . For a field , let be the ring of polynomials in the variables with coefficients in . For a vector of nonnegative integers and a vector of variables, let denote the monomial , and for a vector , let denote the value , where . For nonnegative vectors and , we write if for all . We also write to denote . For nonnegative vector i, we let denote the coefficient of in the polynomial .
We will use Hasse derivatives, a notion of derivatives over finite fields:
Definition 2.2** (Hasse derivatives).**
For and a nonnegative vector i, the i-th (Hasse) derivative of , denoted or , is the coefficient of in the polynomial . Thus,
[TABLE]
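In characteristic $p$, the ordinary $k$-th derivative picks up a factor of $k!$ and so annihilates many monomials, which is why Hasse derivatives are used instead. The sketch below is ours and is restricted to univariate polynomials over a prime field $\mathbb{F}_p$ (an assumption for simplicity; the paper works over $\mathbb{F}_q$ with $q$ a power of 2). Writing $f^{(i)}$ for the $i$-th Hasse derivative, it uses the coefficient formula $f^{(i)}(X) = \sum_j \binom{j}{i} f_j X^{j-i}$, which is what "the coefficient of $Z^i$ in $f(X+Z)$" works out to, and it checks the expansion $f(a+z) = \sum_i f^{(i)}(a) z^i$ at every pair of field elements. Polynomials are coefficient lists with the lowest degree first.

```python
from math import comb


def hasse(f, i, p):
    """i-th Hasse derivative of f = sum_j f[j] X^j over F_p: sum_j C(j, i) f[j] X^(j - i)."""
    return [comb(j, i) * f[j] % p for j in range(i, len(f))]


def ordinary_derivative(f, p):
    """Usual formal derivative over F_p."""
    return [j * f[j] % p for j in range(1, len(f))]


def evaluate(f, x, p):
    return sum(c * pow(x, j, p) for j, c in enumerate(f)) % p


def taylor_holds(f, p):
    """Check f(a + z) = sum_i f^(i)(a) z^i for all a, z in F_p."""
    return all(
        evaluate(f, (a + z) % p, p)
        == sum(evaluate(hasse(f, i, p), a, p) * pow(z, i, p) for i in range(len(f))) % p
        for a in range(p)
        for z in range(p)
    )


cube = [0, 0, 0, 1]  # X^3 over F_2
print(ordinary_derivative(ordinary_derivative(cube, 2), 2))  # [0, 0]
print(hasse(cube, 2, 2))  # [0, 1], i.e. C(3, 2) X = X
```

Over $\mathbb{F}_2$ the second ordinary derivative of $X^3$ vanishes, while its second Hasse derivative is $X$, so Hasse derivatives retain information that ordinary derivatives lose.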
For and , we use the notation to denote the vector containing for all so that . We record a few useful (well-known) properties of Hasse derivatives below (see [HKT08]).
Proposition 2.3** (Properties of Hasse derivatives).**
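Let $f, g \in \mathbb{F}[X]$ and let $i, j$ be vectors of nonnegative integers. Then
1. $(f + g)^{(i)} = f^{(i)} + g^{(i)}$.
2. $(fg)^{(i)} = \sum_{k \le i} f^{(k)} g^{(i-k)}$.
3. $\left(f^{(i)}\right)^{(j)} = \binom{i+j}{i} f^{(i+j)}$, where $\binom{i+j}{i} = \prod_\ell \binom{i_\ell + j_\ell}{i_\ell}$.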
Using the above, we obtain the following useful derivative computation, and we provide a proof in Appendix A for completeness.
Proposition 2.4**.**
Let with a power of 2, and let . Then,
[TABLE]
2.2 Polynomial local recovery
A key property exploited by earlier work on multiplicity codes [KSY14, Kop15b] is that can be recovered from for that lie on a collection of lines through . More precisely, let be the set of lines of the form with . Given a multivariate polynomial , if is the line , let denote the univariate polynomial . Let be the set of lines in of the form for .
For simplicity—and because it is enough for our application to the -DRGP—we will consider only bivariate polynomials in this paper, although (see for example [Kop15b]) the same basic idea works for any . We will further specialize to lines in —that is, lines of the form —because it will simplify some computations later in the paper. With these restrictions, we can specialize Equation (4) of [Kop15b] to obtain the following relationship between the derivatives of and the derivatives of .
Lemma 2.5** (Follows from, e.g., [KSY14, Kop15b]).**
Suppose that are lines in all passing through a point , with being the line . Then, for all polynomials , the following matrix equality holds for all .
[TABLE]
Since the lines are distinct, the middle matrix in (4) is a Vandermonde matrix with distinct nodes, so it is invertible, and its inverse can be computed in polynomial time. Hence, we immediately have the following corollary.
Corollary 2.6**.**
Suppose that are distinct lines of the form all passing through a point . For a polynomial , given the polynomials , the derivatives are uniquely determined and computable efficiently for all i such that .
3 Lifted multiplicity codes
In this section, we define lifted multiplicity codes. As noted in the introduction, we restrict our attention to bivariate codes because this is enough for our application to the $t$-DRGP. However, everything in this section extends to general multivariate codes. We define bivariate lifted multiplicity codes in Section 3.3 and show that they contain the vectors of evaluations of all polynomials in the span of “good” monomials. In order to define these “good” monomials, we need a few more definitions.
3.1 Polynomial equivalence
We first define a notion of polynomial equivalence.
Definition 3.1**.**
We say that two univariate polynomials are equivalent up to order , written , if for all and .
It is easy to see that the above definition does in fact give an equivalence relation. We now present two standard results regarding this equivalence relation. The first is a characterization of this equivalence.
Lemma 3.2**.**
For we have if and only if .
Proof.
By considering the polynomial , it suffices to prove is equivalent to the zero polynomial up to order if and only if . If for some polynomial , then, by part 2 of Proposition 2.3 and Proposition 2.4, for , we have , so for all and all , so .
Conversely, suppose that . By the definition of Hasse derivatives, we have . Since for , we have . Thus is true for all , so , so . ∎
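For $q = 2$ the characterization in Lemma 3.2 can be verified exhaustively. The sketch below is ours. It assumes our reading of the elided statement, consistent with the proof's argument that $(X - \alpha)^r$ divides the polynomial for every $\alpha$: a polynomial is equivalent to zero up to order $r$ (its Hasse derivatives of orders $0, \ldots, r-1$ vanish at every point of $\mathbb{F}_q$) exactly when $(X^q - X)^r$ divides it. Polynomials over $\mathbb{F}_2$ are integer bitmasks (bit $j$ is the coefficient of $X^j$), and the parity of $\binom{j}{i}$ comes from Lucas's theorem. Only $q = 2$ is handled, which keeps field arithmetic trivial.

```python
def clmul(a, b):
    """Product in F_2[X] for bitmask polynomials (carry-less multiplication)."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out


def gf2_pow(a, e):
    out = 1
    for _ in range(e):
        out = clmul(out, a)
    return out


def gf2_mod(a, m):
    """Remainder of a modulo m in F_2[X]."""
    dm = m.bit_length() - 1
    while a and a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a


def hasse_at(f, i, alpha):
    """i-th Hasse derivative of f at alpha in F_2; C(j, i) is odd iff j & i == i."""
    total = 0
    for j in range(i, f.bit_length()):
        # term f_j * C(j, i) * alpha^(j - i); alpha^(j - i) is 1 iff alpha == 1 or j == i
        if (f >> j) & 1 and (j & i) == i and (alpha == 1 or j == i):
            total ^= 1
    return total


def equivalent_to_zero(f, r):
    return all(hasse_at(f, i, alpha) == 0 for i in range(r) for alpha in (0, 1))


def lemma_3_2_holds(r, max_deg):
    modulus = gf2_pow(0b110, r)  # (X^2 + X)^r, which equals (X^2 - X)^r over F_2
    return all(
        equivalent_to_zero(f, r) == (gf2_mod(f, modulus) == 0)
        for f in range(1 << (max_deg + 1))
    )


print(all(lemma_3_2_holds(r, 9) for r in (1, 2, 3, 4)))  # True
```

The check covers every polynomial of degree at most 9 over $\mathbb{F}_2$ for orders up to 4.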
Lemma 3.2 gives the following corollary.
Lemma 3.3**.**
Let be a power of 2 and . For every univariate polynomial , there exists a unique degree-at-most polynomial such that . Furthermore, if is a power of 2, then for all such that , we have .
Proof.
For existence of , note that, by Lemma 3.2, we can take to be the remainder when is divided by . For uniqueness of , suppose that and are equivalent to up to order and are of degree at most . By Lemma 3.2, we have . Additionally, has degree at most , so .
Now suppose is a power of 2. Then . Above, to obtain from , we need only to subtract terms of the form . Thus, for such that , the coefficients of in and are equal. ∎
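The second half of the proof uses that, in characteristic 2 and for $r$ a power of 2, $(X^q - X)^r = X^{qr} - X^r$, so (as we read the proof) reducing modulo $(X^q - X)^r$ only subtracts multiples of $X^{qr} - X^r$. The sketch below is ours and again uses bitmask polynomials over $\mathbb{F}_2$. It checks the identity, shows that it fails when $r$ is not a power of 2, and checks one consequence of the subtraction step: the remainder has degree less than $qr$, and the coefficients of $X^0, \ldots, X^{r-1}$ are unchanged. That this is the range of coefficients meant in Lemma 3.3, whose exact statement did not survive extraction, is our assumption.

```python
def clmul(a, b):
    """Product in F_2[X] for bitmask polynomials (carry-less multiplication)."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out


def gf2_pow(a, e):
    out = 1
    for _ in range(e):
        out = clmul(out, a)
    return out


def gf2_mod(a, m):
    dm = m.bit_length() - 1
    while a and a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a


def frobenius_identity(q, r):
    """Does (X^q + X)^r equal X^(q*r) + X^r in F_2[X]? (+ and - agree over F_2)"""
    return gf2_pow((1 << q) | 0b10, r) == (1 << (q * r)) | (1 << r)


def reduction_keeps_low_terms(q, r, max_deg):
    modulus = (1 << (q * r)) | (1 << r)  # X^(qr) + X^r
    low = (1 << r) - 1  # mask of the coefficients of X^0, ..., X^(r-1)
    for f in range(1 << (max_deg + 1)):
        g = gf2_mod(f, modulus)
        if g.bit_length() > q * r or (g & low) != (f & low):
            return False
    return True


print(frobenius_identity(4, 2), frobenius_identity(4, 4), frobenius_identity(4, 3))
# True True False
print(reduction_keeps_low_terms(4, 2, 12))  # True
```

The failure at $r = 3$ reflects $(X^4 + X)^3 = X^{12} + X^9 + X^6 + X^3$ over $\mathbb{F}_2$, which is why the power-of-2 assumption on $r$ matters.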
3.2 Type-$r$ polynomials
Define the order-$r$ evaluation map by
[TABLE]
We will want to restrict our attention to a subset of monomials whose order-$r$ evaluations form a basis for the space . To that end, we introduce the following definition.
Definition 3.4** (Type-$r$ monomials).**
Call a monomial type-$r$ if . Let be the family of polynomials spanned by type-$r$ monomials.
It is easy to see that is a $\binom{r+1}{2}q^2$-dimensional vector space over $\mathbb{F}_q$. We now show that type-$r$ monomials form a basis for bivariate polynomials, up to order-$r$ equivalence. We note that Lemma III.1 of [Wu15] claims a similar statement, with a different argument.
Lemma 3.5**.**
The evaluation map is a bijection.
Proof of Lemma 3.5.
Since is a linear map and and have the same dimension, it suffices to prove that the map has trivial kernel. We prove this by induction on $r$.
Base Case: . Suppose and is the 0-vector. Then for all . For any , the polynomial has degree at most but has roots, so the polynomial must be 0. Hence, for all , so , which implies . This proves that has trivial kernel.
Inductive step: Assume and has trivial kernel. We prove that has trivial kernel.
Assume is a polynomial spanned by type- monomials with all ith derivatives equal to 0 for . Let and . Then, for , we have for all . Hence, for all , we have . Hence, . Since for all , we have . Thus, is the 0 polynomial for all , so for all , so . Hence, we may write for some polynomial .
As polynomial is type-, polynomial is type-: if had a nonzero coefficient for with , then the coefficient is nonzero in , which is a contradiction. For all with and , we have
[TABLE]
Here we applied part 2 of Proposition 2.3 and the case of Proposition 2.4. At every and , the left side is 0 by assumption on and the right side . We conclude that evaluates to 0 everywhere for every nonnegative and satisfying . Since is type-, we have by the induction hypothesis, so . This completes the induction, completing the proof. ∎
3.3 Definition of lifted multiplicity codes
Finally, we are ready to define lifted multiplicity codes: the set of evaluations of polynomials whose restrictions to lines are equivalent, up to order $r$, to a low-degree polynomial. [Footnote 4: To simplify calculations, we consider restrictions to lines of the form . That is, we do not include lines of the form .]
Definition 3.6** (Lifted multiplicity codes, first definition).**
The (bivariate) lifted multiplicity code is a code over the alphabet $\Sigma = \mathbb{F}_q^{\binom{r+1}{2}}$ of length $q^2$ given by
[TABLE]
Definition 3.6 is natural but difficult to get a handle on directly. Following the approach of previous work [GKS13, FGW17], we show that the lifted multiplicity code contains the evaluations of all polynomials that lie in the span of a set of “good” monomials, which makes it easier to bound the rate. Informally, a monomial is $(q,r,d)$-good if its restriction to every line is equivalent, up to order $r$, to a polynomial of degree less than $d$.
Definition 3.7** ($(q,r,d)$-good monomials).**
Call a monomial $(q,r,d)$-good (or simply good, when the parameters are understood) if it is type-$r$ and, for every line $L$ of the form used in Definition 3.6, its univariate restriction to $L$ is equivalent, up to order $r$, to a polynomial of degree less than $d$; call it $(q,r,d)$-bad otherwise.
By definition all good monomials lie in our lifted multiplicity code, so to lower bound the rate of the code it suffices to lower bound the number of good monomials.
Lemma 3.8**.**
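Let $\mathcal{C}$ be the bivariate lifted multiplicity code with parameters $q$, $r$, $d$. Then the evaluation vector of every $(q,r,d)$-good monomial lies in $\mathcal{C}$, and the rate of $\mathcal{C}$ is at least $\frac{\#\{(q,r,d)\text{-good monomials}\}}{\binom{r+1}{2}q^{2}}$.
Proof.
The first part follows from the definition of a good monomial. For the second part, $\mathcal{C}$ is $\mathbb{F}_q$-linear, so it contains the evaluations of every polynomial in the $\mathbb{F}_q$-span of the good monomials, and these polynomials have pairwise distinct evaluations by Lemma 3.5. Hence $|\mathcal{C}| \geq q^{\#\{(q,r,d)\text{-good monomials}\}}$. As $\mathcal{C}$ is a code of length $q^2$ over an alphabet of size $|\Sigma| = q^{\binom{r+1}{2}}$, its rate is $\frac{\log|\mathcal{C}|}{q^{2}\log|\Sigma|} \geq \frac{\#\{(q,r,d)\text{-good monomials}\}}{\binom{r+1}{2}q^{2}}$. ∎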
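The counting in Lemma 3.8 is simple arithmetic, sketched below by us. We assume, as the denominator in the lemma suggests, that each symbol stores the Hasse derivatives $f^{(i_1, i_2)}$ with $i_1 + i_2 < r$ at one point; there are $\binom{r+1}{2}$ such index pairs, so $|\Sigma| = q^{\binom{r+1}{2}}$. With $|\mathcal{C}| \ge q^{G}$ for $G$ good monomials, the rate $\log|\mathcal{C}| / (q^2 \log|\Sigma|)$ is at least $G / (\binom{r+1}{2} q^2)$. The value of $G$ below is hypothetical and purely illustrative, not a result from the paper.

```python
from fractions import Fraction
from math import comb, log


def num_derivative_indices(r):
    """Number of (i1, i2) with i1 + i2 < r, i.e. field elements per codeword symbol."""
    return sum(1 for i1 in range(r) for i2 in range(r) if i1 + i2 < r)


def rate_lower_bound(num_good, q, r):
    """Lemma 3.8: rate >= (#good monomials) / (C(r+1, 2) * q^2), as an exact fraction."""
    return Fraction(num_good, comb(r + 1, 2) * q * q)


def rate_from_logs(num_good, q, r):
    """log|C| / (q^2 log|Sigma|) with |C| = q^num_good and |Sigma| = q^C(r+1, 2)."""
    return (num_good * log(q)) / (q * q * comb(r + 1, 2) * log(q))


q, r = 16, 4
assert num_derivative_indices(r) == comb(r + 1, 2) == 10
G = 2000  # hypothetical number of good monomials, for illustration only
print(rate_lower_bound(G, q, r))  # 25/32
```

Using `Fraction` keeps the bound exact; the floating-point version only confirms that the logarithms cancel as claimed.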
Remark 3.9**.**
A previous version of this paper incorrectly asserted that every codeword of the lifted multiplicity code is spanned by good monomials. As observed by Nikita Polianskii, this is in fact not true. For example, when and , the monomials and are not -good as verified by the line , but their sum is in the -lifted multiplicity code: the restriction of the sum to a line has a coefficient of and hence has degree strictly less than .
4 The rate of lifted multiplicity codes
In this section, we bound the rate (and hence the redundancy) of lifted multiplicity codes. Our final result on the rate is Corollary 4.3 below, which implies that for and of an appropriate form, the lifted multiplicity code of order $r$ and degree $d$ over $\mathbb{F}_q$ has rate at least
[TABLE]
In the next section, we choose , which will yield a code of rate and will give us Theorem 1.2.
Before we prove this result, we briefly compare our approach to more straightforward ones, and discuss why we are able to do better.
First, we discuss what might be a first strategy building on the analysis of [GKS13] for lifted Reed-Solomon codes. Similarly to that work, we want to show that there are few bad monomials. We can show (after checking some conditions) that a monomial is bad if, for some line, one of the coefficients of is nonzero in its univariate restriction. This corresponds to the analysis of lifted Reed-Solomon codes when $r = 1$. For each , similar to the analysis of the lifted Reed-Solomon code, we can bound the number of monomials that could cause the coefficient of to be nonzero by . Using the union bound and summing these bounds gives a bound on the number of bad monomials for the lifted multiplicity code. However, when (the setting we will consider), this gives a rate of . Thus, this yields a code with the same redundancy of as the lifted Reed-Solomon code, and we have made no improvement.
To do better, the key observation in our analysis is that monomials that are bad for some are likely to be bad for another , so the union bound is wasteful. Instead, using some tricks with binary arithmetic (captured in Lemma 4.1), we are able to analyze together all the monomials that make any of the coefficients of nonzero, giving a better bound.
Second, we compare our approach to the analysis of [Wu15], which also studies lifted multiplicity codes but focuses on a different parameter regime (one where is much larger). As described in more detail in Remark 4.5, the approach of [Wu15] does not yield anything better in the parameter regime that we consider () than the approach described above, or indeed anything better than standard (non-lifted) multiplicity codes when . The reason our analysis improves on the straightforward argument above while that of [Wu15] does not is that [Wu15, Lemma III.3] uses a stricter requirement for a monomial to be good than our Lemma 4.2 does. Thus, the approach of [Wu15] counts fewer good monomials and ends up with a weaker bound on the rate.
Now, we prove our result. We begin with a lemma that will be useful.
Lemma 4.1.
Let and with . The number of such that at least one of the following is true
[TABLE]
is at most .
Proof.
Suppose we write the numbers in binary with digits (possibly with leading zeros). As these numbers span consecutive integers mod , their most significant coordinates, when written in this binary form, take on at most 2 values. Let and so that , and and are the most significant coordinates of and , respectively, when written in -digit binary. Then if one of the equations of (7) is true, we must have either or . This gives at most choices for the pair . Given and , there are choices for each of and , for a total of at most solutions to (7). ∎
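The key step of this proof is that a short run of consecutive integers modulo a power of two takes at most two values in its top binary digits. The exact window length and digit counts used in Lemma 4.1 are not recoverable from this text, so the sketch below assumes a window of at most $2^{\ell-m}$ consecutive integers mod $2^\ell$ and looks at the top $m$ of $\ell$ digits; the helper names are hypothetical.

```python
def top_bits(x: int, ell: int, m: int) -> int:
    """The m most significant digits of x, written with ell binary digits."""
    return x >> (ell - m)

def max_distinct_top_bits(ell: int, m: int, window: int) -> int:
    """Largest number of distinct top-m-digit patterns over all runs of
    `window` consecutive integers mod 2**ell (wrap-around allowed)."""
    n = 1 << ell
    best = 0
    for start in range(n):
        patterns = {top_bits((start + k) % n, ell, m) for k in range(window)}
        best = max(best, len(patterns))
    return best
```

The reason is that the top $m$ digits identify a block of $2^{\ell-m}$ consecutive residues, and a run no longer than one block can cross at most one block boundary (including the wrap-around from $2^\ell - 1$ to $0$). Longer runs can hit three blocks, which the check below also shows.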
Lemma 4.2.
Let , and with . The number of -good monomials is at least .
Proof.
The number of type- monomials is . A monomial is -good if, for every , we have
[TABLE]
can be represented as a polynomial of degree less than . Next, we apply Lemma 3.3, which says that there is a unique polynomial such that , and further that all of the coefficients for are equal to the corresponding coefficient of . As is type , we have , so the degree of the polynomial is at most , and
[TABLE]
for any allowed choice of , so for all so that
[TABLE]
Thus, to show that has degree less than , it suffices to show that the coefficients of in are all zero.
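Lemma 3.3 is not reproduced here. Assuming, as is standard for multiplicity codes, that the relevant reduction is modulo $(T^q - T)^r$, the sketch below illustrates why the reduced polynomial is interchangeable with the restriction: since $T^q - T = \prod_{a \in \mathbb{F}_q}(T - a)$, the difference vanishes to order $r$ at every field element, so all Hasse derivatives of order less than $r$ agree there. For simplicity it works over a prime field GF($p$) rather than the characteristic-2 field of the paper; the identity does not depend on the field. All names are hypothetical.

```python
import random
from math import comb

def trim(a):
    """Drop trailing zero coefficients (lists are lowest degree first)."""
    while a and a[-1] == 0:
        a.pop()
    return a

def poly_mul(a, b, p):
    """Product of two polynomials over GF(p)."""
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return trim(out)

def poly_mod(a, m, p):
    """Remainder of a modulo the monic polynomial m over GF(p) (long division)."""
    a = list(a)
    dm = len(m) - 1
    for k in range(len(a) - 1, dm - 1, -1):
        c = a[k]
        if c:
            for j in range(dm + 1):
                a[k - dm + j] = (a[k - dm + j] - c * m[j]) % p
    return trim(a[:dm])

def hasse_eval(f, i, x, p):
    """i-th Hasse derivative of f at x: sum over k of C(k, i) * f_k * x**(k - i)."""
    return sum(comb(k, i) * c * pow(x, k - i, p) for k, c in enumerate(f) if k >= i) % p

def reduce_and_check(p, r, deg, seed=0):
    """Reduce a random f modulo (T^p - T)^r and confirm multiplicity-r agreement on GF(p)."""
    rng = random.Random(seed)
    f = trim([rng.randrange(p) for _ in range(deg + 1)])
    base = [0] * (p + 1)
    base[1], base[p] = p - 1, 1          # T^p - T
    m = [1]
    for _ in range(r):
        m = poly_mul(m, base, p)         # (T^p - T)^r, monic of degree r*p
    g = poly_mod(f, m, p)
    assert len(g) <= r * p               # deg g < r*p
    for x in range(p):
        for i in range(r):
            assert hasse_eval(f, i, x, p) == hasse_eval(g, i, x, p)
    return g
```

Uniqueness follows from the same fact: two polynomials of degree less than $rp$ that agree to order $r$ at all $p$ points differ by a polynomial with $rp$ roots counted with multiplicity, hence by zero.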
Write and where and . Note that if , then for , the coefficient is always zero except possibly when and . This can happen for at most pairs . Hence, for , there are bad monomials .
Now assume . For , the coefficient of in is 0 if or . Otherwise, the coefficient is
[TABLE]
By Proposition 2.1, the binomial coefficient is nonzero (mod 2) if and only if , which, as is a power of 2, happens only if . Hence, if , the monomial is -bad only if some satisfies . Hence, by Lemma 4.1, for a fixed with , there are at most bad monomials , so there are at most bad monomials over all with . As we showed, there are at most bad monomials when . Hence, there are at least good monomials, as desired. ∎
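Proposition 2.1 is not reproduced in this section; we assume, as Remark 4.6 suggests, that it is the characteristic-2 case of Lucas's theorem: $\binom{n}{k}$ is odd exactly when every binary 1 of $k$ is also a 1 of $n$. The sketch below (hypothetical helper name) checks this against direct computation, and illustrates the power-of-2 case used above: for $n = 16$, only $k = 0$ and $k = 16$ give an odd coefficient.

```python
from math import comb

def binom_odd_lucas(n: int, k: int) -> bool:
    """Lucas's theorem for p = 2: C(n, k) is odd iff the binary digits of k
    form a subset of the binary digits of n."""
    return 0 <= k <= n and (k & ~n) == 0
```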
Lemma 4.2 and Lemma 3.8 together imply Corollary 4.3, which in turn implies the informal result stated at the beginning of the section.
Corollary 4.3.
Let , and with . A lifted multiplicity code has rate at least .
Remark 4.4.
We apply Corollary 4.3 with , showing that the lifted multiplicity code has rate at least . By comparison, a 2-variate multiplicity code [KSY14] of order , consisting of evaluations of polynomials of degree at most over , has rate , which is smaller than the rate of lifted multiplicity codes for .
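For reference, the rate of a non-lifted bivariate multiplicity code can be computed exactly: its message space is the set of polynomials of total degree at most $d$ (dimension $\binom{d+2}{2}$), and each of the $q^2$ points stores the $\binom{r+1}{2}$ Hasse derivatives of order less than $r$. The sketch below assumes $d < rq$, so that the encoding is injective; the specific $d$ used in this remark is not recoverable from the text, and the function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def multiplicity_code_rate(q: int, r: int, d: int) -> Fraction:
    """Exact rate of the bivariate multiplicity code of order r and degree d over F_q."""
    if d >= r * q:
        raise ValueError("need d < r*q so that the encoding is injective")
    dimension = comb(d + 2, 2)          # monomials X^a Y^b with a + b <= d
    symbols_per_point = comb(r + 1, 2)  # Hasse derivatives (i, j) with i + j < r
    return Fraction(dimension, symbols_per_point * q * q)
```

Using exact fractions keeps small examples readable, e.g. $q = 4$, $r = 1$, $d = 3$ gives $10/16 = 5/8$.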
Remark 4.5 (Quantitative comparison to [Wu15]).
The work [Wu15] also studies lifted multiplicity codes, but focuses on a different parameter regime than we focus on here (where is large, rather than ). Perhaps because they focus on a different parameter regime, the approach of [Wu15] does not yield any nontrivial results in our parameter regime, and consequently our analysis of lifted multiplicity codes is much stronger.
For example, for degree codes, [Wu15] bounds the rate of the code below by . This is a weaker bound than the straightforward bound of sketched at the beginning of this section, and significantly weaker than our bound in Corollary 4.3 of for all and . Moreover, for , the bound of [Wu15] is even weaker than the lower bound on the rate of (non-lifted) multiplicity codes, which is .
Footnote 5: The details are as follows: using some notation from [Wu15], for a rate code in our parameter regime, variables, a prime , a parameter , , (they use for , for , and for ), (here , and we assume ), , . In our setting, we choose , which requires , so , so . Thus, .
Remark 4.6 (The value of bivariate lifts).
In addition to likely giving better bounds than -variate lifts (see "Why only bivariate lifts?" in Section 1.1), another reason that we study only bivariate lifts in this paper is that it makes the computations much more tractable. In the proof of Lemma 4.2, we study , and expand out the terms to apply Lucas’s theorem. If we were to consider, say, trivariate lifts, we would have to expand expressions of the form , and it would become more complicated to keep track of the coefficients on various powers. Analyzing -variate lifts would become more complicated still. In particular, it seems harder to get as tight a bound on the codimension of the code for -variate lifts for as we are able to get for . Given that we are already able to obtain good codes for bivariate lifts, we restrict our attention to this simpler case.
5 Disjoint repair groups of lifted multiplicity codes
Finally, we prove Theorem 1.2, which we repeat below.
Theorem (Theorem 1.2, restated).
Let and with and let be the lifted multiplicity code.
- The length of the code is .
- The rate of the code is at least .
- The code has the -disjoint repair group property.
Proof.
The first item follows from the definition of , and the second item is by Corollary 4.3. To see the third item, we observe that, given a point , lines passing through , and the values of the codeword at all points on these lines except itself, we can (efficiently) recover . This guarantees the -disjoint repair group property, because we can group the lines of of the form passing through arbitrarily into groups of , giving disjoint repair groups; the groups are disjoint because distinct lines through a common point share only that point. For any line , the polynomial has degree at most , as is -good. By taking linear combinations of directional derivatives (Lemma 2.5), we can efficiently compute for every , every , and every . We can compute using a generalization of polynomial interpolation (Hermite interpolation). This can be done in time, where is the degree of the polynomial (see, e.g., [Chi76]). Hence, by Corollary 2.6, from , we can efficiently compute for all with . ∎
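The disjointness in this proof rests on a simple geometric fact: any two distinct lines through a point $P$ meet only at $P$, so grouping the lines through $P$ yields repair groups that share no coordinates. The sketch below checks this over a prime field GF($p$) instead of the paper's field of characteristic 2, and uses all $p + 1$ lines through $P$; the exact family of lines used in the proof is not recoverable from this text. Both helper names are hypothetical.

```python
def lines_through(P, p):
    """The p + 1 lines of GF(p)^2 through P, each returned as its set of points other than P."""
    x0, y0 = P
    directions = [(1, beta) for beta in range(p)] + [(0, 1)]
    return [
        {((x0 + u * s) % p, (y0 + v * s) % p) for s in range(1, p)}
        for (u, v) in directions
    ]

def repair_groups(P, p, t):
    """Split the lines through P into (p + 1) // t groups of t lines; return each group's points."""
    lines = lines_through(P, p)
    groups = []
    for g in range((p + 1) // t):
        points = set()
        for line in lines[g * t:(g + 1) * t]:
            points |= line
        groups.append(points)
    return groups
```

With $p = 7$ and $t = 2$, this gives four pairwise disjoint groups of $12$ points each, none containing $P$; over GF($2^\ell$) only the field arithmetic would change.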
6 Conclusion
We conclude with some open questions.
1. We have shown that lifted multiplicity codes with redundancy have the -DRGP for a range of . However, we do not know of any general lower bounds when beyond the lower bound for , which implies that the redundancy must be at least for any . When , there is a stronger redundancy lower bound of , which holds simply because a code with the -DRGP must have Hamming distance at least . Thus, it is an open question whether our bound is tight or whether one can do better.
2. Lifted multiplicity codes display better locality for the -DRGP problem for ; it is natural to ask whether they can be used for larger , and in particular whether they could lead to improved constructions of locally correctable codes. Specifically, it would be interesting if lifted multiplicity codes could qualitatively outperform (un-lifted) multiplicity codes as high-rate LCCs, for example by maintaining the high rate while achieving sub-polynomial query complexity. (Footnote 6: As noted in the introduction, the work [Wu15] showed that lifted multiplicity codes are good LCCs with lower-order derivatives than were required by the (un-lifted) multiplicity codes of [KSY14], but it does not show how to improve the query complexity to sub-polynomial.) We note that for the LCC problem, one typically does not care about pinning down the rate, so long as it is close to , and instead focuses on the query complexity. In contrast, in this work, we have focused on pinning down the rate much more precisely.
3. Related to the above, it would be natural to understand the rate and locality of lifted multiplicity codes over more than two variables.
4. The alphabet size of lifted multiplicity codes is , which, if the multiplicity is for a constant , is of exponential type in the code length . In practical applications, a smaller alphabet size is desirable. It would be interesting to achieve the results of Corollary 1.3 with a code whose length grows independently of the alphabet size.
5. In this paper, we studied the -DRGP locality property, which requires that each symbol have many disjoint repair groups. Another common notion of locality is a locally recoverable code (LRC) with locality , which requires that each symbol have one repair group of size at most . These two notions are combined in the notion of an LRC with locality and availability (see, e.g., [TB14, WZ14, RPDV14]), which requires that each symbol have disjoint repair groups, each of size at most . The techniques in this paper yield codes with locality and availability , where is the multiplicity. It would be interesting to construct codes with a better trade-off between locality and availability, possibly using lifting and/or multiplicity techniques.
Acknowledgements
We thank Eitan Yaakobi for helpful conversations. We thank Julien Lavauzelle for pointing out the reference [Wu15], for pointing out an error in an earlier version of this paper, and for suggesting the fourth open question. We thank Nikita Polianskii for pointing out an error in an earlier version of this paper. A previous version claimed that a lifted code is exactly the span of all good monomials, but in fact the span of all good monomials only forms a subset of the lifted code (see Remark 3.9). This does not change our main result, as our lower bound on the number of good monomials still gives the same lower bound on the rate of the lifted code. We thank anonymous reviewers for helpful comments on an earlier draft of this paper.
Appendix A Proofs of polynomial facts
Proof of Proposition 2.4.
By part 2 of Proposition 2.3,
[TABLE]
We have (the field has characteristic 2). For , the th derivative of is , which is 0, as is even by Proposition 2.1. The summand above is nonzero if and only if . When , this happens when of the ’s are 1 and are 0, which happens for choices of . This gives for . When , some is at least 2, in which case for . ∎
Proof of Lemma 2.5.
Let denote the vector , and let denote the vector . By assumption, we have that . By the definition of Hasse derivatives, we have, for all
[TABLE]
Hence, for all and , we have
[TABLE]
By plugging in , we have for all and ,
[TABLE]
Rewriting this in matrix form gives the desired result. ∎
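The identity behind this proof is the chain rule for Hasse derivatives along a line: the coefficient of $T^k$ in $f(x_0 + uT,\, y_0 + vT)$ equals $\sum_{i+j=k} u^i v^j f^{(i,j)}(x_0, y_0)$, which is what "plugging in" $T = 0$ extracts. Lemma 2.5 itself is not reproduced here, so the sketch below only checks this standard identity, over a prime field GF($p$) for simplicity (the identity holds over any field). Polynomials are dicts mapping $(a, b)$ to the coefficient of $X^a Y^b$; all names are hypothetical.

```python
from math import comb

def hasse2(f, i, j, x, y, p):
    """(i, j)-th Hasse derivative of bivariate f at (x, y) over GF(p)."""
    total = 0
    for (a, b), c in f.items():
        if a >= i and b >= j:
            total += c * comb(a, i) * comb(b, j) * pow(x, a - i, p) * pow(y, b - j, p)
    return total % p

def restrict_to_line(f, x0, y0, u, v, p):
    """Coefficients (lowest first) of g(T) = f(x0 + u*T, y0 + v*T) over GF(p), by binomial expansion."""
    deg = max(a + b for (a, b) in f) if f else 0
    g = [0] * (deg + 1)
    for (a, b), c in f.items():
        for i in range(a + 1):
            for j in range(b + 1):
                term = (c * comb(a, i) * pow(x0, a - i, p) * pow(u, i, p)
                        * comb(b, j) * pow(y0, b - j, p) * pow(v, j, p))
                g[i + j] = (g[i + j] + term) % p
    return g

def directional_hasse(f, k, x0, y0, u, v, p):
    """Sum over i + j = k of u^i v^j f^{(i, j)}(x0, y0): the k-th Hasse derivative along (u, v)."""
    return sum(pow(u, i, p) * pow(v, k - i, p) * hasse2(f, i, k - i, x0, y0, p)
               for i in range(k + 1)) % p
```

Collecting these identities for several directions $(u, v)$ gives a linear system relating directional derivatives to the $f^{(i,j)}(x_0, y_0)$, which is the kind of matrix form the proof refers to.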
Appendix B Lifted codes via dual codes
It was shown in [GKS13] that bivariate lifted parity-check codes over , where , have co-dimension . Here, we give an alternative proof using dual codes. The techniques in this proof are not directly related to the techniques that we used in the main body of the paper, but we found this alternative proof illuminating so we include it.
Let . Recall that is the set of lines expressible as where . One way to think about codes with locality is by considering their dual codes. If the code is a subset of , then each line, viewed as a repair group, yields a codeword of the dual code supported on that line. Given a line in , define the corresponding dual codeword:
[TABLE]
Let
[TABLE]
Note that is spanned by elements, so the trivial bound on the dimension is . We give the following improved bound, matching the analysis of [GKS13].
Lemma B.1.
The subspace has dimension at most .
Proof.
A codeword is the evaluation of the following polynomial on :
[TABLE]
If , then the polynomial evaluates to 0 as , and otherwise it evaluates to
[TABLE]
For , the coefficient of in is 0. For , the coefficient of in is
[TABLE]
This is because we first choose terms that contain or , then choose which of these terms are and which are , which gives us many ’s and many ’s; we then sum over the choices of the terms. Hence, the only such that for any are the pairs such that and . There are at most pairs by Proposition 2.1. It follows that the polynomials are spanned by monomials with . Hence, the vector space is spanned by dual codewords in and thus has dimension at most . ∎
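The counting step at the end of this proof can be checked by brute force. The precise conditions on the pairs are lost from this text, so the sketch assumes, writing $q = 2^\ell$, that the pairs counted are those $(a, b) \in \{0, \dots, q-1\}^2$ with $\binom{a+b}{a}$ odd, which is what Proposition 2.1 (read as Lucas's theorem) controls. Under that assumption, $\binom{a+b}{a}$ is odd exactly when $a$ and $b$ share no binary 1s, so each of the $\ell$ digits lies in $a$, in $b$, or in neither, and the count is exactly $3^\ell = q^{\log_2 3}$. The helper name is hypothetical.

```python
from math import comb

def odd_pairs(ell: int):
    """Pairs (a, b) in {0, ..., q-1}^2, q = 2**ell, with C(a + b, a) odd."""
    q = 1 << ell
    return [(a, b) for a in range(q) for b in range(q) if comb(a + b, a) % 2 == 1]
```

Every such pair automatically satisfies $a + b = a \mathbin{|} b \le q - 1$, since there are no carries.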
References
- [AY19] Hilal Asi and Eitan Yaakobi. Nearly optimal constructions of PIR and batch codes. IEEE Transactions on Information Theory, 65(2):947–964, 2019.
- [BE16] Simon R. Blackburn and Tuvi Etzion. PIR Array Codes with Optimal PIR Rate. arXiv e-prints, July 2016.
- [Chi76] Francis Y. Chin. A generalized asymptotic upper bound on fast polynomial evaluation and interpolation. SIAM Journal on Computing, 5(4):682–690, 1976.
- [FGW17] S. Luna Frank-Fischer, Venkatesan Guruswami, and Mary Wootters. Locality via partially lifted codes. CoRR, abs/1704.08627, 2017.
- [FVY15] Arman Fazeli, Alexander Vardy, and Eitan Yaakobi. Codes for distributed PIR with low storage overhead. In 2015 IEEE International Symposium on Information Theory (ISIT), pages 2852–2856. IEEE, 2015.
- [GKS13] Alan Guo, Swastik Kopparty, and Madhu Sudan. New affine-invariant codes from lifting. In Innovations in Theoretical Computer Science, ITCS ’13, Berkeley, CA, USA, January 9-12, 2013, pages 529–540, 2013.
- [GW13] Venkatesan Guruswami and Carol Wang. Linear-algebraic list decoding for variants of Reed–Solomon codes. IEEE Transactions on Information Theory, 59(6):3257–3268, 2013.
- [HKT08] James W. P. Hirschfeld, Gábor Korchmáros, and Fernando Torres. Algebraic Curves over a Finite Field. Princeton Series in Applied Mathematics. Princeton University Press, 2008.
