Constructions of Batch Codes via Finite Geometry
Nikita Polyanskii, Ilya Vorobyev

TL;DR
This paper introduces new explicit and random linear primitive batch codes constructed using finite geometry, achieving lower redundancy in certain parameter regimes compared to existing codes.
Contribution
It presents novel finite geometry-based constructions of linear primitive batch codes, improving redundancy efficiency over prior methods.
Findings
Codes have lower redundancy in some parameter regimes.
Explicit and random constructions are developed.
Linear primitive batch codes are successfully constructed.
Nikita Polyanskii¹ and Ilya Vorobyev¹,²
¹Center for Computational and Data-Intensive Science and Engineering,
Skolkovo Institute of Science and Technology
Moscow, Russia 127051
²Advanced Combinatorics and Complex Networks Lab,
Moscow Institute of Physics and Technology
Dolgoprudny, Russia 141701
Emails: [email protected], [email protected]
Abstract
A primitive $k$-batch code encodes a string $x$ of length $n$ into a string $y$ of length $N$, such that each multiset of $k$ symbols from $x$ has $k$ mutually disjoint recovering sets from $y$. We develop new explicit and random coding constructions of linear primitive batch codes based on finite geometry. In some parameter regimes, our proposed codes have lower redundancy than previously known batch codes.
Index Terms:
Private information retrieval, finite geometry, primitive batch codes
I Introduction
Batch codes were originally proposed by Ishai et al. [1] for load balancing in distributed systems and for amortizing the computational cost of private information retrieval and related cryptographic protocols. Ishai et al. defined batch codes in a general form: $n$ information symbols $x_1, \dots, x_n$ are encoded into an $m$-tuple of strings (referred to as buckets) of total length $N$, such that for each $k$-tuple (batch) of distinct indices $i_1, \dots, i_k \in [n]$, the entries $x_{i_1}, \dots, x_{i_k}$ can be decoded by reading at most $t$ symbols from each bucket. The parameter $k$ is usually called availability, and it plays an important role in supporting high throughput of the distributed storage system. If a batch may contain any multiset of indices (not only distinct indices), we use the term multiset batch code. In the special case when $t = 1$ and each bucket contains one symbol, a multiset batch code is called primitive. This class of batch codes is the most studied in the literature, since several statements in [1] allow one to trade between different choices of , , , and . In other words, better constructions of primitive batch codes would imply better constructions of multiset batch codes.
I-A Notation
We denote the field of size $q$ by $\mathbb{F}_q$. The symbol $[n]$ stands for the set of integers $\{1, 2, \dots, n\}$. Let us give a formal definition of the codes studied in this paper.
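Definition 1.
Let $\mathcal{C}$ be a linear code of length $N$ and dimension $n$ over the field $\mathbb{F}_q$, which encodes a string $x_1, \dots, x_n$ to $y_1, \dots, y_N$. The code $\mathcal{C}$ will be called a primitive linear $k$-batch code (simply, $k$-batch code), and will be denoted by , if for every multiset of symbols $\{x_{i_1}, \dots, x_{i_k}\}$, $i_j \in [n]$, there exist $k$ mutually disjoint sets $R_1, \dots, R_k \subseteq [N]$ (referred to as recovering sets) such that for all $j \in [k]$, $x_{i_j}$ is a sum of the symbols with indices from $R_j$.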
Given and , we denote the minimal integer such that an code exists by . In this paper we focus on the minimal redundancy of batch codes, which we abbreviate by .
Recall that a systematic linear code is a linear code in which the input data is embedded in the encoded output, i.e., $y_i = x_i$ for $i \in [n]$. In what follows we are going to construct systematic linear batch codes. The following special case of recovering sets will be particularly useful.
Definition 2.
For a systematic linear code, we say that the recovering set $R$ for information symbol $x_i$ is simple if $R$ contains exactly one index greater than $n$. In other words, if $j$ is such an index, then
[TABLE]
Note that many constructions, both earlier ones and those in this paper, possess a stronger property than the one described in Definition 1: the existence of mutually disjoint simple recovering sets.
We use the notation in a statement to demonstrate that the statement remains true for all , where is any fixed positive number. In the rest of the paper we will mainly concentrate on the case when , .
I-B Related Work
The authors of [1] provided constructions of various families of batch codes. Those constructions were based on unbalanced expanders, on recursive application of trivial batch codes, on smooth and Reed-Muller codes, and others. Many other constructions proposed later in [2, 3, 4] improve the redundancy of batch codes. In particular, a systematic linear code, defined by the generator matrix , is shown [3] to be a -batch code, where is the minimal number of ones in rows of and the bipartite graph, whose biadjacency matrix is , has no cycle of length at most . Constructions based on array codes and multiplicity codes were investigated in [2].
There is another class of related codes which is called combinatorial batch codes. For these codes the same property as for the batch codes is required, but symbols cannot be encoded. Such codes were investigated in [5, 6, 7, 8, 9]. A special case of batch codes, called switch codes, was studied in [10, 11, 12, 13]. It was suggested in [10] to use such codes to increase the parallelism of data routing in the network switches. Private information retrieval (PIR) codes can be seen as an instance of batch codes, namely we require a weaker property that every information symbol has mutually independent recovering sets. PIR codes were suggested in [14] to decrease storage overhead in PIR schemes preserving both privacy and communication complexity. Some constructions and bounds for PIR codes can be found in [15, 16, 2, 14, 17]. One-step majority-logic decodable codes [18] require a stronger property than PIR codes, namely every encoded symbol should have mutually independent recovering sets. Also we refer the reader to locally repairable codes with availability [19, 20, 21], which have an additional (with respect to PIR codes) constraint on the size of recovering sets.
Recall some known results on the minimal redundancy of batch codes:
; 2. 2.
for , [2]; 4. 4.
for , [3]; 5. 5.
for , [3]; 6. 6.
for , [2]; 7. 7.
for , where , [2].
In particular, it follows that the best known lower bound on the redundancy of batch codes is as follows
[TABLE]
I-C Our contribution
In this paper we develop new explicit and random coding constructions of linear primitive batch codes based on finite geometry. In Table I our contribution (upper bounds on ) is summarized.
Let us denote . The lower bound given by (1) along with old and new upper bounds on are plotted in Figure 1. The existence result of our work shows that the known upper bound on can be improved for . Furthermore, we emphasize that the endpoints of the novel explicit constructions of Theorem 3 lie on the segment given by the random construction in Theorem 1.
I-D Outline
The remainder of the paper is organized as follows. In Section II we prove the existence of batch codes using the probabilistic method. The achieved upper bound on the redundancy improves previously known results when and . We note that for and , the redundancy of our construction is worse by the multiplicative factor than the one in [3]. In Section III we describe our main results and give new explicit constructions of batch codes. In more detail, we associate information bits with elements of the vector space , , and define parity-check bits as sums of information bits lying in some affine -dimensional subspaces. Finally, Section IV concludes the paper.
II Random Construction of Batch Codes
To prove the following statement, we consider a systematic linear code defined by the generator matrix , where is taken as an incidence matrix of a randomly chosen family of subsets of lines in the affine plane.
Theorem 1.
For , the redundancy of -batch codes is
[TABLE]
Proof.
For simplicity of notation and without loss of generality, we assume that , is a prime power integer and . Consider a finite affine plane of order , where , , is a set of points, and , , is a set of lines. Each line is known to contain points, and each point is in lines. Any two lines in the affine plane cross each other in at most one point.
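The incidence properties quoted above can be checked directly on a small case. The following is a minimal sketch, assuming a prime order q so that arithmetic modulo q gives the field; the function name `affine_plane` and the choice q = 5 are illustrative and are not taken from the paper.

```python
from itertools import combinations

def affine_plane(q):
    """Return (points, lines) of the affine plane AG(2, q) for a prime q."""
    points = [(x, y) for x in range(q) for y in range(q)]
    # Non-vertical lines y = a*x + b, one for each slope a and intercept b.
    lines = [frozenset((x, (a * x + b) % q) for x in range(q))
             for a in range(q) for b in range(q)]
    # Vertical lines x = c.
    lines += [frozenset((c, y) for y in range(q)) for c in range(q)]
    return points, lines

q = 5  # illustrative choice, not a parameter from the paper
points, lines = affine_plane(q)

# q^2 points and q^2 + q lines.
assert len(points) == q * q and len(lines) == q * q + q
# Every line has q points and every point lies on q + 1 lines.
assert all(len(line) == q for line in lines)
assert all(sum(p in line for line in lines) == q + 1 for p in points)
# Two distinct lines meet in at most one point.
assert all(len(l1 & l2) <= 1 for l1, l2 in combinations(lines, 2))
```

For a prime power that is not prime, the same counts hold, but the modular arithmetic would have to be replaced by genuine finite-field arithmetic.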
Let us randomly choose a family of subsets of lines in the affine space. First, we take each line in the affine space independently with probability , which will be specified later. Second, we define a subset of any included line by leaving each point on the line independently with probability , which will be specified later. It can be seen that for a proper choice , the cardinality of , (total number of subsets), is “close” to its average with high probability, and for a proper choice of , the cardinality of any subset is “close” to its average . We define event when the total number of lines , and if there exists some of size . Moreover, we define , , if there exists of size such that the line corresponding to subset does not contain the th point.
Now we consider some bijection between information symbols and points. Therefore, the information symbols are associated with the points in the plane. Given a subset , we can define a parity-check symbol as a sum of information symbols corresponding to points in . Let us consider a systematic linear code of length and dimension defined as a map :
[TABLE]
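The two-stage random choice and the resulting systematic encoding can be sketched as follows. This is only an illustration under stated assumptions: the code is taken over the binary field, the plane has prime order, and the names `p_line`, `p_point`, `random_family`, `encode` and `recover` are hypothetical, as are the numeric values used.

```python
import random

def affine_plane(q):
    """Points and lines of AG(2, q) for a prime q."""
    points = [(x, y) for x in range(q) for y in range(q)]
    lines = [frozenset((x, (a * x + b) % q) for x in range(q))
             for a in range(q) for b in range(q)]
    lines += [frozenset((c, y) for y in range(q)) for c in range(q)]
    return points, lines

def random_family(lines, p_line, p_point, rng):
    """Keep each line with probability p_line, then each of its points with probability p_point."""
    family = []
    for line in lines:
        if rng.random() < p_line:
            family.append(frozenset(pt for pt in sorted(line)
                                    if rng.random() < p_point))
    return family

def encode(x, points, family):
    """Systematic encoding over F_2 (assumed): information bits, then one parity bit per subset."""
    info = [x[pt] for pt in points]
    parity = [sum(x[pt] for pt in subset) % 2 for subset in family]
    return info + parity

def recover(y, points, family, target, j):
    """Recover the bit at `target` from parity j and the other information bits of subset j."""
    index = {pt: i for i, pt in enumerate(points)}
    subset = family[j]
    assert target in subset
    others = sum(y[index[pt]] for pt in subset if pt != target)
    return (y[len(points) + j] + others) % 2

rng = random.Random(0)
points, lines = affine_plane(5)
family = random_family(lines, p_line=0.5, p_point=0.8, rng=rng)
x = {pt: rng.randrange(2) for pt in points}
y = encode(x, points, family)
```

Each pair (subset j, point in subset j) gives one simple recovering set: the parity index together with the remaining points of the subset.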
Given a multiset of information symbols of size , we can uniquely represent it in the form
[TABLE]
where
[TABLE]
We define a greedy algorithm for constructing a collection of recovering sets for any given multiset of information bits of size at most . Assume that the algorithm can construct simple recovering sets for the multiset
[TABLE]
representing the first groups of the multiset
[TABLE]
Then find the first parity-check symbols depending on symbol , such that the corresponding simple recovering sets are disjoint from the already chosen recovering sets, and the lines corresponding to the parity-check symbols do not go through any point in the set
[TABLE]
Let us add these recovering sets to the collection of recovering sets. We note that added simple recovering sets are mutually disjoint by our construction.
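A simplified version of this greedy selection can be sketched as follows. The sketch assumes that each parity-check symbol uses a full line of a prime-order affine plane (so a line and its subset coincide), and it takes the set of points to avoid to be the other requested symbols; both are assumptions made for illustration, and the name `greedy_recovering_sets` is hypothetical.

```python
from collections import Counter

def affine_plane(q):
    """Points and lines of AG(2, q) for a prime q."""
    points = [(x, y) for x in range(q) for y in range(q)]
    lines = [frozenset((x, (a * x + b) % q) for x in range(q))
             for a in range(q) for b in range(q)]
    lines += [frozenset((c, y) for y in range(q)) for c in range(q)]
    return points, lines

def greedy_recovering_sets(request, family):
    """Return a list of (symbol, recovering set), one per requested copy, or None on failure.

    request: list of points, read as a multiset of information symbols.
    family:  list of frozensets of points, one per parity-check symbol.
    """
    groups = Counter(request)      # distinct symbols with their multiplicities
    requested = set(groups)
    used = set()                   # indices already placed in a recovering set
    result = []
    for symbol, mult in groups.items():
        others = requested - {symbol}
        found = 0
        for j, subset in enumerate(family):
            if found == mult:
                break
            # The parity must involve the symbol and avoid other requested points.
            if symbol not in subset or subset & others:
                continue
            candidate = (subset - {symbol}) | {("parity", j)}
            if candidate & used:   # must be disjoint from earlier recovering sets
                continue
            used |= candidate
            result.append((symbol, candidate))
            found += 1
        if found < mult:
            return None            # the greedy step failed for this group
    return result

plane_points, plane_lines = affine_plane(5)
a, b = (0, 0), (1, 0)
sets = greedy_recovering_sets([a, a, a], plane_lines)
```

With full lines, the request `[a, a, a]` succeeds, while the mixed request `[a, a, b]` already fails in this sketch, which may give some intuition for why the proof works with sparse random subsets of lines.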
To show that the code is likely to be a -batch code, we are going to estimate the probability of event that the greedy algorithm fails for some multiset of information symbols. To get an estimate of this event, we introduce auxiliary terminology. We say that the information symbol is -bad, , if there exists some multiset
[TABLE]
so that the algorithm finds recovering sets for the first groups of the multiset and fails to find recovering sets for . Let be an event that information symbol is -bad. If no event among occurs, then the event doesn’t happen.
We note that -batch code with redundancy at most exists if . Now we estimate this event as follows
[TABLE]
It is easy to estimate and applying the Chernoff bound in the form
[TABLE]
where , and is a sum of independent random variables taking values in with . We have
[TABLE]
and
[TABLE]
Now we estimate the third probability in (2) as follows
[TABLE]
where stands for the event that the algorithm finds recovering sets
[TABLE]
for the first groups of
[TABLE]
and denotes the event that the algorithm fails to find recovering sets for that are disjoint from all recovering sets the algorithm has already found. Let , and be a set of information symbols included in the recovering sets
[TABLE]
The cardinality of given the event (consequently, given the event ) is upper bounded as follows
[TABLE]
since stands for the event that all the subsets corresponding to the lines disjoint with are of size at most . The total number of lines containing is equal to . One can easily see that there are at most of them which have a nonempty intersection with . Since all the lines containing the fixed point share only , we claim that there are at most lines which intersect in at least points. Indeed, otherwise we could lower bound the cardinality of by , which contradicts (6). We shall try to recover symbol with the help of the other , , lines. Enumerate them from to . Let be indicator random variables, each of which equals 1 iff
1. the corresponding line was randomly taken (with probability ),
2. the symbol was left (with probability ) and included in the parity-check sum,
3. none of the symbols from were added to the corresponding parity-check.
Define the random variable
[TABLE]
Since is an independent Bernoulli random variable with probability , we claim that Binomial random variable with parameters and is stochastically dominated by . Now we proceed with upper bounding (5) as follows
[TABLE]
Combining the last inequality together with (2)-(5) yields
[TABLE]
Given , there exists sufficiently large such that for the first two terms are at most . Now we proceed with the last term
[TABLE]
Taking , we have and
[TABLE]
From this it follows that for
[TABLE]
and sufficiently large , , the last term in (7) is at most . Therefore, we obtain that there exists a -batch code with redundancy with probability at least . This completes the proof. ∎
III Explicit Construction of Batch Codes
In this section, to construct batch codes, we associate information bits with elements of the vector space , , and define parity-check bits as sums of information bits lying in some affine -dimensional subspaces. In particular, the following finite geometry framework turns out to be useful.
Definition 3.
Suppose is a collection of -dimensional subspaces in . This collection is said to be -nice if the following two properties hold:
1. any two distinct subspaces from this collection intersect trivially, in the origin only, i.e. for ;
2. for all and for all , , the affine subspace intersects at most subspaces from this collection.
To the best of our knowledge, such a framework is new in the literature. In the following statement we show how to use a nice collection of subspaces to construct batch codes.
Lemma 2.
Suppose is an -nice collection of -dimensional subspaces in . Then there exists a code.
We postpone the proof of Lemma 2 to the Appendix. Now we give a construction of nice subspaces, which represents a collection of Reed-Solomon codes of length and dimension .
Construction 1.
Let stand for a -dimensional -vector space, and let be an -basis for . Now let us define a collection of subspaces of size . Let the th, , subspace be the linear span of vectors , where vector , , is written in basis as follows
[TABLE]
We prove that is -nice in Proposition 1. Let be the maximal number such that there exists an -nice collection of -dimensional subspaces in of cardinality . The next two propositions establish a fairly tight estimate on the maximal cardinality of a nice collection of subspaces.
Proposition 1.
Construction 1 is -nice. This implies, in particular, that for any , , and prime power integer , the following lower bound on holds
[TABLE]
Proposition 2 ([23]).
For any and prime power integer , the following upper bound on holds
[TABLE]
We postpone the proof of Proposition 1 to the Appendix. The proof of Proposition 2, suggested by Mary Wootters, is included in the Appendix for completeness.
Finally Lemma 2 and Proposition 1 imply the following upper bound on the redundancy of batch codes.
Theorem 3.
For any , prime power integer and integer , , the redundancy of -batch codes is upper bounded by
[TABLE]
where .
Remark 1.
Proposition 2 verifies that the proposed framework based on finite geometry could not be significantly improved in terms of the range of parameter in Theorem 3, that is could not be larger than .
Proof of Theorem 3.
From Proposition 1 it follows that there exists an -nice collection of -dimensional subspaces in , which has cardinality . Take any subset of this collection of size , where . Lemma 2 states that there exists a code. This completes the proof. ∎
Let us demonstrate how Theorem 3 actually works.
Example 1.
Let , and . Then . Denote by . Let us index information symbols by vectors of , i.e., . First we define three direction vectors , and , which are linearly independent. We shall construct a systematic linear code. One can determine parity-check bits as sums of information bits whose indices lie on lines with the given direction vectors. These lines represent distinct -dimensional affine subspaces of . For instance, there are lines with direction vector . Let us take the one which goes through point . Then the corresponding parity-check bit is and the recovering set for based on this parity-check bit is . It is easy to show that there are other simple recovering sets for , which are of the form and . Moreover, each information bit has exactly simple recovering sets. For every bit, each of its recovering sets has a nonempty intersection with at most one recovering set of any other bit. This property immediately implies [3] that our code is a -batch code. For , in the proof of Lemma 2 we will show a generalization of this property.
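The intersection property stated in the example can be checked by brute force on a small instance. The sketch below is an illustration under assumptions: the field size q = 3 and the three standard basis vectors as directions are chosen here for convenience and are not claimed to be the values used in the example; the helper names are hypothetical.

```python
from itertools import product

q = 3                                            # assumed prime field size
directions = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]   # assumed linearly independent directions
points = list(product(range(q), repeat=3))       # index set of the information symbols

def line(point, d):
    """Affine line {point + t*d : t in F_q} as a frozenset of points."""
    return frozenset(tuple((p + t * c) % q for p, c in zip(point, d))
                     for t in range(q))

# One parity-check symbol per distinct line in one of the three directions.
parity_lines = {line(p, d) for p in points for d in directions}

def recovering_sets(v):
    """The three simple recovering sets of the symbol indexed by v."""
    sets = []
    for d in directions:
        l = line(v, d)
        # Other information symbols on the line, plus the parity symbol of the line.
        sets.append((l - {v}) | {("parity", l)})
    return sets

def max_overlaps():
    """Largest number of recovering sets of one symbol hit by a single set of another symbol."""
    worst = 0
    for v in points:
        for w in points:
            if v == w:
                continue
            for rv in recovering_sets(v):
                hits = sum(1 for rw in recovering_sets(w) if rv & rw)
                worst = max(worst, hits)
    return worst

assert len(points) == q ** 3 and len(parity_lines) == 3 * q ** 2
assert max_overlaps() == 1
```

In this instance every recovering set of a symbol meets at most one recovering set of any other symbol, which is the property the example relies on.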
IV Conclusion
In this paper, a new random coding bound and new explicit constructions of primitive linear batch codes based on finite geometry were developed. In some parameter regimes, our codes have lower redundancy than previously known batch codes. We note that the random coding bound coincides with the constructive bound at a countable number of points and gives a better result at the others. A natural open question arising from this work is to construct codes that achieve the random coding bound at all other points too. Another interesting question is how to improve the lower bound given by inequality (1).
Acknowledgment
We thank Eitan Yaakobi for the fruitful discussion on batch codes and Mary Wootters for the proof of Proposition 2. N. Polyanskii was supported in part by the Russian Foundation for Basic Research (RFBR) through grant nos. 18-07-01427 A, 18-31-00310 MOL_A. I. Vorobyev was supported in part by RFBR through grant nos. 18-07-01427 A, 18-31-00361 MOL_A.
References
[1] Y. Ishai, E. Kushilevitz, R. Ostrovsky, and A. Sahai, “Batch codes and their applications,” in Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing. ACM, 2004, pp. 262–271.
[2] H. Asi and E. Yaakobi, “Nearly optimal constructions of PIR and batch codes,” IEEE Transactions on Information Theory, 2018.
[3] A. S. Rawat, Z. Song, A. G. Dimakis, and A. Gál, “Batch codes through dense graphs without short cycles,” IEEE Transactions on Information Theory, vol. 62, no. 4, pp. 1592–1604, 2016.
[4] A. Vardy and E. Yaakobi, “Constructions of batch codes with near-optimal redundancy,” in Information Theory (ISIT), 2016 IEEE International Symposium on. IEEE, 2016, pp. 1197–1201.
[5] S. Bhattacharya, S. Ruj, and B. Roy, “Combinatorial batch codes: A lower bound and optimal constructions,” Advances in Mathematics of Communications, vol. 6, no. 2, pp. 165–174, 2012.
[6] R. A. Brualdi, K. P. Kiernan, S. A. Meyer, and M. W. Schroeder, “Combinatorial batch codes and transversal matroids,” Advances in Mathematics of Communications, vol. 4, no. 3, pp. 419–431, 2010.
[7] N. Silberstein and A. Gál, “Optimal combinatorial batch codes based on block designs,” Designs, Codes and Cryptography, vol. 78, no. 2, pp. 409–424, 2016.
[8] D. Stinson, R. Wei, and M. B. Paterson, “Combinatorial batch codes,” Advances in Mathematics of Communications, vol. 3, no. 1, pp. 13–27, 2009.
