Constructions of Batch Codes via Finite Geometry

Nikita Polyanskii; Ilya Vorobyev

arXiv:1901.06741·cs.IT·January 23, 2019


TL;DR

This paper introduces new explicit and random linear primitive batch codes constructed using finite geometry, achieving lower redundancy in certain parameter regimes compared to existing codes.

Contribution

It presents novel finite geometry-based constructions of linear primitive batch codes, improving redundancy efficiency over prior methods.

Findings

1. Codes have lower redundancy in some parameter regimes.
2. Explicit and random constructions are developed.
3. Linear primitive batch codes are successfully constructed.

Abstract

A primitive $k$ -batch code encodes a string $x$ of length $n$ into string $y$ of length $N$ , such that each multiset of $k$ symbols from $x$ has $k$ mutually disjoint recovering sets from $y$ . We develop new explicit and random coding constructions of linear primitive batch codes based on finite geometry. In some parameter regimes, our proposed codes have lower redundancy than previously known batch codes.

Tables

Table I: Binary Batch Codes Summary

Construction	Availability $k$	Redundancy $r_{B}(n,k)$
Theorem 1 (random)	$k=o(n^{1/3}/\log n)$	$O(k^{3/2}\sqrt{n}\log n)$
Theorem 3 (explicit), for any $\ell\in\mathbb{N}$	$k<\frac{1}{\ell^{2}}n^{1/(2\ell+1)}$	$O(kn^{\frac{\ell+1}{2\ell+1}})$

Equations

$$y_{j}=x_{i}+\sum_{t\in R\setminus\{j\}}x_{t}.$$

$$r_{B}(n,k)\geq\Omega(\max(\sqrt{n},k)).$$

$$r_{B}(n,k)=O(k^{3/2}\sqrt{n}\log n).$$

$$\phi(x_{1},\dots,x_{n}):=(x_{1},\dots,x_{n},y_{1},\dots,y_{M}).$$

$$((x_{i_{1}},k_{1}),\dots,(x_{i_{\ell}},k_{\ell})),$$

$$1\leq i_{1}<\dots<i_{\ell}\leq n\quad\text{and}\quad\sum_{i=1}^{\ell}k_{i}=k.$$

$$((x_{i_{1}},k_{1}),\dots,(x_{i_{j-1}},k_{j-1})),\quad j-1<\ell,$$

$$\{x_{i_{1}},\dots,x_{i_{j-1}},x_{i_{j+1}},\dots,x_{i_{\ell}}\}.$$

$$((x_{i_{1}},k_{1}),\dots,(x_{i_{\ell}},k_{\ell}))\quad\text{with}\quad i=i_{j},\ i_{1}<\dots<i_{\ell}$$

$$\sum_{f\in[\ell]\setminus\{j\}}k_{f}=s,\quad s+k_{j}=k$$

$$\Pr(B\cup W_{1})\leq\Pr(W_{1})+\Pr\Big(\bigcup_{\substack{i\in[n]\\ s\in\{0,\dots,k-1\}}}B_{i,s}\Big)\leq\Pr(W_{1})+\Pr(W_{2})+kn\max_{\substack{i\in[n]\\ s\in\{0,\dots,k-1\}}}\Pr(B_{i,s}\cap\overline{W}_{2}).$$

$$\Pr(X\geq(1+\delta)\mu)\leq e^{-\frac{\delta^{2}\mu}{3}},$$

$$\Pr(W_{1})=\Pr(M>3p_{1}n)\leq\Pr(M>2p_{1}(n+q))\leq e^{-\frac{p_{1}n}{3}}$$

$$\Pr(W_{2})=\Pr(\text{“there exists }S_{i}\text{ of size }>2p_{2}q\text{”})\leq 2ne^{-\frac{p_{2}q}{3}}.$$

$$\Pr(B_{i,s}\cap\overline{W}_{2})\leq\Pr(B_{i,s}\cap\overline{W}_{2,i})\leq n^{k-1}\Pr(A\cap C\cap\overline{W}_{2,i})\leq n^{k-1}\Pr(A\mid\overline{W}_{2,i}\cap C),$$

$$R_{i_{1},1},\dots,R_{i_{1},k_{i_{1}}},R_{i_{2},1},\dots,R_{i_{j-1},k_{i_{j-1}}}$$

$$|I_{2}|=\sum_{u=1}^{j-1}\sum_{v=1}^{k_{u}}(|R_{i_{u},v}|-1)\leq 2qp_{2}k,$$

$$\eta:=\sum_{i=1}^{q/3}\xi_{i}.$$

$$\Pr(A\mid\overline{W}_{2,i}\cap C)\leq\Pr(\chi<k-s)\leq\binom{q/3}{k}\left(1-p_{1}p_{2}(1-p_{2})^{4p_{2}k}\right)^{q/3-k}\leq q^{k}\left(1-p_{1}p_{2}(1-p_{2})^{4p_{2}k}\right)^{q/4}.$$

$$\Pr(B\cup W_{1})\leq 2ne^{-\frac{p_{2}q}{3}}+e^{-\frac{p_{1}n}{3}}+kn^{k}q^{k}\left(1-p_{1}p_{2}(1-p_{2})^{4p_{2}k}\right)^{q/4}.$$

$$kn^{k}q^{k}\left(1-p_{1}p_{2}(1-p_{2})^{4p_{2}k}\right)^{q/4}\leq kn^{1.5k}e^{-p_{1}p_{2}(1-p_{2})^{4p_{2}k}q/4}.$$

$$(1-p_{2})^{4p_{2}k}\geq 1-4p_{2}^{2}k=1/2.$$

$$p_{1}:=36\,\frac{k^{3/2}\log n}{\sqrt{n}}$$

$$v_{j}^{i}:=(1,\alpha^{\ell i+j},\alpha^{2(\ell i+j)},\dots,\alpha^{2\ell(\ell i+j)}).$$

$$m(L,\ell,q)\geq\lfloor q/\ell\rfloor.$$

$$m(L,\ell,q)\leq(L+1)q.$$

$$r_{B}(n,k)\leq\ell kq^{\ell+1},$$

$$\frac{|\mathbb{F}_{q}^{2\ell+1}||\mathcal{C}|}{|V_{i}|}=mq^{\ell+1}.$$


Full text

Constructions of Batch Codes via Finite Geometry

Nikita Polyanskii¹ and Ilya Vorobyev¹,²

¹ Center for Computational and Data-Intensive Science and Engineering,

Skolkovo Institute of Science and Technology

Moscow, Russia 127051

² Advanced Combinatorics and Complex Networks Lab,

Moscow Institute of Physics and Technology

Dolgoprudny, Russia 141701


Abstract

A primitive $k$ -batch code encodes a string $x$ of length $n$ into string $y$ of length $N$ , such that each multiset of $k$ symbols from $x$ has $k$ mutually disjoint recovering sets from $y$ . We develop new explicit and random coding constructions of linear primitive batch codes based on finite geometry. In some parameter regimes, our proposed codes have lower redundancy than previously known batch codes.

Index Terms:

Private information retrieval, finite geometry, primitive batch codes

I Introduction

Batch codes were originally proposed by Ishai et al. [1] for load balancing in distributed systems and for amortizing the computational cost of private information retrieval and related cryptographic protocols. Ishai et al. defined batch codes in a general form: $n$ information symbols $x_{1},\ldots,x_{n}$ are encoded to an $m$-tuple of strings $y_{1},\ldots,y_{m}$ (referred to as buckets) of total length $N$, such that for each $k$-tuple (batch) of distinct indices $i_{1},\ldots,i_{k}\in[n]$, the entries $x_{i_{1}},\ldots,x_{i_{k}}$ can be decoded by reading at most $t$ symbols from each bucket. The parameter $k$ is usually called availability, and it plays an important role in supporting high throughput of the distributed storage system. If a batch may contain any multiset of indices (not only distinct indices), then we use the term multiset batch code. In the special case when $t=1$ and each bucket contains one symbol, a multiset batch code is called primitive. This class of batch codes is the most studied one in the literature, since there are several statements [1] that allow trading off between different choices of $n$, $N$, $m$, $t$ and $k$. In other words, better constructions of primitive batch codes would imply better constructions of multiset batch codes.

I-A Notation

We denote the field of size $2$ by $\mathbb{F}_{2}$ . The symbol $[n]$ stands for the set of integers $\{1,2,\ldots,n\}$ . Let us give a formal definition of codes studied in this paper.

Definition 1.

Let $C$ be a linear code of length $N$ and dimension $n$ over the field $\mathbb{F}_{2}$ , which encodes a string $x_{1},\ldots,x_{n}$ to $y_{1},\ldots,y_{N}$ . The code $C$ will be called a primitive linear $k$ -batch code (simply, $k$ -batch code), and will be denoted by $[N,n,k]^{B}$ , if for every multiset of symbols $\{x_{i_{1}},\ldots,x_{i_{k}}\},$ ${i_{j}}\in[n]$ , there exist $k$ mutually disjoint sets $R_{i_{1}},\ldots,R_{i_{k}}\subset[N]$ (referred to as recovering sets) such that for all $j\in[k]$ , $x_{i_{j}}$ is a sum of the symbols $y_{p}$ with indices $p$ from $R_{i_{j}}$ .

Given $n$ and $k$ , we denote the minimal integer $N$ such that an $[N,n,k]^{B}$ code exists by $N_{B}(n,k)$ . In this paper we focus on the minimal redundancy of batch codes, which we abbreviate by $r_{B}(n,k):=N_{B}(n,k)-n$ .

Recall that a systematic linear code is a linear code in which the input data is embedded in the encoded output, i.e., $y_{i}=x_{i}$ for $i\in[n]$ . In what follows we are going to construct systematic linear batch codes. The following special case of recovering sets will be particularly useful.

Definition 2.

For a systematic linear code, we say that the recovering set $R$ for information symbol $x_{i}$ is simple if $R$ contains exactly one index greater than $n$ . In other words, if $j$ is such an index, then

$$y_{j}=x_{i}+\sum_{t\in R\setminus\{j\}}x_{t}.$$

Note that many constructions, both those suggested earlier and those in this paper, possess a stronger property than the one described in Definition 1: the existence of mutually disjoint simple recovering sets.

We use the notation $n^{\varepsilon^{-}}$ in a statement to demonstrate that the statement remains true for all $n^{\varepsilon-c}$ , where $c$ is any fixed positive number. In the rest of the paper we will mainly concentrate on the case when $k=n^{\varepsilon}$ , $n\to\infty$ .

I-B Related Work

The authors of [1] provided constructions of various families of batch codes. Those constructions were based on unbalanced expanders, on recursive application of trivial batch codes, on smooth and Reed-Muller codes, and others. Many other constructions proposed later in [2, 3, 4] improve the redundancy of batch codes. In particular, a systematic linear code, defined by the generator matrix $G=[I_{n}|E]$ , is shown [3] to be a $k$ -batch code, where $k$ is the minimal number of ones in rows of $E$ and the bipartite graph, whose biadjacency matrix is $E$ , has no cycle of length at most $6$ . Constructions based on array codes and multiplicity codes were investigated in [2].

There is another class of related codes which is called combinatorial batch codes. For these codes the same property as for the batch codes is required, but symbols cannot be encoded. Such codes were investigated in [5, 6, 7, 8, 9]. A special case of batch codes, called switch codes, was studied in [10, 11, 12, 13]. It was suggested in [10] to use such codes to increase the parallelism of data routing in the network switches. Private information retrieval (PIR) codes can be seen as an instance of batch codes, namely we require a weaker property that every information symbol has $k$ mutually independent recovering sets. PIR codes were suggested in [14] to decrease storage overhead in PIR schemes preserving both privacy and communication complexity. Some constructions and bounds for PIR codes can be found in [15, 16, 2, 14, 17]. One-step majority-logic decodable codes [18] require a stronger property than PIR codes, namely every encoded symbol should have $k$ mutually independent recovering sets. Also we refer the reader to locally repairable codes with availability [19, 20, 21], which have an additional (with respect to PIR codes) constraint on the size of recovering sets.

Recall some known results on the minimal redundancy of batch codes:

1. $r_{B}(n,k)\geq k-1$;
2. $r_{B}(n,k)=\Omega(\sqrt{n})$ for $k\geq 3$, [22, 16];
3. $r_{B}(n,k)=O(k^{2}\sqrt{n}\log n)$ for $k\leq\sqrt{n}/\log n$, [2];
4. $r_{B}(n,n^{\varepsilon})\leq n^{7/8}$ for $7/32<\varepsilon\leq 1/4$, [3];
5. $r_{B}(n,n^{\varepsilon})\leq n^{4\varepsilon}$ for $1/5<\varepsilon\leq 7/32$, [3];
6. $r_{B}(n,n^{\varepsilon^{-}})\leq n^{5/6+\varepsilon/3}$ for $0<\varepsilon\leq 1/2$, [2];
7. $r_{B}(n,n^{1-\varepsilon})\leq n^{1-\delta}$ for $0\leq\varepsilon\leq 1$, where ${\delta=\delta(\varepsilon)>0}$, [2].

In particular, it follows that the best known lower bound on the redundancy of batch codes is as follows

$$r_{B}(n,k)\geq\Omega(\max(\sqrt{n},k)).\tag{1}$$

I-C Our contribution

In this paper we develop new explicit and random coding constructions of linear primitive batch codes based on finite geometry. Table I summarizes our contribution (upper bounds on $r_{B}(n,k)$).

Let us denote $r_{B}(n,k=n^{\varepsilon})=:O(n^{\delta})$. The lower bound given by (1), along with old and new upper bounds on $\delta=\delta(\varepsilon)$, is plotted in Figure 1. The existence result of our work shows that the known upper bound on $\delta(\varepsilon)$ can be improved for $\varepsilon\in(0,2/7)\setminus\{1/5,1/4\}$. Furthermore, we emphasize that the endpoints of the novel explicit constructions of Theorem 3 lie on the segment given by the random construction of Theorem 1.
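The comparison of exponents can be checked numerically. The sketch below is an illustration only: it drops all logarithmic factors, takes the previously known bounds from items 3 to 6 of the list in Section I-B with the ranges as printed there, and omits item 7 because its $\delta(\varepsilon)$ is not specified. The function names are hypothetical.

```python
from fractions import Fraction

def delta_random(eps):
    """Exponent of k^{3/2} * sqrt(n) for k = n^eps (Theorem 1, log factor dropped)."""
    return Fraction(1, 2) + Fraction(3, 2) * eps

def delta_previous(eps):
    """Smallest exponent among the earlier upper bounds (items 3-6 of Section I-B)."""
    candidates = [
        Fraction(1, 2) + 2 * eps,      # O(k^2 sqrt(n) log n), from [2]
        Fraction(5, 6) + eps / 3,      # n^{5/6 + eps/3}, from [2]
    ]
    if Fraction(7, 32) < eps <= Fraction(1, 4):
        candidates.append(Fraction(7, 8))   # from [3]
    if Fraction(1, 5) < eps <= Fraction(7, 32):
        candidates.append(4 * eps)          # from [3]
    return min(candidates)

def delta_explicit_endpoint(ell):
    """(eps, delta) for Theorem 3 at k ~ n^{1/(2*ell+1)}: redundancy ~ k * n^{(ell+1)/(2*ell+1)}."""
    eps = Fraction(1, 2 * ell + 1)
    return eps, eps + Fraction(ell + 1, 2 * ell + 1)

# Strict improvement inside (0, 2/7), equality at eps = 1/4 and eps = 2/7.
print(delta_random(Fraction(1, 10)), delta_previous(Fraction(1, 10)))
print(delta_random(Fraction(1, 4)), delta_previous(Fraction(1, 4)))
print(delta_random(Fraction(2, 7)), delta_previous(Fraction(2, 7)))
```

Under these assumptions, the endpoint of the explicit construction for each $\ell$ has exponent $(\ell+2)/(2\ell+1)$, which equals `delta_random` evaluated at $\varepsilon=1/(2\ell+1)$.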

I-D Outline

The remainder of the paper is organized as follows. In Section II we prove the existence of batch codes using the probabilistic method. The achieved upper bound on the redundancy improves previously known results when $k=n^{\varepsilon}$ and $\varepsilon\in(0,2/7)\setminus\{1/5,1/4\}$. We note that for $k=n^{1/4}$ and $k=n^{1/5}$, the redundancy of our construction is worse by a multiplicative factor of $\log n$ than that of [3]. In Section III we describe our main results and give new explicit constructions of batch codes. In more detail, we associate information bits with elements of the vector space $\mathbb{F}_{q}^{2\ell+1}$, $\ell\in{\mathbb{N}}$, and define parity-check bits as sums of information bits lying in some affine $\ell$-dimensional subspaces. Finally, Section IV concludes the paper.

II Random Construction of Batch Codes

To prove the following statement, we consider a systematic linear code defined by the generator matrix $G=[I_{n}|E]$, where $E$ is taken as an incidence matrix of a randomly chosen family of subsets of lines in the affine plane.

Theorem 1.

For $k=o(n^{1/3}/\log n)$ , the redundancy of $k$ -batch codes is

$$r_{B}(n,k)=O(k^{3/2}\sqrt{n}\log n).$$

Proof.

For simplicity of notation and without loss of generality, we assume that $n=q^{2}$, where $q$ is a prime power and $k<q/12$. Consider a finite affine plane $(P,L)$ of order $q$, where $P$, $|P|=n$, is a set of points, and $L$, $|L|=n+q$, is a set of lines. Each line contains $q$ points, each point lies on $q+1$ lines, and any two lines in the affine plane intersect in at most $1$ point.
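These incidence counts are easy to confirm for a small case. The following sketch builds the affine plane over $\mathbb{Z}_{q}$ for a prime $q$ only (an assumption made to keep the arithmetic elementary; prime powers would need a finite-field implementation), and the helper name `affine_plane` is hypothetical.

```python
from itertools import combinations

def affine_plane(q):
    """Points and lines of the affine plane over Z_q, for a prime q."""
    points = [(x, y) for x in range(q) for y in range(q)]
    lines = []
    for slope in range(q):                  # lines y = slope * x + c
        for c in range(q):
            lines.append(frozenset((x, (slope * x + c) % q) for x in range(q)))
    for c in range(q):                      # vertical lines x = c
        lines.append(frozenset((c, y) for y in range(q)))
    return points, lines

q = 5
points, lines = affine_plane(q)
assert len(points) == q * q                 # n = q^2 points
assert len(lines) == q * q + q              # n + q lines
assert all(len(line) == q for line in lines)
assert all(sum(p in line for line in lines) == q + 1 for p in points)
assert all(len(a & b) <= 1 for a, b in combinations(lines, 2))
```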

Let us randomly choose a family $F:=\{S_{1},\ldots,S_{M}\}$ of subsets of lines in the affine plane. First, we take each line in the affine plane independently with probability $p_{1}$, which will be specified later. Second, we define a subset of every included line by keeping each point on the line independently with probability $p_{2}$, which will also be specified later. It can be seen that for a proper choice of $p_{1}$, the cardinality of $F$, $|F|=M$ (the total number of subsets), is “close” to its average $p_{1}(n+q)$ with high probability, and for a proper choice of $p_{2}$, the cardinality of any subset $S_{i}$ is “close” to its average $p_{2}q$. We denote by $W_{1}$ the event that the total number of chosen lines satisfies $M>3p_{1}n$, and by $W_{2}$ the event that there exists some $S_{i}$ of size $>2p_{2}q$. Moreover, for $j\in[n]$, we denote by $W_{2,j}$ the event that there exists $S_{i}$ of size $>2p_{2}q$ such that the line corresponding to the subset $S_{i}$ does not contain the $j$th point.

Now we consider some bijection between $n$ information symbols and $n$ points. Therefore, the information symbols are associated with the points in the plane. Given a subset $S_{i}$ , we can define a parity-check symbol $y_{i}$ as a sum of information symbols corresponding to points in $S_{i}$ . Let us consider a systematic linear code $C$ of length $n+M$ and dimension $n$ defined as a map $\phi:\mathbb{F}_{2}^{n}\to\mathbb{F}_{2}^{n+M}$ :

$$\phi(x_{1},\dots,x_{n}):=(x_{1},\dots,x_{n},y_{1},\dots,y_{M}).$$
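The two-stage sampling and the systematic encoding just described can be sketched as follows. This is a minimal illustration, not the parameter regime of the proof: the values `p1 = p2 = 0.5`, the prime order `q = 5`, the fixed seed, and all function names are assumptions chosen for readability.

```python
import random

def affine_lines(q):
    """All q*q + q lines of the affine plane over Z_q (q prime), as sets of points."""
    lines = [frozenset((x, (a * x + b) % q) for x in range(q))
             for a in range(q) for b in range(q)]
    lines += [frozenset((c, y) for y in range(q)) for c in range(q)]
    return lines

def sample_family(lines, p1, p2, rng):
    """Keep each line with probability p1, then each of its points with probability p2."""
    family = []
    for line in lines:
        if rng.random() < p1:                           # step 1: keep the line
            subset = frozenset(pt for pt in sorted(line)
                               if rng.random() < p2)    # step 2: keep the point
            family.append(subset)
    return family

def encode(x, points, family):
    """phi(x) = (x_1..x_n, y_1..y_M), where y_i is the XOR of the bits indexed by S_i."""
    info = [x[pt] for pt in points]
    parity = [sum(x[pt] for pt in subset) % 2 for subset in family]
    return info + parity

q = 5
points = [(a, b) for a in range(q) for b in range(q)]
rng = random.Random(0)      # fixed seed only so that the sketch is reproducible
family = sample_family(affine_lines(q), p1=0.5, p2=0.5, rng=rng)
x = {pt: rng.randint(0, 1) for pt in points}
codeword = encode(x, points, family)
```

The codeword has length $n+M$ with $M$ equal to `len(family)`, and its first $n$ coordinates reproduce the information bits.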

Given a multiset of information symbols of size $k$ , we can uniquely represent it in the form

$$((x_{i_{1}},k_{1}),\dots,(x_{i_{\ell}},k_{\ell})),$$

where

$$1\leq i_{1}<\dots<i_{\ell}\leq n\quad\text{and}\quad\sum_{i=1}^{\ell}k_{i}=k.$$

We define a greedy algorithm for constructing a collection of recovering sets for any given multiset of information bits of size at most $k$ . Assume that the algorithm can construct simple recovering sets for the multiset

$$((x_{i_{1}},k_{1}),\dots,(x_{i_{j-1}},k_{j-1})),\quad j-1<\ell,$$

representing the first $j-1$ groups of the multiset

$$((x_{i_{1}},k_{1}),\dots,(x_{i_{\ell}},k_{\ell})).$$

Then find the first $k_{j}$ parity-check symbols depending on the symbol $x_{i_{j}}$ such that the corresponding $k_{j}$ simple recovering sets are disjoint from the already chosen recovering sets, and the $k_{j}$ lines corresponding to these parity-check symbols do not pass through any point in the set

$$\{x_{i_{1}},\dots,x_{i_{j-1}},x_{i_{j+1}},\dots,x_{i_{\ell}}\}.$$

Let us add these $k_{j}$ recovering sets to the collection of recovering sets. We note that added $k_{j}$ simple recovering sets are mutually disjoint by our construction.

To show that the code $C$ is likely to be a $k$ -batch code, we are going to estimate the probability of event $B$ that the greedy algorithm fails for some multiset of information symbols. To get an estimate of this event, we introduce auxiliary terminology. We say that the information symbol $x_{i}$ is $s$ -bad, $0\leq s<k$ , if there exists some multiset

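$$((x_{i_{1}},k_{1}),\dots,(x_{i_{\ell}},k_{\ell}))\quad\text{with}\quad i=i_{j},\ i_{1}<\dots<i_{\ell},\quad\sum_{f\in[\ell]\setminus\{j\}}k_{f}=s,\quad s+k_{j}=k,$$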

so that the algorithm finds recovering sets for the first $(j-1)$ groups of the multiset and fails to find $k_{j}$ recovering sets for $x_{i}=x_{i_{j}}$ . Let $B_{i,s}$ be an event that information symbol $x_{i}$ is $s$ -bad. If no event among $B_{i,s}$ occurs, then the event $B$ doesn’t happen.

We note that $k$ -batch code with redundancy at most $3p_{1}n$ exists if $\Pr(B\cup W_{1})<1$ . Now we estimate this event as follows

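$$\Pr(B\cup W_{1})\leq\Pr(W_{1})+\Pr\Big(\bigcup_{\substack{i\in[n]\\ s\in\{0,\dots,k-1\}}}B_{i,s}\Big)\leq\Pr(W_{1})+\Pr(W_{2})+kn\max_{\substack{i\in[n]\\ s\in\{0,\dots,k-1\}}}\Pr(B_{i,s}\cap\overline{W}_{2}).\tag{2}$$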

It is easy to estimate $\operatorname{Pr}\left({W_{1}}\right)$ and $\operatorname{Pr}\left({W_{2}}\right)$ applying the Chernoff bound in the form

$$\Pr(X\geq(1+\delta)\mu)\leq e^{-\frac{\delta^{2}\mu}{3}},$$

where $0<\delta<1$ , and $X$ is a sum of independent random variables taking values in $\{0,1\}$ with $\operatorname{E}{X}=\mu$ . We have

$$\Pr(W_{1})=\Pr(M>3p_{1}n)\leq\Pr(M>2p_{1}(n+q))\leq e^{-\frac{p_{1}n}{3}}$$

and

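$$\Pr(W_{2})=\Pr(\text{“there exists }S_{i}\text{ of size }>2p_{2}q\text{”})\leq 2ne^{-\frac{p_{2}q}{3}}.$$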

Now we estimate the third probability in (2) as follows

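$$\Pr(B_{i,s}\cap\overline{W}_{2})\leq\Pr(B_{i,s}\cap\overline{W}_{2,i})\leq n^{k-1}\Pr(A\cap C\cap\overline{W}_{2,i})\leq n^{k-1}\Pr(A\mid\overline{W}_{2,i}\cap C),\tag{5}$$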

where $C$ stands for the event that the algorithm finds recovering sets

$$R_{i_{1},1},\dots,R_{i_{1},k_{i_{1}}},R_{i_{2},1},\dots,R_{i_{j-1},k_{i_{j-1}}}$$

for the first $j-1$ groups of

$$((x_{i_{1}},k_{1}),\dots,(x_{i_{\ell}},k_{\ell})),$$

and $A$ denotes the event that the algorithm fails to find $k-s$ recovering sets for $x_{i}$ which are disjoint from all the recovering sets the algorithm has already found. Let $I_{1}:=\{i_{1},\ldots,i_{\ell}\}$, and let $I_{2}$ be the set of information symbols included in the recovering sets

$$R_{i_{1},1},\dots,R_{i_{1},k_{i_{1}}},R_{i_{2},1},\dots,R_{i_{j-1},k_{i_{j-1}}}.$$

The cardinality of $I_{2}$ given the event $\overline{W}_{2,i}$ (consequently, given the event $\overline{W}_{2,i}\cap C$ ) is upper bounded as follows

$$|I_{2}|=\sum_{u=1}^{j-1}\sum_{v=1}^{k_{u}}(|R_{i_{u},v}|-1)\leq 2qp_{2}k,\tag{6}$$

since $\overline{W}_{2,i}$ stands for the event that all the subsets corresponding to the lines not containing $x_{i}$ are of size at most $2p_{2}q$. The total number of lines containing $x_{i}$ is equal to $q+1$. One can easily see that at most $k$ of them have a nonempty intersection with $I_{1}$. Since all the lines containing the fixed point $x_{i}$ share only $x_{i}$, we claim that at most $q/2$ of these lines intersect $I_{2}$ in at least $4p_{2}k$ points. Indeed, otherwise the cardinality of $I_{2}$ would be at least $4p_{2}k(q/2+1)$, which contradicts (6). We shall try to recover the symbol $x_{i}$ with the help of the other $t$ lines, $t\geq q/2-k\geq q/3$. Enumerate them from $1$ to $t$. Let $\xi_{1},\ldots,\xi_{t}$ be indicator random variables, each of which equals $1$ iff

1. the corresponding line was randomly taken (with probability $p_{1}$);
2. the symbol $x_{i}$ was kept (with probability $p_{2}$) and included in the parity-check sum;
3. none of the symbols from $I_{2}$ were included in the corresponding parity check.

Define the random variable

$$\eta:=\sum_{i=1}^{q/3}\xi_{i}.$$

Since the $\xi_{i}$ are independent Bernoulli random variables with success probabilities $p^{\prime}_{i}\geq p_{1}p_{2}(1-p_{2})^{4p_{2}k}$, the binomial random variable $\chi$ with parameters $q/3$ and $p_{1}p_{2}(1-p_{2})^{4p_{2}k}$ is stochastically dominated by $\eta$. Now we proceed with upper bounding (5) as follows

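$$\Pr(A\mid\overline{W}_{2,i}\cap C)\leq\Pr(\chi<k-s)\leq\binom{q/3}{k}\left(1-p_{1}p_{2}(1-p_{2})^{4p_{2}k}\right)^{q/3-k}\leq q^{k}\left(1-p_{1}p_{2}(1-p_{2})^{4p_{2}k}\right)^{q/4}.$$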

Combining the last inequality together with (2)-(5) yields

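$$\Pr(B\cup W_{1})\leq 2ne^{-\frac{p_{2}q}{3}}+e^{-\frac{p_{1}n}{3}}+kn^{k}q^{k}\left(1-p_{1}p_{2}(1-p_{2})^{4p_{2}k}\right)^{q/4}.\tag{7}$$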

Given $\varepsilon>0$ , there exists sufficiently large $q_{0}$ such that for $q>q_{0}$ the first two terms are at most $\varepsilon$ . Now we proceed with the last term

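$$kn^{k}q^{k}\left(1-p_{1}p_{2}(1-p_{2})^{4p_{2}k}\right)^{q/4}\leq kn^{1.5k}e^{-p_{1}p_{2}(1-p_{2})^{4p_{2}k}q/4}.$$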

Taking $p_{2}:=1/\sqrt{8k}$ , we have $4p_{2}k\geq 1$ and

$$(1-p_{2})^{4p_{2}k}\geq 1-4p_{2}^{2}k=1/2.$$

From this it follows that for

$$p_{1}:=36\,\frac{k^{3/2}\log n}{\sqrt{n}}$$

and sufficiently large $n$ , $n=q^{2}$ , the last term in (7) is at most $\varepsilon$ . Therefore, we obtain that there exists a $k$ -batch code with redundancy $M<108k^{3/2}\sqrt{n}\log n$ with probability at least $1-3\varepsilon$ . This completes the proof. ∎
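The arithmetic behind the last step can be spelled out. The derivation below is a sketch under two assumptions: $\log$ denotes the natural logarithm, and $p_{1}=36k^{3/2}\log n/\sqrt{n}$, which is the value consistent with the stated redundancy $3p_{1}n=108k^{3/2}\sqrt{n}\log n$. It uses $q=\sqrt{n}$, $p_{2}=1/\sqrt{8k}$ and $(1-p_{2})^{4p_{2}k}\geq 1/2$ from the proof.

```latex
\begin{align*}
k\,n^{k}q^{k} &= k\,n^{1.5k} && (q=\sqrt{n}),\\
p_{1}p_{2}(1-p_{2})^{4p_{2}k}\,\frac{q}{4}
  &\geq \frac{36k^{3/2}\log n}{\sqrt{n}}\cdot\frac{1}{\sqrt{8k}}\cdot\frac{1}{2}\cdot\frac{\sqrt{n}}{4}
   = \frac{9}{4\sqrt{2}}\,k\log n \approx 1.59\,k\log n,\\
k\,n^{1.5k}\,e^{-p_{1}p_{2}(1-p_{2})^{4p_{2}k}q/4}
  &\leq k\,n^{\left(1.5-\frac{9}{4\sqrt{2}}\right)k}
   \leq k\,n^{-0.09k}\xrightarrow[n\to\infty]{}0,\\
3p_{1}n &= 108\,k^{3/2}\sqrt{n}\log n .
\end{align*}
```

The constant $36$ is what makes the exponent $9/(4\sqrt{2})\approx 1.59$ exceed $1.5$, so the last term vanishes for every $k\geq 1$.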

III Explicit Construction of Batch Codes

In this section to construct batch codes we associate information bits with elements of vector space $\mathbb{F}_{q}^{2\ell+1}$ , $\ell\in{\mathbb{N}}$ , and define parity-check bits as sums of information bits lying in some affine $\ell$ -dimensional subspaces. In particular, the following finite geometry framework turns out to be useful.

Definition 3.

Suppose $\{V_{1},\ldots,V_{m}\}$ is a collection of $\ell$-dimensional subspaces in $\mathbb{F}_{q}^{2\ell+1}$. This collection is said to be $L$-nice if the following two properties hold:

1. any two distinct subspaces from this collection intersect trivially, in the origin only, i.e. $|V_{i}\cap V_{j}|=1$ for $i\neq j$;
2. for all $i\in[m]$ and for all $v\in\mathbb{F}_{q}^{2\ell+1}$, $v\not\in V_{i}$, the affine subspace $v+V_{i}$ intersects at most $L$ subspaces from this collection.

To the best of our knowledge, such a framework is new in the literature. In the following statement we show how to use a nice collection of subspaces to construct batch codes.

Lemma 2.

Suppose $\{V_{1},\ldots,V_{m}\}$ is an $L$ -nice collection of $\ell$ -dimensional subspaces in $\mathbb{F}_{q}^{2\ell+1}$ . Then there exists a $[q^{2\ell+1}+mq^{\ell+1},q^{2\ell+1},\lfloor m/L\rfloor]^{B}$ code.

We postpone the proof of Lemma 2 to the Appendix. Now we give a construction of nice subspaces, which represents a collection of Reed-Solomon codes of length $2\ell+1$ and dimension $\ell$.

Construction 1.

Let $V$ stand for a $(2\ell+1)$-dimensional $\mathbb{F}_{q}$-vector space, and let $B$ be an $\mathbb{F}_{q}$-basis for $V$. Now let us define a collection ${\mathcal{C}}$ of $m:=\lfloor q/\ell\rfloor$ subspaces. Let the $i$th subspace $V_{i}\in{\mathcal{C}}$, $0\leq i<m$, be the linear span of the $\ell$ vectors $\{{v}^{i}_{0},\ldots,{v}^{i}_{\ell-1}\}$, where the vector ${v}^{i}_{j}$, $j\in\{0,\ldots,\ell-1\}$, is written in the basis $B$ as follows

$$v_{j}^{i}:=(1,\alpha^{\ell i+j},\alpha^{2(\ell i+j)},\dots,\alpha^{2\ell(\ell i+j)}).$$
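For small parameters, the two properties of Definition 3 can be verified by brute force. The sketch below rests on assumptions that are not taken from the text: $q$ is prime, and the spanning vectors are the rows $(1,\beta,\beta^{2},\ldots,\beta^{2\ell})$ for the distinct elements $\beta\in\mathbb{F}_{q}$ (including $\beta=0$), grouped $\ell$ at a time. This reading is chosen because it reproduces the direction vectors $(1,0,0)$, $(1,1,1)$, $(1,2,1)$ of Example 1, and its indexing may differ from the displayed formula. All function names are hypothetical.

```python
from itertools import product

def span(vectors, q):
    """All F_q-linear combinations of the given vectors (q prime), as a set of tuples."""
    dim = len(vectors[0])
    return {
        tuple(sum(c * v[t] for c, v in zip(coeffs, vectors)) % q for t in range(dim))
        for coeffs in product(range(q), repeat=len(vectors))
    }

def construction(q, ell):
    """floor(q/ell) subspaces, each spanned by ell rows (1, b, b^2, ..., b^(2*ell))."""
    m = q // ell
    rows = [tuple(pow(b, e, q) for e in range(2 * ell + 1)) for b in range(q)]
    return [span(rows[ell * i: ell * (i + 1)], q) for i in range(m)]

def is_nice(subspaces, q, ell, L):
    """Brute-force check of the two properties of Definition 3."""
    origin = (0,) * (2 * ell + 1)
    m = len(subspaces)
    for i in range(m):                      # property 1: trivial pairwise intersections
        for j in range(i + 1, m):
            if subspaces[i] & subspaces[j] != {origin}:
                return False
    for V in subspaces:                     # property 2: cosets v + V meet at most L subspaces
        for v in product(range(q), repeat=2 * ell + 1):
            if v in V:
                continue
            coset = {tuple((a + b) % q for a, b in zip(v, w)) for w in V}
            if sum(1 for U in subspaces if coset & U) > L:
                return False
    return True

print(is_nice(construction(5, 1), 5, 1, L=1))
```

The check enumerates all $q^{2\ell+1}$ vectors, so it is only practical for very small $q$ and $\ell$.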

We prove that ${\mathcal{C}}$ is $\ell$-nice in Proposition 1. Let $m(L,\ell,q)$ be the maximal number $m$ such that there exists an $L$-nice collection of $\ell$-dimensional subspaces in $\mathbb{F}_{q}^{2\ell+1}$ of cardinality $m$. The next two propositions establish a fairly tight estimate of the maximal cardinality of a nice collection of subspaces.

Proposition 1.

Construction 1 is $\ell$-nice. This implies, in particular, that for any $\ell,\,L\in{\mathbb{N}}$, $L\geq\ell$, and prime power $q$, the following lower bound on $m(L,\ell,q)$ holds

$$m(L,\ell,q)\geq\lfloor q/\ell\rfloor.$$

Proposition 2.

([23]) For any $\ell,\,L\in{\mathbb{N}}$ and prime power $q$, the following upper bound on $m(L,\ell,q)$ holds

$$m(L,\ell,q)\leq(L+1)q.$$

We postpone the proof of Proposition 1 to the Appendix. The proof of Proposition 2, suggested by Mary Wootters, is included in the Appendix for completeness.

Finally Lemma 2 and Proposition 1 imply the following upper bound on the redundancy of batch codes.

Theorem 3.

For any $\ell\in{\mathbb{N}}$ , prime power integer $q$ and integer $k$ , $0<k\leq\lfloor q/\ell^{2}\rfloor$ , the redundancy of $k$ -batch codes is upper bounded by

$$r_{B}(n,k)\leq\ell kq^{\ell+1},$$

where $n=q^{2\ell+1}$ .

Remark 1.

Proposition 2 shows that the proposed framework based on finite geometry cannot be significantly improved in terms of the range of the parameter $k$ in Theorem 3; that is, $k$ cannot be larger than $\lfloor(L+1)q/L\rfloor$.

Proof of Theorem 3.

From Proposition 1 it follows that there exists an $\ell$ -nice collection of $\ell$ -dimensional subspaces in $\mathbb{F}_{q}^{2\ell+1}$ , which has cardinality $\lfloor q/\ell\rfloor$ . Take any subset of this collection of size $m=\ell k$ , where $k\leq\lfloor q/\ell^{2}\rfloor$ . Lemma 2 states that there exists a $[q^{2\ell+1}+\ell kq^{\ell+1},q^{2\ell+1},k]^{B}$ code. This completes the proof. ∎
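The resulting parameters are straightforward to tabulate. The helper below is a hypothetical convenience function, not part of the paper; it assumes the caller supplies a prime power $q$ and does not check this.

```python
def theorem3_parameters(q, ell, k):
    """Dimension n, length N and redundancy of the code promised by Theorem 3."""
    if not 0 < k <= q // ell**2:
        raise ValueError("Theorem 3 requires 0 < k <= floor(q / ell^2)")
    n = q ** (2 * ell + 1)                    # information bits, one per vector of F_q^(2*ell+1)
    redundancy = ell * k * q ** (ell + 1)     # m * q^(ell+1) parity bits with m = ell * k
    return {"n": n, "N": n + redundancy, "redundancy": redundancy, "k": k}

print(theorem3_parameters(3, 1, 3))    # the setting of Example 1
print(theorem3_parameters(16, 2, 4))
```

For $q=3$, $\ell=1$, $k=3$ this gives $n=27$ and $27$ parity-check bits, matching Example 1.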

Let us demonstrate how Theorem 3 actually works.

Example 1.

Let $q=3$, $\ell=1$ and $k=3$. Then $n=3^{3}=27$. Write $\mathbb{F}_{3}=\{0,1,2\}$. Let us index the $n$ information symbols by vectors of $\mathbb{F}_{3}^{3}$, i.e., $x_{000},x_{001},\ldots,x_{222}$. First we define three direction vectors $(1,0,0)$, $(1,1,1)$ and $(1,2,1)$, which are linearly independent. We shall construct a systematic linear code. One can determine $kn^{2/3}=27$ parity-check bits as sums of information bits whose indices lie on lines with the given direction vectors. These lines represent distinct $1$-dimensional affine subspaces of $\mathbb{F}_{3}^{3}$. For instance, there are $9$ lines with direction vector $(1,2,1)$. Let us take the one which goes through the point $(0,1,2)$. Then the corresponding parity-check bit is $y_{i^{\prime}}=x_{012}+x_{100}+x_{221}$, and the recovering set for $x_{012}$ based on this parity-check bit is $\{i^{\prime},100,221\}$. It is easy to show that there are $2$ other simple recovering sets for $x_{012}$, which are of the form $\{i^{\prime\prime},112,212\}$ and $\{i^{\prime\prime\prime},120,201\}$. Moreover, each information bit has exactly $3$ simple recovering sets. For every bit, each of its recovering sets has a nonempty intersection with at most one recovering set of any other bit. This property immediately implies [3] that our code is a $3$-batch code. For $\ell>1$, we show a generalization of this property in the proof of Lemma 2.
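The example can be reproduced mechanically. In the sketch below, parity-check bits are labelled by the pair (direction, line) rather than by indices $i^{\prime},i^{\prime\prime},\ldots$; this labelling and the function names are assumptions made for illustration.

```python
from itertools import product

Q = 3
DIRECTIONS = [(1, 0, 0), (1, 1, 1), (1, 2, 1)]
POINTS = list(product(range(Q), repeat=3))

def line(point, direction):
    """The affine line {point + t * direction : t in F_3} as a frozenset of points."""
    return frozenset(
        tuple((p + t * d) % Q for p, d in zip(point, direction)) for t in range(Q)
    )

# One parity-check bit per (direction, line) pair; the line itself serves as its label.
PARITY_LINES = {(d, line(p, d)) for d in DIRECTIONS for p in POINTS}

def recovering_sets(point):
    """Simple recovering sets of x_point: one parity label plus the other points of its line."""
    sets = []
    for d in DIRECTIONS:
        ln = line(point, d)
        sets.append(frozenset({("parity", d, ln)}) | (ln - {point}))
    return sets

def recover(rset, x):
    """XOR of all symbols indexed by rset; a parity label expands to the XOR over its line."""
    total = 0
    for label in rset:
        if label[0] == "parity":
            total ^= sum(x[p] for p in label[2]) % 2
        else:
            total ^= x[label]
    return total

# The line through (0,1,2) with direction (1,2,1) gives y = x_012 + x_100 + x_221.
print(sorted(line((0, 1, 2), (1, 2, 1))))
```

Looping over all ordered pairs of distinct points confirms that every recovering set of one bit meets at most one recovering set of the other, which is the property used to conclude that the code is a $3$-batch code.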

IV Conclusion

In this paper a new random coding bound and new explicit constructions of primitive linear batch codes based on finite geometry were developed. In some parameter regimes, our codes have lower redundancy than previously known batch codes. We note that the random coding bound coincides with the constructive bound at a countable number of points and gives a better result at the others. A natural open question arising from this work is how to construct codes which achieve the random coding bound at all other points too. Another interesting question is how to improve the lower bound given by inequality (1).

Acknowledgment

We thank Eitan Yaakobi for the fruitful discussion on batch codes and Mary Wootters for the proof of Proposition 2. N. Polyanskii was supported in part by the Russian Foundation for Basic Research (RFBR) through grant nos. 18-07-01427 A, 18-31-00310 MOL_A. I. Vorobyev was supported in part by RFBR through grant nos. 18-07-01427 A, 18-31-00361 MOL_A.

Bibliography

[1] Y. Ishai, E. Kushilevitz, R. Ostrovsky, and A. Sahai, “Batch codes and their applications,” in Proceedings of the thirty-sixth annual ACM symposium on Theory of computing. ACM, 2004, pp. 262–271.
[2] H. Asi and E. Yaakobi, “Nearly optimal constructions of PIR and batch codes,” IEEE Transactions on Information Theory, 2018.
[3] A. S. Rawat, Z. Song, A. G. Dimakis, and A. Gál, “Batch codes through dense graphs without short cycles,” IEEE Transactions on Information Theory, vol. 62, no. 4, pp. 1592–1604, 2016.
[4] A. Vardy and E. Yaakobi, “Constructions of batch codes with near-optimal redundancy,” in Information Theory (ISIT), 2016 IEEE International Symposium on. IEEE, 2016, pp. 1197–1201.
[5] S. Bhattacharya, S. Ruj, and B. Roy, “Combinatorial batch codes: A lower bound and optimal constructions,” Advances in Mathematics of Communications, vol. 6, no. 2, pp. 165–174, 2012.
[6] R. A. Brualdi, K. P. Kiernan, S. A. Meyer, and M. W. Schroeder, “Combinatorial batch codes and transversal matroids,” Advances in Mathematics of Communications, vol. 4, no. 3, pp. 419–431, 2010.
[7] N. Silberstein and A. Gál, “Optimal combinatorial batch codes based on block designs,” Designs, Codes and Cryptography, vol. 78, no. 2, pp. 409–424, 2016.
[8] D. Stinson, R. Wei, and M. B. Paterson, “Combinatorial batch codes,” Advances in Mathematics of Communications, vol. 3, no. 1, pp. 13–27, 2009.
