On an Equivalence Between Single-Server PIR with Side Information and Locally Recoverable Codes

Swanand Kadhe, Anoosheh Heidarzadeh, Alex Sprintson, and O. Ozan Koyluoglu

arXiv:1907.00598·cs.IT·July 2, 2019


TL;DR

This paper reveals a fundamental equivalence between single-server PIR with side information and locally recoverable codes, enabling new bounds and insights into both areas.

Contribution

It establishes a novel equivalence between PIR schemes with side information and locally recoverable codes, including cooperative variants, providing new bounds and theoretical insights.

Findings

01. PIR schemes for retrieving a single message are equivalent to classical LRCs.

02. PIR schemes for retrieving multiple messages are equivalent to cooperative LRCs.

03. The equivalences recover known upper bounds on the download rate of PIR-SI schemes and yield a new rate upper bound for cooperative LRCs.

Equations

$$I_{W}=\begin{bmatrix}e_{W_{1}}\\ e_{W_{2}}\\ \vdots\\ e_{W_{D}}\end{bmatrix}.$$

$$p_{\boldsymbol{S}}(S)=\begin{cases}\frac{1}{\binom{K}{M}},&S\subset[K],\ |S|=M,\\ 0,&\text{otherwise}.\end{cases}$$

$$p_{\boldsymbol{W}|\boldsymbol{S}}(W\mid S)=\begin{cases}\frac{1}{\binom{K-M}{D}},&W\subseteq[K]\setminus S,\ |W|=D,\\ 0,&\text{otherwise}.\end{cases}$$

$$I\left(\boldsymbol{W};\boldsymbol{Q}^{[\boldsymbol{W},\boldsymbol{S}]}\right)=0.$$

$$I\left(\boldsymbol{W},\boldsymbol{S};\boldsymbol{Q}^{[\boldsymbol{W},\boldsymbol{S}]}\right)=0.$$

$$H\left(\boldsymbol{X}_{\boldsymbol{W}}\mid\boldsymbol{A}^{[\boldsymbol{W},\boldsymbol{S}]},\boldsymbol{Q}^{[\boldsymbol{W},\boldsymbol{S}]},\boldsymbol{X}_{\boldsymbol{S}},\boldsymbol{W},\boldsymbol{S}\right)=0.$$

$$R=\frac{D\log q}{H\left(\boldsymbol{A}^{[W,S]}\right)}.$$

$$d_{min}\left(\mathcal{C}\right)\leq n-k-\left\lceil\frac{k}{r}\right\rceil+2.$$

$$d_{min}\left(\mathcal{C}\right)\leq n-k+1-\ell\left(\left\lceil\frac{k}{r}\right\rceil-1\right).$$

$$\boldsymbol{A}^{[W,S]}=EX,$$

$$e_{W^{\prime}}\in\left\langle\begin{bmatrix}E\\ I_{S^{\prime}}\end{bmatrix}\right\rangle.$$

$$\mathbb{P}\left(\boldsymbol{W}=W^{\prime}\mid\boldsymbol{Q}^{[\boldsymbol{W},\boldsymbol{S}]}=Q^{[W,S]}\right)=0,$$

$$E=\begin{bmatrix}\mathbf{1}_{M+1}&\mathbf{0}_{M+1}&\cdots&\mathbf{0}_{\beta}\\ \mathbf{0}_{M+1}&\mathbf{1}_{M+1}&\cdots&\mathbf{0}_{\beta}\\ \vdots&\vdots&\ddots&\vdots\\ \mathbf{0}_{M+1}&\mathbf{0}_{M+1}&\cdots&\mathbf{1}_{\beta}\end{bmatrix},$$

$$K\geq K-T+\left\lceil\frac{K-T}{M}\right\rceil-2+d.$$

$$T\geq\left\lceil\frac{K}{M+1}\right\rceil.$$

$$E=\begin{bmatrix}\times&\cdots&\times&0&\cdots&0&\cdots&0&\cdots&0\\ 0&\cdots&0&\times&\cdots&\times&\cdots&0&\cdots&0\\ \vdots&\ddots&\vdots&\vdots&\ddots&\vdots&\ddots&\vdots&\ddots&\vdots\\ 0&\cdots&0&0&\cdots&0&\cdots&\times&\cdots&\times\end{bmatrix},$$

$$\mathbb{P}\left(\boldsymbol{Q}^{[\boldsymbol{W},\boldsymbol{S}]}=\pi\mid\boldsymbol{W}=W\right)=\frac{1}{K!}.$$

$$\mathbb{P}\left(\boldsymbol{Q}^{[\boldsymbol{W},\boldsymbol{S}]}=\pi\mid\boldsymbol{W}=W\right)\overset{(b)}{=}\cdots$$

$$e_{i_{j}}\in\left\langle\begin{bmatrix}E\\ I_{S^{\prime}}\end{bmatrix}\right\rangle,\quad\forall\,i_{j}\in W^{\prime}.$$

$$\mathbb{P}\left(\boldsymbol{S}=S_{i},\,i\in\boldsymbol{W}\mid\boldsymbol{Q}^{[\boldsymbol{W},\boldsymbol{S}]}=Q^{[W,S]}\right)=0,$$

$$e_{i}\in\left\langle\begin{bmatrix}E\\ I_{S}\end{bmatrix}\right\rangle.$$

$$\mathcal{W}=\{(W,S)\mid W\in[K],\ S\subset[K]\setminus\{W\},\ |S|=M\}.$$

$$\mathbb{P}\left(\boldsymbol{W}=W^{\prime}\mid Q(\boldsymbol{W},\boldsymbol{S})=A\right)=\mathbb{P}\left(\boldsymbol{W}=W^{\prime}\right),$$

$$D\left(A(X_{1},\dots,X_{K}),X_{S}\right)=X_{W}.$$

$$\mathcal{C}+u=\{c+u\mid c\in\mathcal{C}\}.$$

$$(\mathcal{C}+u_{i})\cap(\mathcal{C}+u_{j})=\emptyset,\quad\forall\,i\neq j,$$

$$\bigcup_{j=0}^{q^{T_{OPT}}-1}(\mathcal{C}+u_{j})=\mathbb{F}_{q}^{K}.$$


Full text

On an Equivalence Between Single-Server PIR with Side Information and Locally Recoverable Codes

Swanand Kadhe, Anoosheh Heidarzadeh, Alex Sprintson, and O. Ozan Koyluoglu

Abstract

The Private Information Retrieval (PIR) problem has recently attracted significant interest in the information-theory community. In this problem, a user wants to privately download one or more messages belonging to a database with copies stored on a single or multiple remote servers. In the single server scenario, the user must have prior side information, i.e., a subset of messages unknown to the server, to be able to privately retrieve the required messages in an efficient way.

In the last decade, there has also been significant interest in Locally Recoverable Codes (LRC), a class of storage codes in which each symbol can be recovered from a limited number of other symbols. More recently, there has been interest in cooperative locally recoverable codes, i.e., codes in which multiple symbols can be recovered from a small set of other code symbols.

In this paper, we establish a relationship between coding schemes for the single-server PIR problem and LRCs. In particular, we show the following results: (i) PIR schemes designed for retrieving a single message are ‘equivalent’ to classical LRCs; and (ii) PIR schemes for retrieving multiple messages are equivalent to cooperative LRCs. These equivalence results allow us to recover upper bounds on the download rate for PIR-SI schemes, and to obtain a novel rate upper bound on cooperative LRCs. We show results for both linear and non-linear codes.

†† S. Kadhe and O. O. Koyluoglu are with the Department of Electrical Engineering and Computer Sciences at University of California Berkeley, USA; emails: {swanand.kadhe, ozan.koyluoglu}@berkeley.edu. A. Heidarzadeh and A. Sprintson are with the Department of Electrical and Computer Engineering at Texas A&M University, USA; emails:{anoosheh, spalex}@tamu.edu. This work is supported in part by National Science Foundation grants CCF-1748585 and CNS-1748692. This material is based upon work supported while Alex Sprintson was serving at the National Science Foundation. Any opinion, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.

I Introduction

The Private Information Retrieval (PIR) problem is one of the important problems in theoretical computer science [1]. The setting of the problem includes a client that needs to retrieve a message belonging to a database with copies stored on a single or multiple remote servers. The message needs to be retrieved by satisfying the privacy condition, which prevents the server from identifying the index of the retrieved message. The theoretical computer science community has primarily focused on the settings with small message sizes with the objective to minimize the total number of bits uploaded to and downloaded from the server (see [2]).

Starting with the seminal work of Sun and Jafar [3], the multiple-server PIR problem has received significant attention from the information and coding theory community, with breakthrough results in the past few years (see, e.g., [4, 5, 6, 7], and references therein). The information-theoretic approach has focused on a practical setting with large message sizes, with the goal of minimizing the ratio of the total number of downloaded bits to the message size.

Recently, Kadhe et al. [8, 9] considered the single-server PIR with Side Information (PIR-SI) problem, wherein the user knows a random subset of messages that is unknown to the server. It was shown that the side information enables the user to substantially reduce the download cost and still achieve information-theoretic privacy for the requested message. The multi-message extension of PIR-SI, which enables a user to privately download multiple messages from the server, is considered by Heidarzadeh et al. [10] as well as Li and Gastpar [11].

It is well-known in the theoretical computer science community that there is a strong relationship between PIR schemes and a class of error-correcting codes called locally decodable codes (LDCs) (see, e.g., the surveys [2, 12]). LDCs allow one to locally decode an arbitrary message symbol from only a small subset of randomly chosen codeword symbols, even after a fraction of codeword symbols are corrupted by an adversary.

Continuing with this theme, in this paper, we show that single-server PIR-SI schemes are closely related to another class of codes with locality called locally recoverable codes (LRCs) [13]. LRCs are a class of erasure codes that enable one to recover an erased codeword symbol from only a small subset of other codeword symbols.

In particular, in an LRC with block-length $n$ and locality $r$ , every codeword symbol can be reconstructed from at most $r$ other codeword symbols [13]. Rawat et al. [14, 15] extended the notion of local recovery to cooperative local recovery. Specifically, in an LRC with block-length $n$ and $(r,\ell)$ -cooperative locality, every subset of $\ell$ codeword symbols can be reconstructed from at most $r$ other codeword symbols.

In this paper, we show that single-message PIR-SI schemes are related to LRCs, whereas multi-message PIR-SI schemes are related to cooperative LRCs. Detailed contributions are outlined in the following.

Our Contributions: We focus our attention on the single-server PIR-SI problem in which a user wishes to download $D$ messages from a database of $K$ messages (over a finite field $\mathbb{F}_{q}$), stored on a single remote server. The user has a random subset of $M$ messages, referred to as side information, whose identities are unknown to the server.

First, we focus on the scalar-linear case wherein the answer from the server is of the form $EX$ , where $X=[X_{1}\>\cdots\>X_{K}]^{T}\in\mathbb{F}_{q}^{K}$ denotes the set of messages, and $E$ is a $T\times K$ matrix with entries over $\mathbb{F}_{q}$ . When the user wishes to protect only the identities of the requested messages, we show the following results:

• Equivalence between single-message $(D=1)$ PIR with Side Information (SM-PIR-SI) schemes and LRCs:

1. Any solution $E$ to an SM-PIR-SI problem is a parity check matrix of an LRC with block-length $K$ and locality $M$ (Theorem 1).

2. Given a parity check matrix $H$ of an LRC with block-length $K$ and locality $M$, it is possible to construct an SM-PIR-SI scheme where $E$ is a column-permutation of $H$ (Theorem 2).

• Equivalence between multi-message $(D\geq 2)$ PIR with Side Information (MM-PIR-SI) schemes and cooperative LRCs:

1. Any solution $E$ to an MM-PIR-SI problem is a parity check matrix of an LRC with block-length $K$ and $(M,D)$-cooperative locality (Theorem 3).

2. Given a parity check matrix $H$ of an LRC with block-length $K$ and $(M,D)$-cooperative locality, it is possible to construct an MM-PIR-SI scheme where $E$ is a column-permutation of $H$ (Theorem 4).

• As corollaries to Theorems 1 and 3, we derive upper bounds on the download rates for the SM-PIR-SI problem (Corollary 1) and the MM-PIR-SI problem (Corollary 3), respectively. In addition, as a corollary to Theorem 4, we derive a novel tight upper bound on the rate of a cooperative LRC for the regime $\ell>r$ (see Corollary 4 and Remark 2).

Next, we consider the case when the user wants to protect both the identities of the requested messages and that of the side information, referred to as $(W,S)$-PIR-SI. [Footnote 1: Here, $W$ denotes the demand index set and $S$ denotes the side information index set. We use the term $(W,S)$-PIR-SI to reflect the fact that the user wants to protect $(W,S)$ jointly.] We show the following equivalence result:

• Equivalence between $(W,S)$-PIR-SI schemes and maximum distance separable (MDS) codes [Footnote 2: An MDS code can be considered as an LRC with locality $r=k$.]:

1. Any solution $E$ to a $(W,S)$-PIR-SI problem is a parity check matrix of an MDS code with block-length $K$ and dimension $M$ (Theorem 5).

2. Given a parity check matrix $H$ of an MDS code with block-length $K$ and dimension $M$, it is possible to construct a $(W,S)$-PIR-SI scheme where $E=H$ (Theorem 6).

Finally, we lift the restriction of scalar-linear solutions, and consider generic (non-linear) SM-PIR-SI schemes. We show the following equivalence result:

• Equivalence between SM-PIR-SI schemes and LRCs with maximum possible size [Footnote 3: It is possible to show that any LRC over $\mathbb{F}_{q}$ with block-length $n$ and locality $r$ can contain at most $q^{n-\lceil n/(r+1)\rceil}$ codewords (see Proposition 2). Any LRC with $q^{n-\lceil n/(r+1)\rceil}$ codewords is said to be an LRC with maximum possible size.]:

1. Given a solution to an SM-PIR-SI problem, it is possible to construct an LRC with block-length $K$ and locality $M$ (Theorem 7).

2. Given an LRC with block-length $K$ and locality $M$ with the maximum possible size, it is possible to construct an SM-PIR-SI scheme (Theorem 8).

II Preliminaries

Notation: For a positive integer $K$, denote $\{1,\dots,K\}$ by $[K]$. Let $\mathbb{F}_{q}$ denote the finite field of order $q$, where $q$ is a power of a prime. For a set $\{X_{1},\dots,X_{K}\}$ and a subset $S\subset{[K]}$, let $X_{S}=\{X_{j}:j\in S\}$. For a positive integer $P$, let $\mathbf{1}_{P}$ and $\mathbf{0}_{P}$, respectively, denote the all-one and all-zero row vectors of length $P$. Let $e_{j}$ be a unit vector of length $K$ such that its $j$-th entry is $1$ and the other entries are $0$. For a set $W=\{W_{1},W_{2},\ldots,W_{D}\}\subseteq[K]$, let

$$I_{W}=\begin{bmatrix}e_{W_{1}}\\ e_{W_{2}}\\ \vdots\\ e_{W_{D}}\end{bmatrix}.$$

For a $T\times K$ matrix $E\in\mathbb{F}_{q}^{T\times K}$, let $\langle E\rangle$ denote the row-space of $E$. For a subset $S\subset[K]$, let $E_{S}$ denote the $T\times|S|$ submatrix consisting of columns of $E$ indexed by $S$. For a vector $\mathbf{v}$, let $\textrm{Supp}\left(\mathbf{v}\right)$ denote the support of $\mathbf{v}$. For a subspace $\mathcal{C}\subset\mathbb{F}_{q}^{K}$, let ${\mathcal{C}^{\perp}}$ be its dual subspace.

II-A Single-Server PIR with Side Information

We briefly overview the single-server PIR with side information problem [8, 16] (see also [9]). Consider a server containing a database that consists of a set of $K$ messages $\boldsymbol{X}=[\boldsymbol{X}_{1}\>\cdots\>\boldsymbol{X}_{K}]^{T}$ , with each message being independently and uniformly distributed over $\mathbb{F}_{q}$ . A user is interested in privately downloading $D$ ( $1\leq D\leq K$ ) messages $\boldsymbol{X}_{W}$ from the server for some $W\subseteq[K]$ , $|W|=D$ . We refer to $W$ as the demand index set and $\boldsymbol{X}_{W}$ as the demand. The user has the knowledge of a subset $\boldsymbol{X}_{S}$ of the messages for some $S\subset[K]\setminus W$ , $|S|=M$ , $M\leq K-D$ . We refer to $S$ as the side information index set and $\boldsymbol{X}_{S}$ as the side information.

Let $\boldsymbol{W}$ and $\boldsymbol{S}$ denote the random variables corresponding to the demand and side information index sets, respectively. We assume that the side information index set $\boldsymbol{S}$ is distributed uniformly over all subsets of $[K]$ of size $M$, i.e.,

$$p_{\boldsymbol{S}}(S)=\begin{cases}\frac{1}{\binom{K}{M}},&S\subset[K],\ |S|=M,\\ 0,&\text{otherwise}.\end{cases} \tag{1}$$

Further, we assume that the demand index set $\boldsymbol{W}$ has the following conditional distribution given $S$ :

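$$p_{\boldsymbol{W}|\boldsymbol{S}}(W\mid S)=\begin{cases}\frac{1}{\binom{K-M}{D}},&W\subseteq[K]\setminus S,\ |W|=D,\\ 0,&\text{otherwise}.\end{cases} \tag{2}$$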

We assume that the server does not know the side information realization at the user and only knows the a priori distributions $p_{\boldsymbol{S}}(S)$ and $p_{\boldsymbol{W}|\boldsymbol{S}}(W|S)$ .
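The two distributions above can be sampled directly. The following is a minimal sketch, not part of the original paper; the function names `sample_side_info_and_demand` and `joint_probability` are hypothetical, and indices are taken from $[K]=\{1,\dots,K\}$.

```python
import random
from math import comb

def sample_side_info_and_demand(K, M, D, rng=random):
    """Draw (W, S): S uniform over M-subsets of [K], then W uniform over D-subsets of [K] minus S."""
    universe = list(range(1, K + 1))            # the index set [K] = {1, ..., K}
    S = set(rng.sample(universe, M))            # uniform over all M-subsets, as in (1)
    remaining = [i for i in universe if i not in S]
    W = set(rng.sample(remaining, D))           # uniform over D-subsets of [K] minus S, as in (2)
    return W, S

def joint_probability(K, M, D):
    """Probability of any fixed valid pair (W, S) under (1) and (2)."""
    return 1 / (comb(K, M) * comb(K - M, D))

W, S = sample_side_info_and_demand(K=6, M=2, D=1)
print(W, S, joint_probability(6, 2, 1))
```

Every valid pair $(W,S)$ is equally likely, which is the only information the server is assumed to hold about the user's state.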

To download the set of messages $\boldsymbol{X}_{W}$ given the side information $\boldsymbol{X}_{S}$ , the user sends a query $Q^{[W,S]}$ to the server. The server responds to the query it receives with an answer $A^{[W,S]}$ over $\mathbb{F}_{q}^{T}$ . Let $\boldsymbol{Q}^{\left[W,S\right]}$ and $\boldsymbol{A}^{\left[W,S\right]}$ be the corresponding random variables.

Definition 1.

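[PIR-SI] Any scheme consisting of a query and an answer is referred to as a PIR with side information (PIR-SI) scheme if the query and answer satisfy a privacy condition (either $W$-privacy or $(W,S)$-privacy) and the recoverability condition, stated as follows.

1. $W$-privacy: The server cannot infer any information about the demand index set from the query it receives, i.e.,

$$I\left(\boldsymbol{W};\boldsymbol{Q}^{[\boldsymbol{W},\boldsymbol{S}]}\right)=0. \tag{3}$$

2. $(W,S)$-privacy: The server cannot infer any information about the demand index set as well as the side information index set from the query it receives, i.e.,

$$I\left(\boldsymbol{W},\boldsymbol{S};\boldsymbol{Q}^{[\boldsymbol{W},\boldsymbol{S}]}\right)=0. \tag{4}$$

3. Recoverability: From the answer $A^{[W,S]}$ and the side information $X_{S}$, the user should be able to decode the desired set of messages $X_{W}$ for any $(W,S)$, i.e.,

$$H\left(\boldsymbol{X}_{\boldsymbol{W}}\mid\boldsymbol{A}^{[\boldsymbol{W},\boldsymbol{S}]},\boldsymbol{Q}^{[\boldsymbol{W},\boldsymbol{S}]},\boldsymbol{X}_{\boldsymbol{S}},\boldsymbol{W},\boldsymbol{S}\right)=0. \tag{5}$$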

We refer to the case of $D=1$ as single-message PIR-SI, and to the case of $D\geq 2$ as multi-message PIR-SI.

The rate of a PIR-SI scheme is defined as the ratio of the total length of the $D$ demanded messages ($D\log q$ bits) to the total length of the answers (in bits) as follows [Footnote 4: We focus our attention on the download rate, similar to [3]. This is because the download rate dominates the total communication rate when the message size is sufficiently large as compared to the size of a query.]:

$$R=\frac{D\log q}{H\left(\boldsymbol{A}^{[W,S]}\right)}. \tag{6}$$

The capacity of $W$ -PIR-SI, denoted by $C_{W}$ , is defined as the supremum of rates over all $W$ -PIR-SI schemes for a given $K$ and $M$ .

II-B Locally Recoverable Codes

Let $\mathcal{C}$ denote a linear $[n,k,d]_{q}$ code over $\mathbb{F}_{q}$ with block-length $n$ , dimension $k$ , and minimum distance $d$ . For any codeword $\mathbf{c}\in\mathcal{C}$ , $\mathbf{c}_{i}$ is said to be the $i$ -th symbol of the codeword $\mathbf{c}$ .

We say that the $i$ -th symbol of a code $\mathcal{C}$ has locality $r$ if its value can be recovered from some other $r$ symbols of $\mathcal{C}$ . The formal definition of locality is as follows (see [13]).

Definition 2.

[Locality] We say that the $i$ -th coordinate of a code $\mathcal{C}$ has locality $r$ if there exists a set ${R}\left(i\right)\subset[n]\setminus\{i\}$ , $|{R}\left(i\right)|\leq r$ , such that, for every codeword $\mathbf{c}\in\mathcal{C}$ , $\mathbf{c}_{i}=\sum_{l\in{R}\left(i\right)}\lambda_{l}\mathbf{c}_{l}$ , where $\lambda_{l}\in\mathbb{F}_{q}\setminus\{0\},$ $\forall\>l\in{R}\left(i\right)$ . We say that ${R}\left(i\right)$ is a repair group of the $i$ -th coordinate and define $\Gamma\left(i\right)=\{\mathbf{c}_{i}\cup{R}\left(i\right)\}$ .

We say that an $[n,k,d]_{q}$ code has (all-symbol) locality $r$ if each of its $n$ coordinates has locality $r$ . An LRC with these parameters is referred to as an $(n,k,r)$ LRC.

Equivalently, we say that the coordinate $i$ has locality $r$ , if the dual code $\mathcal{C}^{\perp}$ contains a codeword $\mathbf{c}^{\prime}$ of Hamming weight at most $r+1$ such that the $i$ -th coordinate is in the support of $\mathbf{c}^{\prime}$ .

Example 1.

Let us consider a $(7,3)$ simplex code $\mathcal{C}$, which is the dual of a $(7,4)$ Hamming code. In particular, $\mathcal{C}$ encodes three information symbols $\{a,b,c\}$ into seven symbols as $\{a,b,c,a+b,a+c,b+c,a+b+c\}$. It is easy to see that any symbol can be recovered from two other symbols. For instance, $a$ can be recovered from $b+c$ and $a+b+c$. [Footnote 5: In fact, every symbol of the $(7,3)$ simplex code has three disjoint repair groups [17]. Further, note that, even though the $(7,3)$ simplex code is not optimal with respect to the distance upper bound in (7), it is optimal with respect to a field size dependent rate upper bound established in [17].]
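The locality claim in Example 1 can be checked by brute force over $\mathbb{F}_{2}$. The sketch below is illustrative only; the names `COLUMNS`, `encode`, and `repair_pairs` are hypothetical, and coordinates are 0-based in the order $a,b,c,a+b,a+c,b+c,a+b+c$.

```python
from itertools import product

# Columns of a generator matrix of the (7,3) simplex code over F_2, in the
# order a, b, c, a+b, a+c, b+c, a+b+c used in Example 1.
COLUMNS = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]

def encode(msg):
    """Encode (a, b, c) in F_2^3 into the seven simplex-code symbols."""
    return tuple(sum(m * g for m, g in zip(msg, col)) % 2 for col in COLUMNS)

CODEWORDS = [encode(msg) for msg in product((0, 1), repeat=3)]

def repair_pairs(i):
    """All pairs (j, l), j < l, both different from i, with c_i = c_j + c_l in every codeword."""
    n = len(COLUMNS)
    return [(j, l)
            for j in range(n) for l in range(j + 1, n)
            if i not in (j, l)
            and all(c[i] == (c[j] + c[l]) % 2 for c in CODEWORDS)]

for i in range(7):
    print(i, repair_pairs(i))
```

Each coordinate turns out to have three pairwise disjoint repair groups of size two; for the symbol $a$ (index 0), one of them is the pair $(5,6)$, i.e., $b+c$ and $a+b+c$.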

In [13], it is shown that the minimum distance $d_{min}\left(\mathcal{C}\right)$ of an $(n,k,r)$ LRC $\mathcal{C}$ is upper bounded as

$$d_{min}\left(\mathcal{C}\right)\leq n-k-\left\lceil\frac{k}{r}\right\rceil+2. \tag{7}$$

Further, the authors of [13] prove that any systematic code with locality for information symbols that achieves equality in (7) must follow a specific structure. We state below the structure theorem [13, Theorem 9], adapted to the form useful for our setup.

Proposition 1.

[13] Let $\mathcal{C}$ be an $(n,k,r)$ code, where $r\mid k$, $r<k$, and $n=k+k/r$. Then, for any $i,j\in[n]$, $i\neq j$, we have either $\Gamma\left(i\right)=\Gamma\left(j\right)$ or $\Gamma\left(i\right)\cap\Gamma\left(j\right)=\emptyset$.

II-C Cooperative Locally Recoverable Codes

Let $\mathcal{C}$ denote a linear $[n,k,d]_{q}$ code over $\mathbb{F}_{q}$ with block-length $n$ , dimension $k$ , and minimum distance $d$ . We say that the code has $(r,\ell)$ -cooperative locality if for every codeword, it is possible to repair any $\ell$ symbols from at most $r$ other symbols. The formal definition is as follows (see [14]).

Definition 3.

We say that an $[n,k,d]$ code $\mathcal{C}$ has $(r,\ell)$ -cooperative locality, if for any subset of $\ell$ coordinates $\Delta\subset[n]$ , $|\Delta|=\ell$ , there exists a set $\Gamma(\Delta)\subset[n]$ satisfying $\Delta\cap\Gamma(\Delta)=\emptyset$ , $|\Gamma(\Delta)|\leq r$ , such that, for every codeword $\mathbf{c}\in\mathcal{C}$ , the symbols $\mathbf{c}_{\Delta}$ can be recovered using the symbols $\mathbf{c}_{\Gamma(\Delta)}$ .

An LRC with these parameters is referred to as an $(n,k,r,\ell)$ cooperative LRC. Note that when $\ell=n-k$ and $r=k$ , then the above definition coincides with that of an MDS code.
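Definition 3 can be tested exhaustively for small codes. The following sketch is an illustration under stated assumptions rather than a method from the paper: the helper names are hypothetical, arithmetic is over $\mathbb{Z}_{q}$ with $q$ prime, and the example reuses the $(7,3)$ simplex code of Example 1.

```python
from itertools import combinations, product

def codewords(G, q=2):
    """All codewords of the linear code generated by the rows of G over Z_q (q prime)."""
    k, n = len(G), len(G[0])
    return {tuple(sum(m[i] * G[i][j] for i in range(k)) % q for j in range(n))
            for m in product(range(q), repeat=k)}

def determines(code, gamma, delta):
    """True if the symbols on gamma fix the symbols on delta in every codeword."""
    seen = {}
    for c in code:
        key = tuple(c[j] for j in gamma)
        val = tuple(c[j] for j in delta)
        if seen.setdefault(key, val) != val:
            return False
    return True

def has_cooperative_locality(G, r, ell, q=2):
    """Brute-force test of (r, ell)-cooperative locality in the sense of Definition 3."""
    code, n = codewords(G, q), len(G[0])
    for delta in combinations(range(n), ell):
        others = [j for j in range(n) if j not in delta]
        if not any(determines(code, gamma, delta)
                   for size in range(r + 1)
                   for gamma in combinations(others, size)):
            return False
    return True

# Generator matrix of the (7,3) simplex code (symbol order as in Example 1).
G = [[1, 0, 0, 1, 1, 0, 1],
     [0, 1, 0, 1, 0, 1, 1],
     [0, 0, 1, 0, 1, 1, 1]]

print(has_cooperative_locality(G, r=3, ell=2))
print(has_cooperative_locality(G, r=2, ell=2))
```

For this code, any two symbols can be repaired from three others, but two others are not enough, since two columns span only three nonzero vectors of $\mathbb{F}_{2}^{3}$. The search is exponential in $n$, so it is only meant for toy examples.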

In [15], it is shown that the minimum distance $d_{min}\left(\mathcal{C}\right)$ of an $(n,k,r,\ell)$ cooperative LRC $\mathcal{C}$ for $r\geq\ell$ is upper bounded as

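$$d_{min}\left(\mathcal{C}\right)\leq n-k+1-\ell\left(\left\lceil\frac{k}{r}\right\rceil-1\right). \tag{8}$$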

III Equivalence Results for Scalar-Linear Schemes

In this section, we consider non-interactive (single round), scalar-linear PIR-SI schemes. In particular, for any given query $Q^{[W,S]}$ , the answer $\boldsymbol{A}^{\left[W,S\right]}$ can be specified as

$$\boldsymbol{A}^{[W,S]}=EX, \tag{9}$$

where the matrix $E\in\mathbb{F}_{q}^{T\times K}$ depends on $Q^{[W,S]}$ . We refer to $E$ as a solution to the PIR-SI problem. Note that $T$ , the number of rows of $E$ , denotes the number of symbols downloaded from the server.

III-A Single-Message PIR-SI Schemes and LRCs

In this section, we show that a single-message PIR-SI scheme is equivalent to a locally recoverable code (LRC). In particular, we show that any solution to the single-message PIR-SI problem (SM-PIR-SI) must be a parity check matrix of an LRC. Furthermore, we show that it is possible to construct a solution to the SM-PIR-SI problem using a parity check matrix of an LRC.

First, we establish the relation from a solution of the SM-PIR-SI problem to a parity check matrix of an LRC.

Theorem 1.

Any scalar-linear solution $E$ to the single-message PIR-SI problem must be a parity check matrix of an LRC with block length $K$ and locality $M$ .

Proof:

First, we note that the privacy and recoverability conditions impose the following necessary condition on the solution $E$ corresponding to any query $Q^{[W,S]}$: for any candidate demand index $W^{\prime}\in[K]$, there must exist a potential side information index set $S^{\prime}\subseteq[K]\setminus W^{\prime}$, $|S^{\prime}|\leq M$, such that it is possible to recover $X_{W^{\prime}}$ from $EX$ and $X_{S^{\prime}}$. In other words, the following condition must hold:

$$e_{W^{\prime}}\in\left\langle\begin{bmatrix}E\\ I_{S^{\prime}}\end{bmatrix}\right\rangle. \tag{10}$$

If the aforementioned necessary condition does not hold, then the server will learn from $E$ that $W^{\prime}$ is not the user’s demand index. Indeed, since $E$ is the solution corresponding to the query $Q^{[W,S]}$ , we have

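$$\mathbb{P}\left(\boldsymbol{W}=W^{\prime}\mid\boldsymbol{Q}^{[\boldsymbol{W},\boldsymbol{S}]}=Q^{[W,S]}\right)=0, \tag{11}$$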

which, in turn, implies that $I\left(\boldsymbol{W};\boldsymbol{Q}^{\left[\boldsymbol{W},\boldsymbol{S}\right]}\right)>0$ . This violates the $W$ -privacy condition (3).

The above condition (10) implies that for every $W^{\prime}\in[K]$ , $\left\langle E\right\rangle$ must contain a vector $\mathbf{v}$ of Hamming weight at most $M+1$ such that $W^{\prime}\in\textrm{Supp}\left(\mathbf{v}\right)$ . According to Definition 2, $\left\langle E\right\rangle^{\perp}$ is an LRC with block-length $K$ and all-symbol locality $M$ . ∎
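The weight condition at the end of the proof can be verified mechanically for small binary matrices. The sketch below is illustrative and restricted to $\mathbb{F}_{2}$; the function names `row_space` and `satisfies_locality_condition` are hypothetical, and columns are indexed from 0.

```python
from itertools import product

def row_space(E):
    """All F_2-linear combinations of the rows of the binary matrix E."""
    T, K = len(E), len(E[0])
    return {tuple(sum(a[t] * E[t][j] for t in range(T)) % 2 for j in range(K))
            for a in product((0, 1), repeat=T)}

def satisfies_locality_condition(E, M):
    """Check that every index lies in the support of some vector of <E> of weight at most M + 1."""
    K = len(E[0])
    light = [v for v in row_space(E) if 0 < sum(v) <= M + 1]
    return all(any(v[w] for v in light) for w in range(K))

E_good = [[1, 1, 1, 0, 0, 0],
          [0, 0, 0, 1, 1, 1]]
E_bad = [[1, 1, 1, 0, 0, 0]]      # indices 3, 4, 5 are never covered

print(satisfies_locality_condition(E_good, M=2))
print(satisfies_locality_condition(E_bad, M=2))
```

A matrix that fails this check cannot be a solution for side information of size $M$, because the server could rule out every uncovered index as the demand.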

Theorem 1 has the following two immediate implications. First, it allows us to construct a class of LRCs using solutions to the SM-PIR-SI problem. More specifically, given a solution $E$ to the SM-PIR-SI problem with $K$ messages and side information size $M$ , one can easily obtain an LRC with block-length $K$ and locality $M$ as $\mathcal{C}=\left\langle E\right\rangle^{\perp}$ .

Now, consider the Partition-and-Code (P&C) scheme proposed in [9] for the SM-PIR-SI problem. Let $K={\alpha(M+1)+\beta}$ for some $\alpha>0$ and $0\leq\beta<M+1$. In the P&C scheme, the user first randomly partitions the $K$ messages into $(\alpha+1)$ subsets, each of size at most $M+1$, such that one of the subsets is $W\cup S^{\prime}$ for some $S^{\prime}\subseteq S$. The user then asks the server to send the sum of messages in each subset, resulting in a download cost of $\alpha+1$ symbols.

Note that the Partition-and-Code scheme yields a solution $E$ of size ${(\alpha+1)\times K}$ with the following form (up to column permutation):

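$$E=\begin{bmatrix}\mathbf{1}_{M+1}&\mathbf{0}_{M+1}&\cdots&\mathbf{0}_{\beta}\\ \mathbf{0}_{M+1}&\mathbf{1}_{M+1}&\cdots&\mathbf{0}_{\beta}\\ \vdots&\vdots&\ddots&\vdots\\ \mathbf{0}_{M+1}&\mathbf{0}_{M+1}&\cdots&\mathbf{1}_{\beta}\end{bmatrix}. \tag{12}$$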

It is easy to verify that the corresponding LRC $\mathcal{C}=\left\langle E\right\rangle^{\perp}$ is a direct-sum of $\alpha+1$ single-parity check codes, each of length at most $M+1$ . In other words, $\mathcal{C}$ is a simple LRC that partitions the message symbols into $\alpha+1$ subsets each of size at most $M+1$ , and adds a parity check symbol for each subset.
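To make the structure of this solution concrete, here is a simplified sketch of a partition-style query. It is not the full scheme of [9]: it assumes $(M+1)\mid K$ (so $\beta=0$ and every part has size exactly $M+1$), and the function name `partition_and_code_matrix` is hypothetical. Indices are taken from $\{1,\dots,K\}$.

```python
import random

def partition_and_code_matrix(K, M, W, S, rng=random):
    """Return (parts, E) for a partition-style query; W is the demand index, S the side-information set.

    Simplifying assumption (not from the paper): (M + 1) divides K.
    """
    assert W not in S and len(S) == M and K % (M + 1) == 0
    rest = [i for i in range(1, K + 1) if i != W and i not in S]
    rng.shuffle(rest)
    # One part is W together with S; the remaining indices are cut into
    # chunks of size M + 1 after a random shuffle.
    parts = [sorted([W, *S])]
    parts += [sorted(rest[i:i + M + 1]) for i in range(0, len(rest), M + 1)]
    rng.shuffle(parts)            # hide the position of the part containing W
    # Row t of E is the indicator vector of part t, so E X is the list of part sums.
    E = [[1 if j in part else 0 for j in range(1, K + 1)] for part in parts]
    return parts, E

parts, E = partition_and_code_matrix(K=6, M=2, W=4, S={1, 6})
for row in E:
    print(row)
```

Each row has weight $M+1$ and every column appears in exactly one row, which matches the block form above up to a column permutation. The user recovers $X_{W}$ by subtracting the side information from the sum of the part containing $W$.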

Second, Theorem 1 enables us to use (7) to obtain an upper bound on the capacity of a (scalar-linear) single-message PIR-SI scheme. As we show next, the bound coincides with the upper bound derived in [8, 9].

Corollary 1.

The scalar-linear capacity of the single-message PIR-SI problem is upper bounded by $\lceil K/(M+1)\rceil^{-1}$ .

Proof:

Let $E$ be a scalar-linear solution to the SM-PIR-SI problem. Let $\mathcal{C}=\left\langle E\right\rangle^{\perp}$. Suppose the minimum distance of $\mathcal{C}$ is $d$. Note that we must have $d\geq 2$: if $d=1$, then $E$ must contain a column of all zeros. Let $W^{\prime}$ denote the index of this all-zero column. Then $X_{W^{\prime}}$ cannot be recovered from $EX$, so $W^{\prime}$ cannot be the demand index, which violates the privacy condition. [Footnote 6: Note that here we are using the same argument as in the proof of Theorem 1 (cf. (16)).] Now, since $\left\langle E\right\rangle^{\perp}$ is an LRC with block-length $n=K$, dimension $k=K-T$, and locality $r=M$ from Theorem 1, we have from (7) that

$$K\geq K-T+\left\lceil\frac{K-T}{M}\right\rceil-2+d.$$

After re-arranging, and noting that $d\geq 2$ and $T$ is an integer, we get

$$T\geq\left\lceil\frac{K}{M+1}\right\rceil.$$

As the messages are independent and uniformly distributed over $\mathbb{F}_{q}$ , we have $H\left(\boldsymbol{A}^{\left[W,S\right]}\right)=T\log q$ . The result then follows from (6). ∎
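For completeness, the re-arrangement used in the proof can be spelled out as follows. This is a worked version of the step, assuming $M\geq 1$; it uses only the inequality obtained from (7) together with $d\geq 2$.

```latex
\begin{align*}
K &\geq K - T + \left\lceil \frac{K-T}{M} \right\rceil - 2 + d \\
\Longrightarrow\quad T &\geq \left\lceil \frac{K-T}{M} \right\rceil + (d-2)
   \;\geq\; \frac{K-T}{M} && (d \geq 2) \\
\Longrightarrow\quad MT &\geq K - T \\
\Longrightarrow\quad T &\geq \frac{K}{M+1} \\
\Longrightarrow\quad T &\geq \left\lceil \frac{K}{M+1} \right\rceil && (T \in \mathbb{Z}).
\end{align*}
```

For example, with $K=7$ and $M=2$, this gives $T\geq\lceil 7/3\rceil=3$, so the rate of a single-message scheme is at most $1/3$.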

Remark 1.

The above result can be directly proved using an upper bound on the rate of an LRC with locality $r$ given as $r/(r+1)$ (see [18, Theorem 1]). It is interesting to note that [18, Theorem 1] uses an argument based on acyclic induced subgraphs similar to [8, 9].

We say that a scalar-linear solution to SM-PIR-SI problem is an optimal solution, if $T=\lceil K/(M+1)\rceil$ . Then, Proposition 1 implies the following structure on any optimal scalar-linear solution.

Corollary 2.

When $(M+1)\mid K$ , any optimal scalar-linear solution $E$ to the PIR-SI problem can be converted to the following form using elementary row operations and column permutations:

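$$E=\begin{bmatrix}\times&\cdots&\times&0&\cdots&0&\cdots&0&\cdots&0\\ 0&\cdots&0&\times&\cdots&\times&\cdots&0&\cdots&0\\ \vdots&\ddots&\vdots&\vdots&\ddots&\vdots&\ddots&\vdots&\ddots&\vdots\\ 0&\cdots&0&0&\cdots&0&\cdots&\times&\cdots&\times\end{bmatrix}, \tag{13}$$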

where $\times$ can be any non-zero element in $\mathbb{F}_{q}$ , i.e., $\times\in\mathbb{F}_{q}\setminus\{0\}$ , and the number of non-zero entries in each row is exactly $M+1$ .

Since the solution obtained using the partition-and-code scheme (cf. (12)) has the same form as (13), this shows the uniqueness of the solution obtained by the partition-and-code scheme. In other words, any optimal scalar-linear solution can be obtained from the partition-and-code solution using elementary row operations and column permutations.

Next, we establish the relation from a parity check matrix of an LRC to a solution of the SM-PIR-SI problem.

Theorem 2.

Let $H$ be a parity check matrix of an LRC with block length $K$ and locality $M$ . Then, it is possible to construct a single-message PIR-SI scheme, such that the solution $E$ is a column-permutation of $H$ .

Proof:

We present a constructive proof. In the rest of the proof, we consider all sets as ordered sets (with a natural ascending order). For a given $W$ and $S$, the user first finds a permutation $\pi$ on $[K]$ as follows. Choose an index $W^{\prime}$ uniformly at random from $[K]$, independent of $W$ and $S$. Let $R(W^{\prime})$ be a repair group of $W^{\prime}$. If a coordinate has multiple repair groups, arbitrarily choose one repair group. [Footnote 7: This arbitrary choice of a repair group for each coordinate is made a priori, and is known to the server as a part of the scheme.] By the definition of locality, we have $|R(W^{\prime})|\leq M$. For simplicity, we assume that every repair group of any symbol is of size $M$. [Footnote 8: The arguments can be easily generalized to the case when some repair groups are smaller than $M$.] Let $R^{\prime}(W^{\prime})$ be a random permutation of $R(W^{\prime})$. Let $P=[K]\setminus\{W\cup S\}$, and $P^{\prime}$ be a random permutation of $[K]\setminus\{W^{\prime}\cup R(W^{\prime})\}$. Let $\pi$ be the permutation that maps $W$ to $W^{\prime}$, $S$ to $R^{\prime}(W^{\prime})$, and $P$ to $P^{\prime}$. The user sends $\pi$ as its query $Q^{[W,S]}$. The server then applies $\pi$ to the columns of $H$ to obtain $E$, i.e., $E_{i}=H_{\pi(i)}$ for each $i\in[K]$, where $H_{j}$ is the $j$-th column of $H$. Then, the server computes the answer as $EX$.

Next, we show that the above scheme satisfies the recoverability and $W$-privacy conditions. Indeed, by the definition of locality for $W^{\prime}$, $\left\langle H\right\rangle$ contains a vector whose support is $W^{\prime}\cup R(W^{\prime})$. Therefore, by the construction of $E$, $\left\langle E\right\rangle$ contains a vector whose support is $W\cup S$. Hence, the recoverability condition in (5) is satisfied.
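The construction and the recoverability argument can be exercised end to end on a toy instance. The sketch below is an illustration under several assumptions that are not from the paper: arithmetic is over $\mathbb{F}_{2}$, indices are 0-based, the repair group of each coordinate is fixed as the first suitable row-space vector in enumeration order, and all function names are hypothetical. The example $H$ is a parity check matrix of the $(7,4)$ Hamming code, which has locality $M=3$.

```python
import random
from itertools import product

def combos(H):
    """Yield (a, aH) for every coefficient vector a in F_2^T."""
    T, K = len(H), len(H[0])
    for a in product((0, 1), repeat=T):
        yield a, [sum(a[t] * H[t][j] for t in range(T)) % 2 for j in range(K)]

def repair_group(H, w, M):
    """Fixed (a priori) repair group of coordinate w: support of the first weight-(M+1) vector covering w."""
    for _, v in combos(H):
        if v[w] == 1 and sum(v) == M + 1:
            return [j for j, bit in enumerate(v) if bit and j != w]
    raise ValueError("coordinate has no repair group of size M")

def build_query(H, W, S, rng=random):
    """User side: permutation pi with pi[W] = W' and pi mapping S onto R(W') in random order."""
    K, M = len(H[0]), len(S)
    W_prime = rng.randrange(K)                    # uniform, independent of (W, S)
    R = repair_group(H, W_prime, M)
    P = [i for i in range(K) if i != W and i not in S]
    P_prime = [j for j in range(K) if j != W_prime and j not in R]
    rng.shuffle(R)
    rng.shuffle(P_prime)
    pi = [None] * K
    pi[W] = W_prime
    for s, r in zip(sorted(S), R):
        pi[s] = r
    for p, p2 in zip(P, P_prime):
        pi[p] = p2
    return pi

def server_answer(H, pi, X):
    """Server side: E_i = H_{pi(i)} column by column, answer = E X over F_2."""
    E = [[row[pi[i]] for i in range(len(pi))] for row in H]
    return E, [sum(e * x for e, x in zip(row, X)) % 2 for row in E]

def decode(E, answer, W, S, X_S):
    """User side: find a vector of <E> supported exactly on W and S, then cancel the side information."""
    target = {W, *S}
    for a, v in combos(E):
        if {j for j, bit in enumerate(v) if bit} == target:
            combined = sum(ai * yi for ai, yi in zip(a, answer)) % 2
            return (combined + sum(X_S.values())) % 2
    raise ValueError("no decoding vector found")

H = [[1, 0, 0, 1, 1, 0, 1],
     [0, 1, 0, 1, 0, 1, 1],
     [0, 0, 1, 0, 1, 1, 1]]
X = [1, 0, 1, 1, 0, 0, 1]
W, S = 2, {0, 4, 6}
pi = build_query(H, W, S)
E, ans = server_answer(H, pi, X)
print(decode(E, ans, W, S, {s: X[s] for s in S}) == X[W])
```

The user downloads $T=3$ symbols here, which is above the lower bound $\lceil 7/4\rceil=2$; the example is chosen for simplicity, not optimality.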

For the $W$ -privacy, it suffices to show that, for any $W\in[K]$ and any permutation $\pi$ ,

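$$\mathbb{P}\left(\boldsymbol{Q}^{[\boldsymbol{W},\boldsymbol{S}]}=\pi\mid\boldsymbol{W}=W\right)=\frac{1}{K!}. \tag{14}$$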

This is because using (14), it is easy to show that $\mathbb{P}\left(\boldsymbol{W}=W\mid\boldsymbol{Q}^{\left[\boldsymbol{W},\boldsymbol{S}\right]}=\pi\right)=\mathbb{P}\left(\boldsymbol{W}=W\right)$ , from which the privacy condition (3) follows.

Now, we give a proof of (14). Observe that the query generation process first maps the demand index to a random index in $[K]$ . Let $\boldsymbol{W}^{\prime}$ denote that random index. Let $\boldsymbol{R}^{\prime}(\boldsymbol{W}^{\prime})$ and $\boldsymbol{P}^{\prime}$ be random variables corresponding to (independent) uniform random permutations of $R(\boldsymbol{W}^{\prime})$ and $[K]\setminus\{\boldsymbol{W}^{\prime}\cup R(\boldsymbol{W}^{\prime})\}$ , respectively. Now, given a permutation $\pi$ on $[K]$ as a query, define the following events:

[TABLE]

Then, for any $W\in[K]$ and a permutation $\pi$ on $[K]$ , the probability of choosing $\pi$ as a query can be written as

[TABLE]

where (a) follows from the query generation procedure, and (b) uses (1) and (2) to compute $\mathbb{P}\left(\boldsymbol{E}_{2}\mid\boldsymbol{E}_{1},\boldsymbol{W}=W\right)$ . This completes the proof of (14), and concludes the proof. ∎

III-B Multi-Message PIR-SI and Cooperative LRCs

In this section, we show that a multi-message PIR-SI scheme is a dual of a cooperative LRC, introduced in [14].

First, we show that any solution to the multi-message PIR-SI problem should be a parity check matrix of a code with cooperative locality.

Theorem 3.

Any scalar-linear solution $E$ to the multi-message PIR-SI problem with a demand set of size $D$ and a side information set of size $M$ must be a parity check matrix of an LRC with block length $K$ and $(M,D)$ -cooperative locality.

Proof:

First, we note that the privacy and recoverability conditions impose the following necessary condition. For any query $Q^{[W,S]}$ , the answer $E$ must satisfy the following: for every candidate demand index set $W^{\prime}\subseteq[K]$ , $|W^{\prime}|=D$ , there must exist a potential side information index set $S^{\prime}\subseteq[K]\setminus W^{\prime}$ , $|S^{\prime}|\leq M$ such that it is possible to recover $X_{W^{\prime}}$ from $EX$ and $X_{S^{\prime}}$ . In other words, the following condition must hold:

[TABLE]

If the aforementioned necessary condition does not hold, then the server will learn from $E$ that $W^{\prime}$ is not the user’s demand index. Since $E$ is the solution corresponding to the query $Q^{[W,S]}$ , we have

[TABLE]

which, in turn, implies that $I\left(\boldsymbol{W};\boldsymbol{Q}^{\left[\boldsymbol{W},\boldsymbol{S}\right]}\right)>0$ . This violates the $W$ -privacy condition (3).

The above condition (15) implies that for every subset $W^{\prime}=\{i_{1},i_{2},\ldots,i_{D}\}\subseteq[K]$ of size $D$ , $\left\langle E\right\rangle$ must contain $D$ vectors $v_{1},v_{2},\ldots,v_{D}$ such that $|\cup_{j=1}^{D}\textrm{Supp}\left(v_{j}\right)|\leq D+M$ , and for each $1\leq j\leq D$ , $\textrm{Supp}\left(v_{j}\right)\cap W^{\prime}=\{i_{j}\}$ . It is easy to verify from Definition 3 that $\left\langle E\right\rangle^{\perp}$ is an $(M,D)$ cooperative LRC with block-length $K$ . ∎

Corollary 3.

For $M\geq D$ , the scalar-linear capacity of the multi-message PIR-SI problem is upper bounded by ${D}/{\lceil DK/(M+D)\rceil}$ .

Proof:

Let $\mathcal{C}=\left\langle E\right\rangle^{\perp}$ . Note that from Theorem 3, $\mathcal{C}$ must be a code with blocklength $K$ and $(M,D)$ -cooperative locality. Using (8), it is shown in [15, Corollary 1] that the rate of a code with $(M,D)$ -cooperative locality for $M\geq D$ is upper bounded as $M/(M+D)$ . Therefore, we have $T/K\geq 1-M/(M+D)$ . This yields $T\geq\lceil DK/(D+M)\rceil$ , which gives the capacity upper bound. ∎
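As a small worked illustration of the bound in Corollary 3, the following Python sketch evaluates $\lceil DK/(M+D)\rceil$ and the resulting rate bound with exact arithmetic. The function name `multi_message_rate_bound` and the sample parameters are assumptions chosen here for illustration.

```python
from fractions import Fraction

def multi_message_rate_bound(K, M, D):
    """Return (T_min, bound) with T_min = ceil(D*K/(M+D)) and bound = D/T_min.
    Only meaningful in the regime M >= D covered by Corollary 3."""
    if M < D:
        raise ValueError("Corollary 3 is stated for M >= D")
    T_min = -(-(D * K) // (M + D))   # integer ceiling of D*K/(M+D)
    return T_min, Fraction(D, T_min)

# Sample parameters (assumed): K = 10 messages, M = 3 side information, D = 2 demands.
T_min, bound = multi_message_rate_bound(10, 3, 2)
```

For $K=10$ , $M=3$ , $D=2$ this gives $T\geq\lceil 20/5\rceil=4$ transmissions, so the download rate is at most $2/4=1/2$ .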

Next, we show that it is possible to construct a solution to the multi-message PIR-SI problem using a parity check matrix of a cooperative locality code.

Theorem 4.

Let $H$ be a parity check matrix of an LRC with block-length $K$ and $(M,D)$ -cooperative locality. Then, it is possible to construct a multi-message PIR-SI scheme, such that the solution $E$ is a column-permutation of $H$ .

Proof:

The query generation process and the rest of the proof are similar to the proof of Theorem 2. ∎

Corollary 4.

For $\ell>r$ , the rate of a linear $(n,k,r,\ell)$ cooperative LRC is upper bounded by $r/n$ .

Proof:

Let $H$ be a parity check matrix of an $(n,k,r,\ell)$ cooperative LRC. From Theorem 4, $H$ is a solution (up to a column-permutation) of a multi-message PIR-SI problem such that $K=n$ , $M=r$ , and $D=\ell$ . Now, in [16, Lemma 1], it is shown that, when $D>M$ , the number of transmissions in any multi-message PIR-SI scheme is at least $K-M$ . Therefore, we have $n-k\geq n-r$ , i.e., $k\leq r$ , from which the result follows. ∎

Remark 2.

Corollary 4 yields a better bound on the rate of a cooperative LRC for $\ell>r$ than [15, Corollary 1] given as $r/(r+\ell)+\ell^{2}/(nr)$ . In fact, the rate bound is tight for $n>2r$ . This is because an $(n,r)$ MDS code trivially has $(r,\ell)$ -cooperative locality for any $\ell\geq r$ .

Theorem 3 also enables us to obtain computationally efficient multi-message PIR-SI solutions. In particular, for ${D\leq M}$ , the schemes in [16] (see also [19]) rely on generalized Reed-Solomon codes, and thus, require a finite field size at least $M+\lceil M/D\rceil$ . On the other hand, it is possible to use constructions of cooperative LRCs to obtain PIR-SI schemes over smaller field size. (Note that small field size schemes obtained from cooperative LRCs may have smaller download rate than those in [16, 19].) As an example, an $(n=2^{k}-1,k)$ simplex code has $(\ell+1,\ell)$ -cooperative locality for any $1\leq\ell\leq(n-1)/2$ (see [15]). Thus, it is possible to obtain multi-message PIR-SI solutions over the binary field when $K=2^{t}-1$ for a positive integer $t$ , $1\leq D\leq(K-1)/2$ , and $M=D+1$ .
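The simplex-code example can be checked by brute force for the smallest non-trivial case $K=7$ . The Python sketch below is only an illustration under stated assumptions: it uses the $(7,3)$ binary simplex code as $\left\langle E\right\rangle^{\perp}$ , and it relies on the observation that, for a linear scheme, $X_{W^{\prime}}$ is determined by $EX$ and $X_{S^{\prime}}$ exactly when no codeword of $\left\langle E\right\rangle^{\perp}$ vanishes on $S^{\prime}$ while being nonzero on $W^{\prime}$ . The function names are hypothetical.

```python
from itertools import combinations, product

# Columns of a generator matrix of the (7, 3) simplex code:
# all nonzero vectors of F_2^3.
COLUMNS = [v for v in product((0, 1), repeat=3) if any(v)]
K = len(COLUMNS)  # block length 7

# All 8 codewords of the simplex code, i.e. the kernel of E.
CODEWORDS = [tuple(sum(m[t] * col[t] for t in range(3)) % 2 for col in COLUMNS)
             for m in product((0, 1), repeat=3)]

def recoverable(W, S):
    """True iff no codeword is zero on all of S but nonzero somewhere on W."""
    return not any(all(c[s] == 0 for s in S) and any(c[w] for w in W)
                   for c in CODEWORDS)

def has_side_info_set(W, M):
    """True iff some side information set of size M, disjoint from W, works."""
    rest = [i for i in range(K) if i not in W]
    return any(recoverable(W, S) for S in combinations(rest, M))

def check(D, M):
    """True iff every demand set of size D admits a side information set of size M."""
    return all(has_side_info_set(W, M) for W in combinations(range(K), D))

results = {D: check(D, D + 1) for D in (1, 2, 3)}
```

Restricting the search to side information sets of size exactly $M$ loses nothing here, since enlarging a working side information set keeps it working.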

III-C $(W,S)$ -Private PIR-SI Schemes and MDS Codes

In this section, we show an equivalence between a solution to the $(W,S)$ -PIR-SI problem and a maximum distance separable (MDS) code.

First, we establish the relation from a solution of the $(W,S)$ -PIR-SI problem to a parity check matrix of an MDS code.

Theorem 5.

Any scalar-linear solution $E$ to the $(W,S)$ -PIR-SI problem must be a parity check matrix of a $(K,M)$ MDS code.

Proof:

First, we note that the $(W,S)$ -privacy condition implies the following necessary condition: for each message $X_{i}$ and every set $S_{i}\subseteq[K]\setminus\{i\}$ of size $M$ , it is possible to recover $X_{i}$ from $EX$ and $X_{S_{i}}$ . If this is not the case, then the server learns that the user cannot possess $X_{S_{i}}$ and demand any $X_{W}$ such that $i\in W$ . Indeed, since $E$ is the solution corresponding to the query $Q^{[W,S]}$ , we have

[TABLE]

which, in turn, implies that $I\left(\boldsymbol{W},\boldsymbol{S};\boldsymbol{Q}^{\left[\boldsymbol{W},\boldsymbol{S}\right]}\right)>0$ . This violates the $(W,S)$ -privacy condition (4).

The aforementioned necessary condition implies that, for any set $S\subset[K]$ of size $M$ , for every $i\in[K]\setminus S$ , we should have

[TABLE]

Equation (17), in turn, implies that the columns of $E$ in $[K]\setminus S$ must be linearly independent. Since this should hold for each subset $S\subset[K]$ of size $M$ , we have that every subset of $K-M$ columns of $E$ is linearly independent. Thus, $E$ must be a parity check matrix of a $(K,M)$ MDS code. ∎

Next, we establish a relation from a parity check matrix of an MDS code to a solution of the $(W,S)$ -PIR-SI problem. It is worth noting that the achievability schemes in [9, 16] for $(W,S)$ -privacy are based on MDS codes.

Theorem 6.

Let $H$ be a parity check matrix of a $(K,M)$ -MDS code. Then, $E=H$ is a solution to the $(W,S)$ -PIR-SI problem.

Proof:

First, note that the scheme with $E=H$ is private, since the solution is independent of the particular realization of $\boldsymbol{W}$ and $\boldsymbol{S}$ . As the server already knows the size of the side information index set, it does not get any other information about $\boldsymbol{W}$ and $\boldsymbol{S}$ from $E$ .

To see the recoverability, note that any $K-M$ columns of $H$ are linearly independent. Thus, given the side information $X_{S}$ for any $S\subset[K]$ of size $M$ , the user can recover all the messages $X_{i}$ , $i\in[K]\setminus S$ , including the demand message(s) $X_{W}$ . ∎
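The recoverability argument of Theorem 6 can be sketched numerically. The Python example below is a minimal illustration under assumptions made here: the field is the prime field $\mathbb{F}_{7}$ , the parity check matrix is a $(K-M)\times K$ Vandermonde matrix on distinct evaluation points (so any $K-M$ columns are linearly independent), and the helper names are hypothetical.

```python
from itertools import combinations

P = 7  # prime field size (assumed for illustration)

def vandermonde(rows, points, p=P):
    """rows x len(points) matrix with entry points[j]**i mod p."""
    return [[pow(a, i, p) for a in points] for i in range(rows)]

def solve_mod_p(A, b, p=P):
    """Solve the square system A x = b over GF(p) by Gauss-Jordan elimination.
    Returns None if A is singular."""
    n = len(A)
    aug = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] % p), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, p)          # modular inverse (Python 3.8+)
        aug[col] = [(v * inv) % p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] % p:
                f = aug[r][col]
                aug[r] = [(vr - f * vc) % p for vr, vc in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]

def recover(H, answer, side, p=P):
    """Recover all messages outside the side information from H X and X_S.
    `side` maps each index in S to the known value X_s."""
    K, T = len(H[0]), len(H)
    unknown = [i for i in range(K) if i not in side]
    A = [[H[t][i] for i in unknown] for t in range(T)]
    b = [(answer[t] - sum(H[t][s] * side[s] for s in side)) % p for t in range(T)]
    return dict(zip(unknown, solve_mod_p(A, b, p)))

# Toy instance (assumed): K = 5 messages, M = 2 side information symbols.
H = vandermonde(3, [1, 2, 3, 4, 5])
X = [3, 1, 4, 1, 5]
answer = [sum(h * x for h, x in zip(row, X)) % P for row in H]
recovered = recover(H, answer, {0: X[0], 3: X[3]})
```

Since the server sends the same $EX$ regardless of $(W,S)$ , the same `answer` works for every side information set of size $M$ ; only the user-side call to `recover` changes.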

IV Equivalence Results for Non-Linear Schemes

In this section, we consider generic PIR-SI schemes and LRCs, which encompass scalar-linear, vector-linear, and non-linear schemes. We begin with the definition of a generic LRC.

Definition 4.

An $(n,k,r)$ LRC $\mathcal{C}\subseteq\mathbb{F}_{q}^{n}$ is a set of vectors in $\mathbb{F}_{q}^{n}$ of size $q^{k}$ , referred to as codewords, together with

1. an encoding function $f:\mathbb{F}_{q}^{k}\rightarrow\mathcal{C}$ , which is a bijection between vectors in $\mathbb{F}_{q}^{k}$ and codewords in $\mathcal{C}$ , and

2. a set of deterministic repair functions $g_{1},g_{2},\ldots,g_{n}$ , $g_{i}:\mathbb{F}_{q}^{r}\rightarrow\mathbb{F}_{q}$ , such that, for every coordinate $i\in[n]$ , there exists a set of coordinates $R(i)\subset[n]\setminus\{i\}$ , $|R(i)|=r$ satisfying $g_{i}(\mathbf{c}_{R(i)})=\mathbf{c}_{i}$ for every codeword $\mathbf{c}\in\mathcal{C}$ . We say that $R(i)$ is a repair group of the $i$ -th coordinate.

Next, for the SM-PIR-SI problem, we define a PIR-SI code. Towards this end, we introduce the following notation:

[TABLE]

That is, $\mathcal{W}$ is the set of all possible combinations of the demand index and the side information index set.

Definition 5.

A PIR-SI code for $\mathbb{F}_{q}^{K}$ is a set of vectors in $\mathbb{F}_{q}^{T}$ , referred to as codewords, together with

1. a class of deterministic answer functions $\mathcal{A}$ , where each function $A\in\mathcal{A}$ maps vectors from $\mathbb{F}_{q}^{K}$ to the codewords, i.e., $A:\mathbb{F}_{q}^{K}\rightarrow\mathbb{F}_{q}^{T}$ ,

2. a class of deterministic recovery functions $\mathcal{D}$ , where each function $D\in\mathcal{D}$ is from $\mathbb{F}_{q}^{T+M}$ to $\mathbb{F}_{q}$ , and

3. a stochastic query function $Q:\mathcal{W}\rightarrow\mathcal{A}$ that maps $(W,S)$ to an answer function $A\in\mathcal{A}$ (independently of the value of $X_{S}$ ) such that:

(i)

for every $W^{\prime},W\in[K]$ , $S\subset[K]\setminus\{W\}$ , $|S|=M$ , and for each $A\in\mathcal{A}$ ,

[TABLE]

and

(ii)

there exists a decoding function $D\in\mathcal{D}$ satisfying

[TABLE]

We refer to $T$ as the length of the PIR-SI code.

It is straightforward to show that the $W$ -privacy condition (19) implies the following necessary condition on a PIR-SI code.

Lemma 1.

In a PIR-SI code, for any $A\in\mathcal{A}$ , for every $j\in[K]$ , there must exist a decoding function $D_{j}\in\mathcal{D}$ and a set $S_{j}\subset[K]\setminus\{j\}$ , $|S_{j}|=M$ , such that $D_{j}\left(A(X_{1},\cdots,X_{K}),X_{S_{j}}\right)=X_{j}$ .

Now, we show a relation from a PIR-SI code to an LRC. It is worth noting that the proof technique is similar to [20, Lemma 3].

Theorem 7.

Given a PIR-SI code of length $T$ over $\mathbb{F}_{q}$ , it is possible to construct an LRC of size (at least) $q^{K-T}$ .

Proof:

First, note that, for any $A\in\mathcal{A}$ , there must exist a vector $\mathbf{a}\in\mathbb{F}_{q}^{T}$ such that $\left|\left\{X\in\mathbb{F}_{q}^{K}\mid A(X)=\mathbf{a}\right\}\right|\geq q^{K-T}$ . This is because every $A\in\mathcal{A}$ maps $\mathbb{F}_{q}^{K}$ to $\mathbb{F}_{q}^{T}$ . Next, for an arbitrary $A\in\mathcal{A}$ and the corresponding $\mathbf{a}$ , let us define $\mathcal{C}_{\mathbf{a}}=\left\{X\in\mathbb{F}_{q}^{K}\mid A(X)=\mathbf{a}\right\}$ . Now, from Lemma 1, for every $i\in[K]$ , there must exist a deterministic decoding function $D_{i}$ and a set $S_{i}\subset[K]\setminus\{i\}$ , $|S_{i}|=M$ , such that $D_{i}\left(\mathbf{a},X_{S_{i}}\right)=X_{i}$ . Using this, define, for every $i\in[K]$ , $R(i)=S_{i}$ , and $g_{i}\left(\mathbf{c}_{R(i)}\right)=D_{i}\left(\mathbf{a},X_{S_{i}}\right)$ . It is easy to verify that the set $\mathcal{C}_{\mathbf{a}}$ along with an arbitrary bijection $E:\mathbb{F}_{q}^{\lfloor\log_{q}|\mathcal{C}_{\mathbf{a}}|\rfloor}\rightarrow\mathcal{C}$ and repair functions $g_{1},g_{2},\ldots,g_{K}$ is an LRC of size at least $q^{K-T}$ . ∎
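The counting step at the start of this proof is a pigeonhole argument, which the following Python sketch illustrates by exhaustive enumeration. The function `largest_fiber` and the answer function `toy_answer` are hypothetical: `toy_answer` is an arbitrary non-linear map chosen here and is not claimed to be a valid PIR-SI answer function; the sketch only shows how $\mathbf{a}$ and $\mathcal{C}_{\mathbf{a}}$ are obtained.

```python
from collections import defaultdict
from itertools import product

def largest_fiber(answer_fn, q, K):
    """Group all X in F_q^K by their answer and return (a, C_a),
    where C_a is a largest preimage set."""
    fibers = defaultdict(list)
    for X in product(range(q), repeat=K):
        fibers[answer_fn(X)].append(X)
    a = max(fibers, key=lambda key: len(fibers[key]))
    return a, fibers[a]

def toy_answer(X):
    """Hypothetical non-linear map from F_2^4 to F_2^2 (illustration only)."""
    return ((X[0] * X[1] + X[2]) % 2, (X[1] + X[2] * X[3]) % 2)

q, K, T = 2, 4, 2
a, C_a = largest_fiber(toy_answer, q, K)
```

Because the $q^{K}$ inputs are spread over at most $q^{T}$ answers, the largest preimage set always has at least $q^{K-T}$ elements.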

Next, from [18, Theorem 2.1], we have the following upper bound on the size of an $(n,k,r)$ LRC.

Proposition 2.

[18] For any $(n,k,r)$ LRC $\mathcal{C}\subset\mathbb{F}_{q}^{n}$ , the size $|\mathcal{C}|\leq q^{n-\lceil n/(r+1)\rceil}$ .

We refer to an $(n,k,r)$ LRC $\mathcal{C}$ satisfying the equality $|\mathcal{C}|=q^{n-\lceil n/(r+1)\rceil}$ as an optimal LRC.

To complete the equivalence, we establish a relation from an optimal LRC to a PIR-SI code.

Theorem 8.

Given an optimal $(K,K-\lceil K/(M+1)\rceil,M)$ LRC, it is possible to construct a PIR-SI code of length $\lceil K/(M+1)\rceil$ over $\mathbb{F}_{q}$ .

In order to prove Theorem 8, we need two other lemmas. To simplify the presentation, we define $T_{OPT}\triangleq\lceil K/(M+1)\rceil$ . Also, for a code $\mathcal{C}$ of block-length $K$ and a set $P\subset[K]$ , let $\mathcal{C}_{P}$ denote the code obtained by puncturing $\mathcal{C}$ on the coordinates outside of $P$ .

First, we show that any optimal LRC must contain $K-T_{OPT}$ coordinates such that values on these coordinates determine the values of the remaining $T_{OPT}$ coordinates. Note that for an arbitrary $(n,k)$ non-linear code, there may not exist any subset of $k$ coordinates that determine values of the remaining coordinates.

Lemma 2.

For an optimal $(K,K-T_{OPT},M)$ LRC $\mathcal{C}$ , there exists a partition of $K$ coordinates into sets $P_{1}$ and $P_{2}$ such that $|P_{1}|=K-T_{OPT}$ , $|P_{2}|=T_{OPT}$ , and for any codeword $\mathbf{c}\in\mathcal{C}$ , the symbols $\mathbf{c}_{P_{2}}$ can be recovered from the symbols $\mathbf{c}_{P_{1}}$ .

Proof:

We iteratively construct $P_{1}$ and $P_{2}$ as follows.

1. Initialize $P_{1}=P_{2}=\emptyset$ .

2. While $|P_{1}\cup P_{2}|<K$ :

2.1 Choose a coordinate $i\not\in P_{1}\cup P_{2}$ .

2.2 Set $P_{1}\leftarrow P_{1}\cup R(i)$ , for a repair group $R(i)$ of $i$ .

2.3 Set $P_{2}\leftarrow P_{2}\cup\{i\}$ .

By the construction of $P_{1}$ and $P_{2}$ , the coordinates in $P_{2}$ can be recovered from the coordinates in $P_{1}$ .

Note that, in each step, $P_{2}$ grows by one, and $P_{1}$ grows by at most $M$ as the locality of the code is $M$ . In other words, in each step, $P_{1}\cup P_{2}$ grows by at most $M+1$ . Therefore, the number of steps for which the while loop runs is at least $\lceil K/(M+1)\rceil=T_{OPT}$ . This gives $|P_{2}|\geq T_{OPT}$ .

Next, we show that $|P_{2}|\leq T_{OPT}$ . Since there is a bijection between $\mathbb{F}_{q}^{K-T_{OPT}}$ and $\mathcal{C}$ , and since the coordinates in $P_{2}$ are a function of those in $P_{1}$ , there must be a bijection between $\mathbb{F}_{q}^{K-T_{OPT}}$ and $\mathcal{C}_{P_{1}}$ . This implies that $|P_{1}|\geq K-T_{OPT}$ , and thus, $|P_{2}|\leq T_{OPT}$ .

We conclude that $|P_{2}|=T_{OPT}$ , which completes the proof. ∎
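The iterative construction used in this proof can be sketched in Python as follows. The sketch follows the listed steps literally with 0-based indices; the function name `greedy_partition`, the rule of always picking the smallest uncovered coordinate, and the toy repair groups are assumptions made here. The toy instance is chosen so that no repair group meets $P_{2}$ ; the sketch does not attempt to handle other cases.

```python
def greedy_partition(K, repair_groups):
    """Build P1 and P2 by repeatedly picking an uncovered coordinate i,
    adding its repair group to P1 and i itself to P2."""
    P1, P2 = set(), set()
    while len(P1 | P2) < K:
        i = min(set(range(K)) - (P1 | P2))  # step 2.1 (arbitrary choice rule)
        P1 |= repair_groups[i]              # step 2.2
        P2.add(i)                           # step 2.3
    return P1, P2

# Toy instance (assumed): K = 6, locality M = 2, two disjoint local groups.
repair_groups = {0: {1, 2}, 1: {0, 2}, 2: {0, 1},
                 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
P1, P2 = greedy_partition(6, repair_groups)
```

Here the loop runs twice, giving $|P_{2}|=2=\lceil 6/3\rceil$ and $|P_{1}|=4$ , in line with the lemma for $K=6$ , $M=2$ .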

Given a vector $\mathbf{u}$ , we define a translation of an LRC $\mathcal{C}$ as

[TABLE]

Now, using Lemma 2, we show that there exist $q^{T_{OPT}}$ translations of an optimal LRC that partition $\mathbb{F}_{q}^{K}$ .

Lemma 3.

For an optimal $(K,K-\lceil K/(M+1)\rceil,M)$ LRC $\mathcal{C}$ , there exist $q^{T_{OPT}}$ distinct vectors $\mathbf{u}_{j}\in\mathbb{F}_{q}^{K}$ , $j=0,\ldots,q^{T_{OPT}}-1$ , such that the translations $\left\{\mathcal{C}+\mathbf{u}_{j}\mid j=0,\ldots,q^{T_{OPT}}-1\right\}$ partition the space $\mathbb{F}_{q}^{K}$ . That is,

[TABLE]

and

[TABLE]

Proof:

We give a constructive proof. Let $P_{1}$ and $P_{2}$ be the sets of coordinates of $\mathcal{C}$ as described in Lemma 2. Without loss of generality, let $P_{1}$ be the first $K-T_{OPT}$ coordinates. Let $\left\{\mathbf{v}_{i}\mid 0\leq i\leq q^{T_{OPT}}-1\right\}$ denote the set of vectors in $\mathbb{F}_{q}^{T_{OPT}}$ in a lexicographic order. For each $0\leq i\leq q^{T_{OPT}}-1$ , define $\mathbf{u}_{i}=\left[\mathbf{0}\;\;\mathbf{v}_{i}\right]$ , where $\mathbf{0}$ is the all-zero vector of length $K-T_{OPT}$ .

Note that any translation of $\mathcal{C}$ has the same size as $\mathcal{C}$ . Thus, to prove (23), it suffices to show (22). We prove this by contradiction. Suppose that there exist indices $i\neq j$ and a pair of codewords $\mathbf{c},\mathbf{c}^{\prime}\in\mathcal{C}$ such that $\mathbf{c}+\mathbf{u}_{i}=\mathbf{c}^{\prime}+\mathbf{u}_{j}$ . This implies that

[TABLE]

Therefore, $\mathbf{c}_{P_{1}}=\mathbf{c}^{\prime}_{P_{1}}$ . Further, since the coordinates in $P_{2}$ can be recovered from those in $P_{1}$ (Lemma 2), we must have $\mathbf{c}_{P_{2}}=\mathbf{c}^{\prime}_{P_{2}}$ . However, as $\mathbf{v}_{i}\neq\mathbf{v}_{j}$ , we have a contradiction to (24). ∎
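The construction in this proof can be checked on a very small instance. The Python sketch below is an illustration under assumptions made here: it takes $q=2$ , $K=3$ , $M=2$ (so $T_{OPT}=1$ ) and uses the binary even-weight code of length 3 as the optimal LRC, with $P_{1}$ the first two coordinates and $P_{2}$ the last one. The helper name `translate` is hypothetical.

```python
from itertools import product

q, K, T_OPT = 2, 3, 1

# Even-weight code of length 3: size q^(K - T_OPT) = 4, and each symbol is
# the sum of the other two, so the locality is M = 2.
C = {c for c in product(range(q), repeat=K) if sum(c) % q == 0}

def translate(code, u):
    """Return the translation code + u (coordinate-wise addition mod q)."""
    return {tuple((ci + ui) % q for ci, ui in zip(c, u)) for c in code}

# u_i = [0 v_i], with v_i ranging over F_q^{T_OPT} in lexicographic order.
shifts = [(0,) * (K - T_OPT) + v for v in product(range(q), repeat=T_OPT)]
cosets = [translate(C, u) for u in shifts]

full_space = set(product(range(q), repeat=K))
covers = set().union(*cosets) == full_space
disjoint = sum(len(t) for t in cosets) == len(full_space)
```

In this instance the two translations are the even-weight and odd-weight vectors of $\mathbb{F}_{2}^{3}$ , which are disjoint and together cover the whole space.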

Proof of Theorem 8: Lemma 3 enables us to construct a PIR-SI code of length $T_{OPT}$ over $\mathbb{F}_{q}$ using an optimal LRC $\mathcal{C}$ as follows.

Answer functions: We construct a set $\mathcal{A}$ of $K!$ answer functions, and associate every answer function with a permutation on $[K]$ . Towards this end, we need the following additional notation. For $0\leq a\leq q^{T_{OPT}}-1$ , let $\bar{a}_{q}$ denote the length- $T_{OPT}$ $q$ -ary expansion of $a$ . For a permutation $\pi$ on $[K]$ and a vector $X=[X_{1}\cdots X_{K}]\in\mathbb{F}_{q}^{K}$ , let $\pi(X)=X_{\pi([K])}$ .

Let $\mathcal{U}=\{\mathbf{u}_{j}\in\mathbb{F}_{q}^{K},j=0,\ldots,q^{T_{OPT}}-1\}$ be a set of vectors as described in Lemma 3. For a given $X\in\mathbb{F}_{q}^{K}$ and a permutation $\pi$ on $[K]$ , let $0\leq a\leq q^{T_{OPT}}-1$ be such that $\pi(X)\in\mathcal{C}+\mathbf{u}_{a}$ . Note that, by Lemma 3, the translations $\left\{\mathcal{C}+\mathbf{u}_{j}\mid 0\leq j\leq q^{T_{OPT}}-1\right\}$ partition the space $\mathbb{F}_{q}^{K}$ . Hence, there exists a unique such $\mathbf{u}_{a}\in\mathcal{U}$ for every $X\in\mathbb{F}_{q}^{K}$ and any permutation $\pi$ on $[K]$ . Define the answer functions for every $X\in\mathbb{F}_{q}^{K}$ and every permutation $\pi$ on $[K]$ as

[TABLE]

Query function: We are given an index $W\in[K]$ and a set $S\subset[K]\setminus\{W\}$ . First, choose an index $W^{\prime}\in[K]$ uniformly at random independent of $W$ and $S$ . Choose an arbitrary repair group of $W^{\prime}$ , say $R(W^{\prime})$ . (If a coordinate has multiple repair groups, arbitrarily choose one repair group. This arbitrary choice of a repair group for each coordinate is made a priori, and is known to the server as a part of the scheme.) Let $P=[K]\setminus(W\cup S)$ . Let $R^{\prime}(W^{\prime})$ and $P^{\prime}$ be random permutations of sets $R(W^{\prime})$ and $[K]\setminus(W^{\prime}\cup R(W^{\prime}))$ , respectively. Let $\pi$ be a permutation on the set $[K]$ that maps $W$ to $W^{\prime}$ , $S$ to $R^{\prime}(W^{\prime})$ , and $P$ to $P^{\prime}$ . Then, the query function $Q$ maps $(W,S)$ to $A_{\pi}$ in $\mathcal{A}$ . Note that it suffices for the user to send $\pi$ as their query.

Recovery functions: For a set $P\subset[K]$ , let $\mathbf{u}_{a}\mid_{P}$ denote the length- $|P|$ vector obtained by deleting the coordinates of $\mathbf{u}_{a}$ outside $P$ . Now, given $\pi$ and $A_{\pi}$ , define the recovery function as

[TABLE]

where $g_{W^{\prime}}(\cdot)$ is the repair function of $\mathcal{C}$ for the coordinate $\mathbf{c}_{W^{\prime}}$ (see Definition 4).

Recoverability and Privacy: It is straightforward to verify that $D\left(A_{\pi}(X),X_{S}\right)=X_{W}$ (cf. (26)). The $W$ -privacy condition (19) can be proven in the same way as in the proof of Theorem 2, and thus, the proof is omitted.

V Conclusion

The theoretical computer science community has established a strong relationship between PIR schemes and locally decodable codes. This paper extends this theme by establishing a strong relationship between PIR schemes for the recently proposed single-server PIR with side information problem and locally recoverable codes. As corollaries to these results, we obtain upper bounds on the download rate for PIR-SI schemes, and a novel rate upper bound on cooperative LRCs.

Acknowledgement

S. Kadhe would like to thank Kannan Ramchandran for helpful discussions.

Bibliography

[1] B. Chor, E. Kushilevitz, O. Goldreich, and M. Sudan, “Private information retrieval,” Journal of the ACM, vol. 45, no. 6, pp. 965–981, 1998.
[2] S. Yekhanin, “Private information retrieval,” Communications of the ACM, vol. 53, no. 4, pp. 68–73, 2010.
[3] H. Sun and S. A. Jafar, “The capacity of private information retrieval,” CoRR, vol. abs/1602.09134, 2016. [Online]. Available: http://arxiv.org/abs/1602.09134
[4] H. Sun and S. A. Jafar, “The capacity of robust private information retrieval with colluding databases,” IEEE Trans. on Info. Theory, vol. 64, no. 4, pp. 2361–2370, April 2018.
[5] R. Tajeddine and S. El Rouayheb, “Robust private information retrieval on coded data,” in 2017 IEEE International Symposium on Information Theory (ISIT). IEEE, 2017.
[6] K. Banawan and S. Ulukus, “Multi-message private information retrieval: Capacity results and near-optimal schemes,” CoRR, vol. abs/1702.01739, 2017. [Online]. Available: http://arxiv.org/abs/1702.01739
[7] K. Banawan and S. Ulukus, “The capacity of private information retrieval from coded databases,” IEEE Trans. on Info. Theory, vol. 64, no. 3, pp. 1945–1956, March 2018.
[8] S. Kadhe, B. Garcia, A. Heidarzadeh, S. E. Rouayheb, and A. Sprintson, “Private information retrieval with side information: The single server case,” in 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Oct 2017, pp. 1099–1106.
