Capacity of Single-Server Single-Message Private Information Retrieval with Private Coded Side Information

Anoosheh Heidarzadeh, Fatemeh Kazemi, and Alex Sprintson

arXiv:1901.09248·cs.IT·January 29, 2019


TL;DR

This paper investigates the minimal download cost for private information retrieval when the user has private coded side information, providing lower bounds and proposing protocols that achieve these bounds.

Contribution

It establishes fundamental lower bounds for PIR with private coded side information and introduces optimal protocols matching these bounds.

Findings

1. Lower bounds on download cost for PIR with coded side information.

2. Proposed PIR protocols that achieve these lower bounds.

3. Analysis of models where demand is or isn't part of side information.


Equations

$$p_{\boldsymbol{W}|\boldsymbol{S}}(W|S)=\left\{\begin{array}[]{ll}(K-M)^{-1},&W\not\in S,\\ 0,&\text{otherwise}.\end{array}\right.$$

$$p_{\boldsymbol{W}|\boldsymbol{S}}(W|S)=\left\{\begin{array}[]{ll}M^{-1},&W\in S,\\ 0,&\text{otherwise}.\end{array}\right.$$

$$\mathbb{P}(\boldsymbol{W}=W^{\prime},\boldsymbol{S}=S^{\prime}|Q^{[W,S,C]},I^{[W,S]}=\theta)=\mathbb{P}(\boldsymbol{W}=W^{\prime},\boldsymbol{S}=S^{\prime}|I^{[W,S]}=\theta)$$

$$H(X_{W}|A^{[W,S,C]},Q^{[W,S,C]},I^{[W,S]},Y^{[S,C]})=0.$$

$$H(X_{W^{*}}|A^{[W,S,C]},Q^{[W,S,C]},I^{[W,S]},Y^{[S^{*},C^{*}]})=0.$$

$$H(A)\geq H(X_{W})+H(A|Q,Y,X_{W}).$$

$$\begin{aligned}
H(A|Q,Y,X_{W}) &\geq H(A|Q,Y,X_{W},Y_{I})+H(X_{I}|A,Q,Y,X_{W},Y_{I})\\
&= H(X_{I}|Q,Y,X_{W},Y_{I})+H(A|Q,Y,X_{W},Y_{I},X_{I})\\
&= H(X_{I})+H(A|Q,Y,X_{W},Y_{I},X_{I})
\end{aligned}$$

$$\begin{aligned}
H(A|Q,Y,X_{W},Y_{I},X_{I}) &= H(A|Q,Y,X_{W},Y_{I},X_{I})+H(X_{J}|A,Q,Y,X_{W},Y_{I},X_{I})\\
&= H(X_{J}|Q,Y,X_{W},Y_{I},X_{I})+H(A|Q,Y,X_{W},Y_{I},X_{I},X_{J})\\
&\geq H(X_{J})
\end{aligned}$$

$$\begin{aligned}
H(A) &\geq H(X_{I}|Q)+H(A|Q,X_{I})\\
&= H(X_{I})+H(A|Q,X_{I}).
\end{aligned}$$

$$\begin{aligned}
H(A|Q,X_{I}) &\geq H(A|Q,X_{I},X_{n+1})+H(Z_{J}|A,Q,X_{I},X_{n+1})\\
&= H(Z_{J}|Q,X_{I},X_{n+1})+H(A|Q,X_{I},X_{n+1},Z_{J})\\
&\geq H(Z_{J})
\end{aligned}$$

$$\begin{aligned}
H(A) &\geq H(A|Q,Y)+H(X_{1}|A,Q,Y)\\
&= H(X_{1}|Q,Y)+H(A|Q,Y,X_{1})\\
&= H(X_{1})+H(A|Q,Y,X_{1})+H(Z_{J}|A,Q,Y,X_{1})\\
&= H(X_{1})+H(Z_{J}|Q,Y,X_{1})+H(A|Q,Y,X_{1},Z_{J})\\
&\geq H(X_{1})+H(Z_{J})
\end{aligned}$$


Full text


Anoosheh Heidarzadeh, Fatemeh Kazemi, and Alex Sprintson

The authors are with the Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843 USA (E-mail: {anoosheh, fatemeh.kazemi, spalex}@tamu.edu).

Abstract

We study the problem of single-server single-message Private Information Retrieval with Private Coded Side Information (PIR-PCSI). In this problem, there is a server that stores a database, and a user who knows a random linear combination of a random subset of messages in the database. The number of messages contributing to the user’s side information is known to the server a priori, whereas their indices and coefficients are unknown to the server a priori. The user wants to retrieve a message from the server (with minimum download cost), while protecting the identities of both the demand and side information messages.

Depending on whether the demand is part of the coded side information or not, we consider two different models for the problem. For the model in which the demand does not contribute to the side information, we prove a lower bound on the minimum download cost for all (linear and non-linear) PIR protocols; and for the other model wherein the demand is one of the messages contributing to the side information, we prove a lower bound for all scalar-linear PIR protocols. In addition, we propose novel PIR protocols that achieve these lower bounds.

I Introduction

In the information-theoretic Private Information Retrieval (PIR) problem (see, e.g., [1, 2]), there is a user that wishes to download a single or multiple messages belonging to a database stored on a single or multiple (non-colluding or colluding) servers. The goal of the user is to minimize the download cost (i.e., the amount of information downloaded from the server(s)), while hiding the identity of its demanded message(s) from the server(s). This setup was recently extended in [3, 4, 5, 6, 7, 8, 9, 10, 11, 12] to the settings wherein the user has some side information about the messages in the database, and the side information is unknown to the server(s).

For the single-server setting of the PIR problem in the presence of some side information, we studied the cases in which the side information is a random subset of messages (a.k.a. PIR with Side Information (PIR-SI)) or a random linear combination of a random subset of messages (a.k.a. PIR with Coded Side Information (PIR-CSI)) in [3, 11] and [9], respectively. The multi-server setting of the PIR-SI problem was also studied in [7, 8, 10]. For the PIR-SI problem, two different types of privacy, known as $W$ -privacy (i.e., only the identities of the demand messages must be protected) and $(W,S)$ -privacy (i.e., the identities of both the demand and side information messages must be protected jointly) have been considered, whereas the problem of PIR-CSI has only been studied when $W$ -privacy is required.

In this work, we study the single-server single-message PIR-CSI problem where $(W,S)$ -privacy is required. In this problem, referred to as PIR with Private Coded Side Information (PIR-PCSI), there is a single server storing a database of $K$ messages, and there is a user who knows a random linear combination of a random subset of $M$ messages. This setting can be motivated by several practical scenarios. The user may have obtained their side information via overhearing in a wireless network; or from a trusted server with limited knowledge about the database; or from the information locally stored in the user’s cache of limited size, to name a few. The user is interested in downloading a single message from the server while preserving the privacy of both the demand message and the messages contributing to the side information. Depending on whether the user’s demanded message itself contributes to the user’s side information or not, we consider two different models of the PIR-PCSI problem.

I-A Main Contributions

For the model in which the demanded message is not part of the coded side information, we characterize the capacity and the scalar-linear capacity of the PIR-PCSI problem, where the (scalar-linear) capacity is defined as the supremum of all achievable rates (i.e., the inverse of the download cost) for all (scalar-linear) protocols. In particular, we show that for this model the capacity and the scalar-linear capacity are both equal to $(K-M)^{-1}$ for any ${0\leq M\leq K-1}$ . This is interesting because, as shown in [3, Theorem 2], even when the user knows $M$ (uncoded) messages as their side information, in order to guarantee $(W,S)$ -privacy, the minimum download cost is $K-M$ . This shows that for achieving $(W,S)$ -privacy there will be no loss in capacity even if only one linear combination of $M$ messages (instead of $M$ messages separately) is known to the user a priori.

For the model wherein the user’s demanded message contributes to their coded side information, we show that the scalar-linear capacity of the PIR-PCSI problem is equal to $(K-M+1)^{-1}$ for any $2\leq M\leq K$ . Interestingly, this result shows that achieving $(W,S)$ -privacy when the user knows $M-1$ messages (different from the demand) separately is as costly as achieving it when the user knows only one linear combination of these $M-1$ messages and the demand.

The converse proofs are based on information-theoretic arguments, and the proofs of achievability rely on novel PIR protocols based on the Generalized Reed-Solomon (GRS) codes that include a specific codeword.

II Problem Formulation

Let $\mathbb{F}_{q}$ be a finite field of size $q$ , and let $\mathbb{F}_{q^{m}}$ be an extension field of $\mathbb{F}_{q}$ for some integer $m$ . Let $L\triangleq m\log_{2}q$ , and let $\mathbb{F}_{q}^{\times}\triangleq\mathbb{F}_{q}\setminus\{0\}$ . For a positive integer $i$ , we denote $\{1,\dots,i\}$ by $[i]$ . Let $K\geq 1$ and $0\leq M\leq K$ be two integers. We denote the set of all subsets of $\mathcal{K}\triangleq[K]$ of size $M$ by $\mathcal{S}$ , and the set of all sequences of length $M$ with elements from $\mathbb{F}^{\times}_{q}$ by $\mathcal{C}$ .

Assume that there is a server that stores a set of $K$ messages $X_{1},\dots,X_{K}$ , with each message $X_{i}$ being independently and uniformly distributed over $\mathbb{F}_{q^{m}}$ , i.e., ${H(X_{1})=\dots=H(X_{K})=L}$ and $H(X_{1},\dots,X_{K})=KL$ . Also assume that there is a user that wishes to retrieve a message $X_{W}$ from the server for some $W\in\mathcal{K}$ , and knows a linear combination ${Y^{[S,C]}\triangleq\sum_{i\in S}c_{i}X_{i}}$ for some $S\triangleq\{i_{1},\dots,i_{M}\}\in\mathcal{S}$ and ${C\triangleq\{c_{i_{1}},\dots,c_{i_{M}}\}\in\mathcal{C}}$ . We refer to $W$ as the demand index, $X_{W}$ as the demand, $S$ as the side information index set, $Y^{[S,C]}$ as the side information, and $M$ as the side information size.

We denote by $\boldsymbol{S}$ , $\boldsymbol{C}$ , and $\boldsymbol{W}$ the random variables representing $S$ , $C$ , and $W$ , respectively. We also denote the probability mass function (PMF) of $\boldsymbol{S}$ by $p_{\boldsymbol{S}}(\cdot)$ , the PMF of $\boldsymbol{C}$ by $p_{\boldsymbol{C}}(\cdot)$ , and the conditional PMF of $\boldsymbol{W}$ given $\boldsymbol{S}$ by $p_{\boldsymbol{W}|\boldsymbol{S}}(\cdot|\cdot)$ . We assume that $\boldsymbol{S}$ is uniformly distributed over $\mathcal{S}$ , i.e., $p_{\boldsymbol{S}}(S)=\binom{K}{M}^{-1}$ for all $S\in\mathcal{S}$ ; and $\boldsymbol{C}$ is uniformly distributed over $\mathcal{C}$ , i.e., $p_{\boldsymbol{C}}(C)=(q-1)^{-M}$ for all $C\in\mathcal{C}$ . Also, we consider two different models for the conditional PMF of $\boldsymbol{W}$ given $\boldsymbol{S}=S$ as follows:

Model I

$\boldsymbol{W}$ is uniformly distributed over $\mathcal{K}\setminus S$ , i.e.,

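$$p_{\boldsymbol{W}|\boldsymbol{S}}(W|S)=\left\{\begin{array}[]{ll}(K-M)^{-1},&W\not\in S,\\ 0,&\text{otherwise}.\end{array}\right.$$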

Model II

$\boldsymbol{W}$ is uniformly distributed over $S$ , i.e.,

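$$p_{\boldsymbol{W}|\boldsymbol{S}}(W|S)=\left\{\begin{array}[]{ll}M^{-1},&W\in S,\\ 0,&\text{otherwise}.\end{array}\right.$$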

To avoid the degenerate cases, we assume ${0\leq M\leq K-1}$ and $2\leq M\leq K$ for the models I and II, respectively.

Let $I^{[W,S]}$ be an indicator function such that ${I^{[W,S]}=1}$ if $W\in S$ , and ${I^{[W,S]}=0}$ if $W\not\in S$ . Note that $\mathbb{P}(\boldsymbol{W}=W^{\prime},\boldsymbol{S}=S^{\prime}|I^{[W,S]}=0)$ is equal to ${(K-M)^{-1}\binom{K}{M}^{-1}}$ if $W^{\prime}\not\in S^{\prime}$ , and it is zero otherwise; and $\mathbb{P}(\boldsymbol{W}=W^{\prime},\boldsymbol{S}=S^{\prime}|I^{[W,S]}=1)$ is equal to ${M^{-1}\binom{K}{M}^{-1}}$ if $W^{\prime}\in S^{\prime}$ , and it is zero otherwise.
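The two joint distributions above can be checked by brute-force enumeration. The following is a minimal sketch, not part of the paper: the function name `joint_pmf` and the 1-based index convention are illustrative assumptions. It builds $p_{\boldsymbol{S}}(S)\,p_{\boldsymbol{W}|\boldsymbol{S}}(W|S)$ for every admissible pair and confirms that each model yields a uniform distribution that sums to one.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def joint_pmf(K, M, model):
    """Return {(W, S): probability} for Model I (W not in S) or Model II (W in S).

    Hypothetical helper: indices run over 1..K, S is a sorted tuple of size M.
    """
    pmf = {}
    for S in combinations(range(1, K + 1), M):
        if model == 1:
            candidates = [w for w in range(1, K + 1) if w not in S]
        else:
            candidates = list(S)
        for W in candidates:
            # p_S(S) * p_{W|S}(W|S), both factors uniform
            pmf[(W, S)] = Fraction(1, comb(K, M)) * Fraction(1, len(candidates))
    return pmf


pmf_I = joint_pmf(K=5, M=2, model=1)
pmf_II = joint_pmf(K=5, M=2, model=2)
print(sum(pmf_I.values()), sum(pmf_II.values()))
print(set(pmf_I.values()), set(pmf_II.values()))
```

For $K=5$ and $M=2$ the per-pair probabilities are $(K-M)^{-1}\binom{K}{M}^{-1}=1/30$ under Model I and $M^{-1}\binom{K}{M}^{-1}=1/20$ under Model II.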

We assume that $I^{[W,S]}$ is known to the server a priori. We also assume that the server knows the size of $S$ (i.e., $M$ ) and the PMFs $p_{\boldsymbol{S}}(\cdot)$ , $p_{\boldsymbol{C}}(\cdot)$ , and $p_{\boldsymbol{W}|\boldsymbol{S}}(\cdot|\cdot)$ , whereas the realizations $S$ , $C$ , and $W$ are unknown to the server a priori.

For any $S$ , $C$ , and $W$ , in order to retrieve $X_{W}$ , the user sends to the server a query $Q^{[W,S,C]}$ , which is a (potentially stochastic) function of $W$ , $S$ , $C$ , and $Y^{[S,C]}$ . The query $Q^{[W,S,C]}$ must protect the privacy of both the user’s demand index $W$ and side information index set $S$ from the server’s perspective, i.e., for any given $\theta\in\{0,1\}$ ,

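$$\mathbb{P}(\boldsymbol{W}=W^{\prime},\boldsymbol{S}=S^{\prime}|Q^{[W,S,C]},I^{[W,S]}=\theta)=\mathbb{P}(\boldsymbol{W}=W^{\prime},\boldsymbol{S}=S^{\prime}|I^{[W,S]}=\theta)$$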

for all $W^{\prime}\in\mathcal{K}$ and all $S^{\prime}\in\mathcal{S}$ . We refer to this condition as the $(W,S)$ -privacy condition. Note that the $(W,S)$ -privacy condition is stronger than the $W$ -privacy condition previously studied in [9], where the query must protect only the privacy of the user’s demand index, i.e., for any given $\theta\in\{0,1\}$ , we have $\mathbb{P}(\boldsymbol{W}=W^{\prime}|Q^{[W,S,C]},I^{[W,S]}=\theta)=\mathbb{P}(\boldsymbol{W}=W^{\prime}|I^{[W,S]}=\theta)$ for all $W^{\prime}\in\mathcal{K}$ .

Upon receiving $Q^{[W,S,C]}$ , the server sends to the user an answer $A^{[W,S,C]}$ , which is a (deterministic) function of the query $Q^{[W,S,C]}$ , the indicator $I^{[W,S]}$ , and the messages $X_{1},\dots,X_{K}$ , i.e., $H(A^{[W,S,C]}|Q^{[W,S,C]},I^{[W,S]},\{X_{i}\}_{i\in\mathcal{K}})=0$ . The answer $A^{[W,S,C]}$ along with the query $Q^{[W,S,C]}$ , the indicator $I^{[W,S]}$ , and the side information $Y^{[S,C]}$ must enable the user to retrieve the demand $X_{W}$ , i.e.,

$$H(X_{W}|A^{[W,S,C]},Q^{[W,S,C]},I^{[W,S]},Y^{[S,C]})=0.$$

This condition is referred to as the recoverability condition.

For each model (I or II), the problem is to design a query $Q^{[W,S,C]}$ and an answer $A^{[W,S,C]}$ for any $W$ , $S$ , and $C$ that satisfy the privacy and recoverability conditions. We refer to this problem as single-server single-message Private Information Retrieval (PIR) with Private Coded Side Information (PCSI), or PIR-PCSI for short. Specifically, we refer to the PIR-PCSI problem under the model I as PIR-PCSI–I, and under the model II as PIR-PCSI–II.

We refer to a collection of $Q^{[W,S,C]}$ and $A^{[W,S,C]}$ (for all $W$ , $S$ , and $C$ such that $I^{[W,S]}=0$ or $I^{[W,S]}=1$ ) which satisfy the privacy and recoverability conditions as a PIR-PCSI–I protocol or a PIR-PCSI–II protocol, respectively.

The rate of a PIR-PCSI (–I or –II) protocol is defined as the ratio of the entropy of a message, i.e., $L$ , to the average entropy of the answer, i.e., $H(A^{[\boldsymbol{W},\boldsymbol{S},\boldsymbol{C}]})=\sum H(A^{[W,S,C]})p_{\boldsymbol{W}|\boldsymbol{S}}(W|S)p_{\boldsymbol{S}}(S)p_{\boldsymbol{C}}(C)$ , where the summation is over all $W$ , $S$ , and $C$ (such that $I^{[W,S]}=0$ or $I^{[W,S]}=1$ ). The capacity of PIR-PCSI (–I or –II) problem is defined as the supremum of rates over all PIR-PCSI (–I or –II) protocols. The supremum of rates over all scalar-linear PIR-PCSI (–I or –II) protocols, i.e., the answer contains only scalar-linear combinations of the messages, is defined as the scalar-linear capacity of PIR-PCSI (–I or –II) problem.

In this work, our goal is to characterize the capacity and the scalar-linear capacity of the PIR-PCSI–I and PIR-PCSI–II problems, and to design PIR-PCSI (–I and –II) protocols that are capacity-achieving.

III Main Results

We present our main results in this section. The capacity and the scalar-linear capacity of PIR-PCSI–I problem are characterized in Theorem 1, and the scalar-linear capacity of PIR-PCSI–II problem is characterized in Theorem 2. The proofs are given in Sections IV and V.

Theorem 1.

The capacity and the scalar-linear capacity of PIR-PCSI–I problem with $K$ messages and side information size $0\leq M\leq K-1$ are given by $(K-M)^{-1}$ .

The converse follows directly from the result of [3, Theorem 2], which was proven using an index coding argument, for single-server single-message PIR with (uncoded) side information when $(W,S)$ -privacy is required. In this work, we provide an alternative proof by upper bounding the rate of any PIR-PCSI–I protocol using information-theoretic arguments (see Section IV-A). The key component of the proof is a necessary condition implied by the $(W,S)$ -privacy and recoverability conditions (see Lemma 1). The achievability proof relies on a new PIR-PCSI–I protocol, termed the Specialized GRS Code protocol, based on the Generalized Reed-Solomon (GRS) codes with a specific codeword, which achieves the rate $(K-M)^{-1}$ (see Section IV-B).

Remark 1.

It was shown in [3] that when there is a single server storing $K$ messages, and there is a user that knows $M$ (uncoded) messages as their side information and demands a single message not in their side information, in order to guarantee the $(W,S)$ -privacy condition, the minimum download cost is $K-M$ . Surprisingly, this result matches the result of Theorem 1. This shows that for achieving $(W,S)$ -privacy there will be no loss in capacity even if only one linear combination of $M$ messages (instead of $M$ messages separately) is known to the user a priori.

Remark 2.

When $W$ -privacy, which is a weaker notion of privacy in comparison to $(W,S)$ -privacy, is required (i.e., only the user’s demand index, and not the user’s side information index set, must be protected from the server), the result of [9, Theorem 1] shows that the capacity of single-server single-message PIR with a coded side information that does not include the demand (known as the PIR-CSI–I problem in [9]) is equal to $\lceil\frac{K}{M+1}\rceil^{-1}$ . Since $\lceil\frac{K}{M+1}\rceil<K-M$ for all ${1\leq M\leq K-3}$ , the capacity of PIR-PCSI–I is strictly smaller than that of PIR-CSI–I in this range, as expected. However, for $M=0$ , $M=K-2$ , and $M=K-1$ , the two capacities coincide, i.e., $(W,S)$ -privacy comes at no extra cost compared to $W$ -privacy.
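The comparison in Remark 2 can be tabulated directly from the two closed-form expressions. The sketch below is illustrative only: the function names are assumptions, and the second formula is simply the value quoted above from [9, Theorem 1]. It lists the values of $M$ for which the two capacities coincide for a given $K$.

```python
from fractions import Fraction


def capacity_pcsi_1(K, M):
    """Capacity of PIR-PCSI-I as stated in Theorem 1: (K - M)^{-1}."""
    return Fraction(1, K - M)


def capacity_csi_1(K, M):
    """Capacity of PIR-CSI-I as quoted from [9, Theorem 1]: ceil(K/(M+1))^{-1}."""
    return Fraction(1, -(-K // (M + 1)))  # integer ceiling division


K = 6
for M in range(K):
    print(M, capacity_pcsi_1(K, M), capacity_csi_1(K, M))

# Values of M for which (W,S)-privacy costs no more than W-privacy
coincide = [M for M in range(K) if capacity_pcsi_1(K, M) == capacity_csi_1(K, M)]
print(coincide)
```

Exact rational arithmetic via `Fraction` avoids any floating-point ambiguity when testing equality of the two capacities.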

Theorem 2.

The scalar-linear capacity of PIR-PCSI–II problem with $K$ messages and side information size ${2\leq M\leq K}$ is given by $(K-M+1)^{-1}$ .

The converse proof is based on a mixture of algebraic and information-theoretic arguments (see Section V-A), and the proof of achievability is based on a modified version of the Specialized GRS Code protocol which achieves the rate $(K-M+1)^{-1}$ (see Section V-B).

Remark 3.

Interestingly, comparing the results of [3, Theorem 2] and Theorem 2, one can see that achieving $(W,S)$ -privacy when the user knows $M-1$ messages (different from the demand) separately is as costly as achieving it when the user’s side information is only one linear combination of $M$ messages including the demand.

Remark 4.

As shown in [9, Theorem 2], when $W$ -privacy is required, the capacity of single-server single-message PIR with a coded side information to which the demand message contributes (known as the PIR-CSI–II problem in [9]) is equal to $1$ for $M=2$ and $M=K$ , and is equal to $\frac{1}{2}$ for all ${3\leq M\leq K-1}$ . The result of Theorem 2 matches this result for the cases of $M=K$ and $M=K-1$ , and thereby, $(W,S)$ -privacy and $W$ -privacy are attainable at the same cost. For other cases of $M$ , as expected, achieving $(W,S)$ -privacy is more costly than achieving $W$ -privacy.
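A similar tabulation illustrates Remark 4. This is a rough sketch under the assumption that the piecewise value quoted above from [9, Theorem 2] is used as-is; the function names are hypothetical.

```python
from fractions import Fraction


def capacity_pcsi_2(K, M):
    """Scalar-linear capacity of PIR-PCSI-II as stated in Theorem 2."""
    return Fraction(1, K - M + 1)


def capacity_csi_2(K, M):
    """Capacity of PIR-CSI-II under W-privacy, as quoted from [9, Theorem 2]."""
    return Fraction(1) if M in (2, K) else Fraction(1, 2)


K = 6
for M in range(2, K + 1):
    print(M, capacity_pcsi_2(K, M), capacity_csi_2(K, M))

# Side information sizes at which the two privacy notions cost the same
matching = [M for M in range(2, K + 1)
            if capacity_pcsi_2(K, M) == capacity_csi_2(K, M)]
print(matching)
```

For $K=6$ the two values agree only at $M=K-1$ and $M=K$, in line with the remark.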

IV The PIR-PCSI–I Problem

IV-A Converse for Theorem 1

Obviously, the capacity of PIR-PCSI–I is upper bounded by the capacity of PIR with uncoded side information where $(W,S)$ -privacy is required, which was shown to be ${(K-M)^{-1}}$ in [3] using an index-coding argument, where $M$ uncoded messages are available at the user as side information. This proves the converse for Theorem 1. We present an alternative information-theoretic proof here.

The following result gives a necessary condition for $(W,S)$ -privacy and recoverability.

Lemma 1.

For any $\theta\in\{0,1\}$ , $W\in\mathcal{K}$ , and $S\in\mathcal{S}$ where $I^{[W,S]}=\theta$ , and $C\in\mathcal{C}$ , and any ${W^{*}\in\mathcal{K}}$ and $S^{*}\in\mathcal{S}$ where $I^{[W^{*},S^{*}]}=\theta$ , there must exist ${C^{*}\in\mathcal{C}}$ such that

$$H(X_{W^{*}}|A^{[W,S,C]},Q^{[W,S,C]},I^{[W,S]},Y^{[S^{*},C^{*}]})=0.$$

Proof:

The proof is straightforward by way of contradiction, and hence omitted. ∎

Lemma 2.

For any $0\leq M\leq K-1$ , the capacity of PIR-PCSI–I is upper bounded by ${(K-M)^{-1}}$ .

Proof:

Fix $W$ , $S$ , and $C$ (and accordingly, $Y\triangleq Y^{[S,C]}$ ) such that $I^{[W,S]}=0$ , and let $Q\triangleq Q^{[W,S,C]}$ and $A\triangleq A^{[W,S,C]}$ be the user’s query and the server’s answer, respectively, for an arbitrary PIR-PCSI–I protocol. We need to show that $H(A^{[\boldsymbol{W},\boldsymbol{S},\boldsymbol{C}]})=H(A)\geq(K-M)L$ . Similar to the proof of [9, Theorem 1], it can be shown that

$$H(A)\geq H(X_{W})+H(A|Q,Y,X_{W}). \tag{1}$$

If $W\cup S=\mathcal{K}$ (i.e., $M=K-1$ ), then we have $H(A)\geq H(X_{W})=L$ , as was to be shown. If $W\cup S\neq\mathcal{K}$ , for any ${j\in\mathcal{K}\setminus(W\cup S)}$ there exists $C_{j}\in\mathcal{C}$ (and accordingly, $Y_{j}\triangleq Y^{[S,C_{j}]}$ ) such that $H(X_{j}|A,Q,Y_{j})=0$ (by Lemma 1). Let $I$ be a maximal subset of ${\mathcal{K}\setminus(W\cup S)}$ such that $Y$ and $Y_{I}\triangleq\{Y_{j}\}_{j\in I}$ are linearly independent. (Note that ${|I|\leq|S|-1=M-1}$ .) Let $X_{I}\triangleq\{X_{j}\}_{j\in I}$ . Then, we have

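$$\begin{aligned}
H(A|Q,Y,X_{W}) &\geq H(A|Q,Y,X_{W},Y_{I})+H(X_{I}|A,Q,Y,X_{W},Y_{I}) && \text{(2)}\\
&= H(X_{I}|Q,Y,X_{W},Y_{I})+H(A|Q,Y,X_{W},Y_{I},X_{I})\\
&= H(X_{I})+H(A|Q,Y,X_{W},Y_{I},X_{I}) && \text{(3)}
\end{aligned}$$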

where (2) holds because $H(X_{j}|A,Q,Y_{j})=0$ for all $j\in I$ (by assumption); and (3) holds since $X_{I}$ is independent of $(Q,Y,X_{W},Y_{I})$ (noting that $I$ and $W\cup S$ are disjoint). Note also that, by the maximality of $I$ , for any $j\in J\triangleq{\mathcal{K}\setminus(W\cup S\cup I)}$ , there exists $C_{j}\in\mathcal{C}$ (and accordingly, $Y_{j}\triangleq Y^{[S,C_{j}]}$ , which is linearly dependent on $\{Y,Y_{I}\}$ ) such that $H(X_{j}|A,Q,Y_{j})=0$ , and subsequently, $H(X_{j}|A,Q,Y_{I})=0$ . (Note that $|J|={K-M-1-|I|}$ .) Thus, we can write

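$$\begin{aligned}
H(A|Q,Y,X_{W},Y_{I},X_{I}) &= H(A|Q,Y,X_{W},Y_{I},X_{I})+H(X_{J}|A,Q,Y,X_{W},Y_{I},X_{I}) && \text{(4)}\\
&= H(X_{J}|Q,Y,X_{W},Y_{I},X_{I})+H(A|Q,Y,X_{W},Y_{I},X_{I},X_{J})\\
&\geq H(X_{J}) && \text{(5)}
\end{aligned}$$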

where (4) holds since $H(X_{j}|A,Q,Y_{I})=0$ for all $j\in J$ (by assumption); and (5) holds because $X_{J}$ and $(Q,Y,X_{W},Y_{I},X_{I})$ are independent (noting that $J$ and ${W\cup S\cup I}$ are disjoint). Putting (1), (2), (3), and (5) together, it follows that $H(A)\geq H(X_{W})+H(X_{I})+H(X_{J})=(K-M)L$ , as was to be shown. ∎
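As a quick sanity check on the counting in the proof above (a worked tally added for illustration, not part of the original argument), the three entropy terms add up as follows, using $|J|=K-M-1-|I|$ and $H(X_{j})=L$ for every message:

$$H(A)\geq\underbrace{L}_{H(X_{W})}+\underbrace{|I|\,L}_{H(X_{I})}+\underbrace{(K-M-1-|I|)\,L}_{H(X_{J})}=(K-M)L.$$

For instance, with $K=6$ and $M=2$ , any admissible $|I|\in\{0,1\}$ gives $H(A)\geq 4L$ , i.e., a rate of at most $1/4$ . The bound does not depend on how the $K-M-1$ remaining indices are split between $I$ and $J$ .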

IV-B Achievability for Theorem 1

In this section, we propose a PIR-PCSI–I protocol for arbitrary $K$ and $M$ that achieves the rate $(K-M)^{-1}$ . Throughout, we assume that $q$ is sufficiently large, particularly $q\geq K$ . For arbitrary $q<K$ , the achievability of the rate $(K-M)^{-1}$ , which is not necessarily feasible, is conditional on the existence of a $(K,K-M)$ maximum-distance-separable (MDS) code over $\mathbb{F}_{q}$ that includes a codeword with support $S\cup W$ such that the $i$ th codeword symbol is $c_{i}$ for $i\in S$ , and is non-zero for $i=W$ .

Assume that $q\geq K$ , and let $\omega_{1},\dots,\omega_{K}$ be $K$ distinct elements from $\mathbb{F}_{q}$ .

Specialized GRS Code Protocol: This protocol consists of four steps as follows:

Step 1: The user first constructs a polynomial ${p(x)=\sum_{i=0}^{K-M-1}p_{i}x^{i}\triangleq\prod_{i\not\in S\cup W}(x-\omega_{i})}$ , and then constructs $K-M$ sequences $Q_{1},\dots,Q_{K-M}$ , each of length $K$ , such that $Q_{i}=\{v_{1}\omega_{1}^{i-1},\dots,v_{K}\omega_{K}^{i-1}\}$ for $i\in[K-M]$ , where $v_{i}=\frac{c_{i}}{p(\omega_{i})}$ for $i\in S$ , and $v_{i}$ is a randomly chosen element from $\mathbb{F}_{q}^{\times}$ for $i\not\in S$ .

For any ${i\in[K-M]}$ , the $j$ th element, for any ${j\in\mathcal{K}}$ , in the sequence $Q_{i}$ can be thought of as the entry $(i,j)$ of a $(K-M)\times K$ matrix $G\triangleq{[g_{1}^{\mathsf{T}},\dots,g_{K-M}^{\mathsf{T}}]}^{\mathsf{T}}$ , which is the generator matrix of a $(K,K-M)$ GRS code with distinct parameters ${\omega_{1},\dots,\omega_{K}}$ and non-zero multipliers $v_{1},\dots,v_{K}$ [13]. The construction above ensures that such a GRS code has a specific codeword with support $S\cup W$ , namely $\sum_{i=1}^{K-M}p_{i-1}g_{i}$ , where the $i$ th codeword symbol is $c_{i}$ for $i\in S$ , and is non-zero for $i=W$ .

Step 2: The user reorders $Q_{1},\dots,Q_{K-M}$ by a randomly chosen permutation ${\sigma:[K-M]\rightarrow[K-M]}$ , and sends the query $Q^{[W,S,C]}=\{Q_{\sigma^{-1}(1)},\dots,Q_{\sigma^{-1}(K-M)}\}$ to the server.

Step 3: By using $Q_{i}$ , the server computes $A_{i}=\sum_{j=1}^{K}v_{j}\omega_{j}^{i-1}X_{j}$ for all $i\in[K-M]$ where $Q_{i}=\{v_{1}\omega_{1}^{i-1},\dots,v_{K}\omega_{K}^{i-1}\}$ , and it sends the answer $A^{[W,S,C]}=\{A_{\sigma^{-1}(1)},\dots,A_{\sigma^{-1}(K-M)}\}$ to the user.

Note that $A_{i}$ ’s are the parity check equations of a $(K,M)$ GRS code which is the dual code of the GRS code generated by the matrix $G$ defined earlier.

Step 4: Upon receiving the answer, the user retrieves $X_{W}$ by subtracting off the contribution of the side information $Y^{[S,C]}$ from $\sum_{i=1}^{K-M}p_{i-1}A_{i}=c_{W}X_{W}+\sum_{i\in S}c_{i}X_{i}$ , where $A_{i}$ is the $\sigma(i)$ th element of the received answer and $c_{W}\triangleq v_{W}p(\omega_{W})\neq 0$ .
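The four steps above can be sketched in code. The following is a minimal illustration under simplifying assumptions that are not in the paper: $q$ is prime (so arithmetic is plain integer arithmetic modulo $q$), $m=1$ (each message is a single element of $\mathbb{F}_{q}$), indices are 0-based, and $\omega_{j}=j$. The function names are hypothetical, and `pow(x, -1, q)` requires Python 3.8 or later. The sketch only exercises recoverability; it does not test privacy.

```python
import random


def poly_from_roots(roots, q):
    """Coefficients [p_0, p_1, ...] of prod_r (x - r) over F_q (q prime)."""
    coeffs = [1]
    for r in roots:
        shifted = [0] + coeffs                          # x * p(x)
        scaled = [(-r * c) % q for c in coeffs] + [0]   # -r * p(x)
        coeffs = [(a + b) % q for a, b in zip(shifted, scaled)]
    return coeffs


def poly_eval(coeffs, x, q):
    return sum(c * pow(x, i, q) for i, c in enumerate(coeffs)) % q


def run_protocol(K, q, W, S, C, X, rng):
    """One round of the protocol; C maps each index in S to its coefficient."""
    omega = list(range(K))          # K distinct field elements, needs q >= K
    M = len(S)
    # Step 1: polynomial vanishing outside S and W, then the multipliers v_j
    p = poly_from_roots([omega[j] for j in range(K) if j != W and j not in S], q)
    v = [rng.randrange(1, q) for _ in range(K)]
    for j in S:
        v[j] = C[j] * pow(poly_eval(p, omega[j], q), -1, q) % q
    rows = [[v[j] * pow(omega[j], i, q) % q for j in range(K)]
            for i in range(K - M)]
    # Step 2: send the rows in a random order
    order = list(range(K - M))
    rng.shuffle(order)              # position k of the query holds rows[order[k]]
    query = [rows[i] for i in order]
    # Step 3 (server): one linear combination of the messages per query row
    ans = [sum(g * x for g, x in zip(row, X)) % q for row in query]
    # Step 4 (user): recombine with the coefficients of p, remove Y, rescale
    combo = sum(p[i] * a for i, a in zip(order, ans)) % q
    Y = sum(C[j] * X[j] for j in S) % q
    scale = v[W] * poly_eval(p, omega[W], q) % q        # coefficient of X_W
    x_w = (combo - Y) * pow(scale, -1, q) % q
    return query, ans, x_w


rng = random.Random(0)
K, q, W, S = 5, 11, 3, [0, 2]
C = {0: 3, 2: 7}
X = [rng.randrange(q) for _ in range(K)]
query, ans, x_w = run_protocol(K, q, W, S, C, X, rng)
print(len(ans), x_w == X[W])
```

The server returns $K-M$ field elements, matching the download cost of the protocol, and the user recovers $X_{W}$ because the recombined answer equals $v_{W}p(\omega_{W})X_{W}+\sum_{i\in S}c_{i}X_{i}$ .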

Lemma 3.

The Specialized GRS Code protocol is a PIR-PCSI–I protocol, and achieves the rate $(K-M)^{-1}$ .

Proof:

Since the matrix $G$ , defined in Step 1 of the protocol, generates a $(K,K-M)$ GRS code which is an MDS code, then the rows of $G$ are linearly independent, and accordingly, $A_{1},\dots,A_{K-M}$ are linearly independent combinations of $X_{1},\dots,X_{K}$ , which are themselves independently and uniformly distributed over $\mathbb{F}_{q^{m}}$ . Thus, $A_{1},\dots,A_{K-M}$ are independently and uniformly distributed over $\mathbb{F}_{q^{m}}$ . Since $H(X_{1})=\dots=H(X_{K})=L$ , then $H(A_{1})=\dots=H(A_{K-M})=L$ , and $H(A^{[W,S,C]})=H(A_{1},\dots,A_{K-M})=\sum_{i=1}^{K-M}H(A_{i})=(K-M)L$ for any $S\in\mathcal{S}$ , any $W\not\in S$ , and any $C\in\mathcal{C}$ . Since the joint distribution of $\boldsymbol{W}$ and $\boldsymbol{S}$ is uniform and $\boldsymbol{C}$ is uniformly distributed, then $H(A^{[\boldsymbol{W},\boldsymbol{S},\boldsymbol{C}]})=H(A^{[W,S,C]})$ . Thus, the Specialized GRS Code protocol has the rate $L/H(A^{[\boldsymbol{W},\boldsymbol{S},\boldsymbol{C}]})=L/H(A^{[W,S,C]})=(K-M)^{-1}$ .

Next, we prove that the Specialized GRS Code protocol is a PIR-PCSI–I protocol. It should be obvious from the construction that the recoverability condition is satisfied. The $(W,S)$ -privacy condition is also satisfied because the $(K,K-M)$ GRS code, generated by the matrix $G$ , is an MDS code, and thereby, the minimum (Hamming) weight of a codeword is $K-(K-M)+1=M+1$ , and there are the same number of minimum-weight codewords for any support of size ${M+1}$ [13]. Thus, for any $S\in\mathcal{S}$ and any $W\not\in S$ , the dual code, whose parity check matrix is given by $G$ , contains the same number of parity check equations (with support $S\cup W$ ) from each of which, given $Y^{[S,C]}$ for some $C\in\mathcal{C}$ , $X_{W}$ can be recovered. ∎
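The counting fact used in the privacy argument (every support of size $M+1$ carries the same number of minimum-weight codewords) can be checked by brute force on a small instance. The sketch below is an illustration under assumptions not taken from the paper: $q$ is prime, $\omega_{j}=j$ with 0-based indices, and the helper name `min_weight_supports` is hypothetical. It enumerates all non-zero codewords of a $(K,K-M)$ GRS code and groups the minimum-weight ones by support.

```python
from collections import Counter
from itertools import product


def min_weight_supports(K, M, q, v):
    """Brute-force the (K, K-M) GRS code over F_q (q prime) with omega_j = j.

    Returns the minimum weight and a dict {support: number of codewords}
    restricted to minimum-weight codewords.
    """
    rows = [[v[j] * pow(j, i, q) % q for j in range(K)] for i in range(K - M)]
    counts = Counter()
    for msg in product(range(q), repeat=K - M):
        if not any(msg):
            continue  # skip the all-zero codeword
        word = [sum(m * row[j] for m, row in zip(msg, rows)) % q
                for j in range(K)]
        support = tuple(j for j, s in enumerate(word) if s)
        counts[support] += 1
    d = min(len(s) for s in counts)
    return d, {s: n for s, n in counts.items() if len(s) == d}


d, by_support = min_weight_supports(K=4, M=1, q=5, v=[1, 2, 3, 4])
print(d, len(by_support), set(by_support.values()))
```

For this instance the minimum weight is $M+1=2$ , all $\binom{4}{2}=6$ supports of size two occur, and each carries exactly $q-1=4$ codewords (the non-zero scalar multiples of a single codeword).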

V The PIR-PCSI–II Problem

V-A Converse for Theorem 2

In this section, we give an information-theoretic proof of converse for Theorem 2.

Lemma 4.

For any $2\leq M\leq K$ , the scalar-linear capacity of PIR-PCSI–II is upper bounded by ${(K-M+1)^{-1}}$ .

Proof:

Fix $W$ , $S$ , and $C$ (and $Y\triangleq Y^{[S,C]}$ ) such that $I^{[W,S]}=1$ . Let $Q\triangleq Q^{[W,S,C]}$ and $A\triangleq A^{[W,S,C]}$ be the query and the answer of an arbitrary scalar-linear PIR-PCSI–II protocol. We need to show that $H(A)\geq{(K-M+1)L}$ . Let $I$ be the set of all $j\in\mathcal{K}$ such that $H(X_{j}|A,Q)=0$ , i.e., $X_{j}$ is recoverable from $A$ (and $Q$ ) directly. Let $X_{I}\triangleq\{X_{j}\}_{j\in I}$ . There are two cases: (i) $I\neq\emptyset$ , and (ii) $I=\emptyset$ .

Case (i): Since $X_{I}$ and $Q$ are independent and $H(X_{I}|A,Q)=0$ (by assumption), then

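$$\begin{aligned}
H(A) &\geq H(X_{I}|Q)+H(A|Q,X_{I})\\
&= H(X_{I})+H(A|Q,X_{I}). && \text{(6)}
\end{aligned}$$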

If ${|I|\geq K-M+1}$ , then $H(X_{I})\geq(K-M+1)L$ , and subsequently, $H(A)\geq(K-M+1)L$ , as was to be shown. If $|I|\leq K-M$ , $H(A|Q,X_{I})$ can be further lower bounded as follows. Let $n\triangleq|I|$ . Assume, w.l.o.g., that $I=[n]$ . Let $J\triangleq[K-M-n+1]$ , and $S_{j}\triangleq{\{n+1,n+j+1,\dots,n+j+M-1\}}$ for $j\in J$ . (Note that $|J|=K-M-n+1$ .) By Lemma 1, for any $j\in J$ , there exists $C_{j}\in\mathcal{C}$ (and accordingly, $Y_{j}\triangleq Y^{[S_{j},C_{j}]}$ ) such that $H(X_{n+1}|A,Q,Y_{j})=0$ . Let $Z_{j}\triangleq Y_{j}-c_{j}X_{n+1}$ where $c_{j}$ is the coefficient of $X_{n+1}$ in $Y_{j}$ . By the scalar-linearity of $A$ , it is easy to see that either $H(Z_{j}|A,Q)=0$ or ${H(Z_{j}+c^{*}_{j}X_{n+1}|A,Q)=0}$ for some $c^{*}_{j}\in\mathbb{F}^{\times}_{q}\setminus\{c_{j}\}$ . (Otherwise, the server learns that the user’s demand index and side information index set cannot be $n+1$ and $S_{j}$ , respectively. This obviously violates the $(W,S)$ -privacy condition.) Thus, $H(Z_{j}|A,Q,X_{n+1})=0$ . Let $Z_{J}\triangleq\{Z_{j}\}_{j\in J}$ . Then, we have

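$$\begin{aligned}
H(A|Q,X_{I}) &\geq H(A|Q,X_{I},X_{n+1})+H(Z_{J}|A,Q,X_{I},X_{n+1}) && \text{(7)}\\
&= H(Z_{J}|Q,X_{I},X_{n+1})+H(A|Q,X_{I},X_{n+1},Z_{J})\\
&\geq H(Z_{J}) && \text{(8)}
\end{aligned}$$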

where (7) holds since $H(Z_{j}|A,Q,X_{n+1})=0$ for all $j\in J$ (by assumption); and (8) follows because $Z_{J}$ is independent of $(Q,X_{I},X_{n+1})$ , noting that $Z_{J}$ , $X_{I}$ , and $X_{n+1}$ are linearly independent (by construction). By the linear independence of $Z_{j}$ ’s for all $j\in J$ , it follows that $H(Z_{J})={(K-M-n+1)L}$ . By (6) and (8), we get $H(A)\geq{nL}+{(K-M-n+1)L}={(K-M+1)L}$ .

Case (ii): Assume, w.l.o.g., that ${W=1}$ and $S=[M]$ . Let $J\triangleq[K-M]$ , and $S_{j}\triangleq{\{1,j+2,\dots,j+M-2\}}$ for $j\in J$ . (Note that $|J|=K-M$ .) Similarly as in the case (i), define $Y_{j}$ (and accordingly $Z_{j}$ ) for all $j\in J$ , where $X_{n+1}$ is replaced by $X_{1}$ . By using a similar argument as before, it can be shown that $H(Z_{j}|A,Q,X_{1})=0$ for all $j\in J$ . Let $Z_{J}\triangleq\{Z_{j}\}_{j\in J}$ . Then, we can write

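$$\begin{aligned}
H(A) &\geq H(A|Q,Y)+H(X_{1}|A,Q,Y) && \text{(9)}\\
&= H(X_{1}|Q,Y)+H(A|Q,Y,X_{1})\\
&= H(X_{1})+H(A|Q,Y,X_{1})+H(Z_{J}|A,Q,Y,X_{1}) && \text{(10)}\\
&= H(X_{1})+H(Z_{J}|Q,Y,X_{1})+H(A|Q,Y,X_{1},Z_{J})\\
&\geq H(X_{1})+H(Z_{J}) && \text{(11)}
\end{aligned}$$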

where (9) follows since $H(X_{1}|A,Q,Y)=0$ (by the recoverability condition); (10) holds because ${H(Z_{j}|A,Q,X_{1})=0}$ , and subsequently, $H(Z_{j}|A,Q,Y,X_{1})=0$ , for all $j\in J$ ; and (11) follows because $Z_{J}$ is independent of $(Q,Y,X_{1})$ (due to the linear independence of $Z_{J}$ , $Y$ , and $X_{1}$ ). Since $|J|=K-M$ , we have $H(Z_{J})=(K-M)L$ (noting that $Z_{j}$ ’s are linearly independent), and thereby, $H(A)\geq L+(K-M)L=(K-M+1)L$ . ∎

V-B Achievability for Theorem 2

In this section, we propose a PIR-PCSI–II protocol, which is a slightly modified version of the Specialized GRS Code protocol, that achieves the rate $(K-M+1)^{-1}$ for arbitrary $K$ and $M$ .

Modified Specialized GRS Code Protocol: This protocol consists of four steps, where the steps 2-4 are the same as those in the Specialized GRS Code protocol (Section IV-B), except that $M$ is replaced with $M-1$ everywhere. The step 1 of the proposed protocol is as follows:

Step 1: The user first constructs a polynomial ${p(x)=\sum_{i=0}^{K-M}p_{i}x^{i}\triangleq\prod_{i\not\in S}(x-\omega_{i})}$ , and then constructs $K-M+1$ sequences $Q_{1},\dots,Q_{K-M+1}$ , each of length $K$ , such that $Q_{i}=\{v_{1}\omega_{1}^{i-1},\dots,v_{K}\omega_{K}^{i-1}\}$ for $i\in[K-M+1]$ , where $v_{i}=\frac{c_{i}}{p(\omega_{i})}$ for $i\in S\setminus W$ ; $v_{W}=\frac{c}{p(\omega_{W})}$ where $c$ is chosen uniformly at random from $\mathbb{F}^{\times}_{q}\setminus\{c_{W}\}$ ; and $v_{i}$ is a randomly chosen element from $\mathbb{F}_{q}^{\times}$ for $i\not\in S$ .
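The modified Step 1, combined with Steps 2-4 of Section IV-B, can be sketched as follows. As before, this is only an illustration under assumptions that are not in the paper: $q$ is a prime with $q\geq\max(K,3)$ , $m=1$ , indices are 0-based, $\omega_{j}=j$ , and all function names are hypothetical. Recoverability works because the recombined answer equals $cX_{W}+\sum_{i\in S\setminus W}c_{i}X_{i}$ , so subtracting $Y^{[S,C]}$ leaves $(c-c_{W})X_{W}$ with $c\neq c_{W}$ .

```python
import random


def poly_from_roots(roots, q):
    """Coefficients [p_0, p_1, ...] of prod_r (x - r) over F_q (q prime)."""
    coeffs = [1]
    for r in roots:
        shifted = [0] + coeffs
        scaled = [(-r * c) % q for c in coeffs] + [0]
        coeffs = [(a + b) % q for a, b in zip(shifted, scaled)]
    return coeffs


def poly_eval(coeffs, x, q):
    return sum(c * pow(x, i, q) for i, c in enumerate(coeffs)) % q


def run_modified_protocol(K, q, W, S, C, X, rng):
    """One round for Model II (W in S); C maps each index in S to c_i."""
    omega = list(range(K))
    M = len(S)
    # Modified Step 1: p vanishes outside S, and v_W uses a fresh c != c_W
    p = poly_from_roots([omega[j] for j in range(K) if j not in S], q)
    v = [rng.randrange(1, q) for _ in range(K)]
    for j in S:
        v[j] = C[j] * pow(poly_eval(p, omega[j], q), -1, q) % q
    c = rng.choice([a for a in range(1, q) if a != C[W]])
    v[W] = c * pow(poly_eval(p, omega[W], q), -1, q) % q
    rows = [[v[j] * pow(omega[j], i, q) % q for j in range(K)]
            for i in range(K - M + 1)]
    # Step 2: random reordering of the rows
    order = list(range(K - M + 1))
    rng.shuffle(order)
    query = [rows[i] for i in order]
    # Step 3 (server)
    ans = [sum(g * x for g, x in zip(row, X)) % q for row in query]
    # Step 4 (user): recombine, subtract the side information, rescale
    combo = sum(p[i] * a for i, a in zip(order, ans)) % q
    Y = sum(C[j] * X[j] for j in S) % q
    x_w = (combo - Y) * pow((c - C[W]) % q, -1, q) % q
    return query, ans, x_w


rng = random.Random(0)
K, q, S, W = 5, 11, [0, 2, 3], 2
C = {0: 3, 2: 7, 3: 5}
X = [rng.randrange(q) for _ in range(K)]
query, ans, x_w = run_modified_protocol(K, q, W, S, C, X, rng)
print(len(ans), x_w == X[W])
```

The answer consists of $K-M+1$ field elements, which corresponds to the rate $(K-M+1)^{-1}$ .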

Lemma 5.

The Modified Specialized GRS Code protocol is a PIR-PCSI–II protocol, and achieves the rate $(K-M+1)^{-1}$ .

Proof:

The proof, omitted to avoid repetition, follows along the same lines as the proof of Lemma 3, where $M$ is replaced by $M-1$ , and $W\not\in S$ is replaced by $W\in S$ . ∎

Bibliography

[1] H. Sun and S. A. Jafar, “The capacity of private information retrieval,” IEEE Trans. on Info. Theory, vol. 63, no. 7, pp. 4075–4088, July 2017.
[2] H. Sun and S. A. Jafar, “The capacity of robust private information retrieval with colluding databases,” IEEE Trans. on Info. Theory, vol. 64, no. 4, pp. 2361–2370, April 2018.
[3] S. Kadhe, B. Garcia, A. Heidarzadeh, S. E. Rouayheb, and A. Sprintson, “Private information retrieval with side information: The single server case,” in 2017 55th Annual Allerton Conf. on Commun., Control, and Computing, Oct 2017, pp. 1099–1106.
[4] R. Tandon, “The capacity of cache aided private information retrieval,” in 55th Annual Allerton Conf. on Commun., Control, and Computing, Oct 2017, pp. 1078–1082.
[5] Y. Wei, K. Banawan, and S. Ulukus, “Cache-aided private information retrieval with partially known uncoded prefetching: Fundamental limits,” IEEE Journal on Selected Areas in Communications, vol. 36, no. 6, pp. 1126–1139, June 2018.
[6] Y. Wei, K. Banawan, and S. Ulukus, “Fundamental limits of cache-aided private information retrieval with unknown and uncoded prefetching,” IEEE Trans. on Info. Theory, pp. 1–1, 2018.
[7] S. Kadhe, B. Garcia, A. Heidarzadeh, S. E. Rouayheb, and A. Sprintson, “Private information retrieval with side information,” CoRR, vol. abs/1709.00112, 2017. [Online]. Available: http://arxiv.org/abs/1709.00112
[8] Z. Chen, Z. Wang, and S. Jafar, “The capacity of private information retrieval with private side information,” CoRR, vol. abs/1709.03022, 2017. [Online]. Available: http://arxiv.org/abs/1709.03022
