Capacity of Single-Server Single-Message Private Information Retrieval with Private Coded Side Information
Anoosheh Heidarzadeh, Fatemeh Kazemi, and Alex Sprintson

The authors are with the Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843 USA (E-mail: {anoosheh, fatemeh.kazemi, spalex}@tamu.edu).
Abstract
We study the problem of single-server single-message Private Information Retrieval with Private Coded Side Information (PIR-PCSI). In this problem, there is a server that stores a database, and a user who knows a random linear combination of a random subset of messages in the database. The number of messages contributing to the user’s side information is known to the server a priori, whereas their indices and coefficients are unknown to the server a priori. The user wants to retrieve a message from the server (with minimum download cost), while protecting the identities of both the demand and side information messages.
Depending on whether the demand is part of the coded side information or not, we consider two different models for the problem. For the model in which the demand does not contribute to the side information, we prove a lower bound on the minimum download cost for all (linear and non-linear) PIR protocols; and for the other model wherein the demand is one of the messages contributing to the side information, we prove a lower bound for all scalar-linear PIR protocols. In addition, we propose novel PIR protocols that achieve these lower bounds.
I Introduction
In the information-theoretic Private Information Retrieval (PIR) problem (see, e.g., [1, 2]), there is a user that wishes to download one or more messages belonging to a database stored on one or more (non-colluding or colluding) servers. The goal of the user is to minimize the download cost (i.e., the amount of information downloaded from the server(s)) while hiding the identity of its demanded message(s) from the server(s). This setup was recently extended in [3, 4, 5, 6, 7, 8, 9, 10, 11, 12] to settings wherein the user has some side information about the messages in the database, and the side information is unknown to the server(s).
For the single-server setting of the PIR problem in the presence of some side information, we studied the cases in which the side information is a random subset of messages (a.k.a. PIR with Side Information (PIR-SI)) or a random linear combination of a random subset of messages (a.k.a. PIR with Coded Side Information (PIR-CSI)) in [3, 11] and [9], respectively. The multi-server setting of the PIR-SI problem was also studied in [7, 8, 10]. For the PIR-SI problem, two different types of privacy, known as $W$-privacy (i.e., only the identities of the demand messages must be protected) and $(W,S)$-privacy (i.e., the identities of both the demand and side information messages must be protected jointly), have been considered, whereas the PIR-CSI problem has only been studied when $W$-privacy is required.
In this work, we study the single-server single-message PIR-CSI problem where $(W,S)$-privacy is required. In this problem, referred to as PIR with Private Coded Side Information (PIR-PCSI), there is a single server storing a database of $K$ messages, and there is a user who knows a random linear combination of a random subset of $M$ messages. This setting can be motivated by several practical scenarios: the user may have obtained their side information via overhearing in a wireless network, from a trusted server with limited knowledge about the database, or from the information locally stored in the user's cache of limited size, to name a few. The user is interested in downloading a single message from the server while preserving the privacy of both the demand message and the messages contributing to the side information. Depending on whether or not the user's demanded message itself contributes to the user's side information, we consider two different models of the PIR-PCSI problem.
I-A Main Contributions
For the model in which the demanded message is not part of the coded side information, we characterize the capacity and the scalar-linear capacity of the PIR-PCSI problem, where the (scalar-linear) capacity is defined as the supremum of all achievable rates (i.e., the inverse of the normalized download cost) over all (scalar-linear) protocols. In particular, we show that for this model the capacity and the scalar-linear capacity are both equal to $(K-M)^{-1}$ for any $0\leq M\leq K-1$. This is interesting because, as shown in [3, Theorem 2], even when the user knows $M$ (uncoded) messages as their side information, the minimum download cost for guaranteeing $(W,S)$-privacy is $K-M$ (normalized by the message size). Hence, for achieving $(W,S)$-privacy there is no loss in capacity even if only one linear combination of $M$ messages (instead of the $M$ messages separately) is known to the user a priori.
For the model wherein the user's demanded message contributes to their coded side information, we show that the scalar-linear capacity of the PIR-PCSI problem is equal to $(K-M+1)^{-1}$ for any $2\leq M\leq K$. Interestingly, this result shows that achieving $(W,S)$-privacy when the user knows $M-1$ messages (different from the demand) is as costly as when the user knows only one linear combination of those $M-1$ messages and the demand.
The converse proofs are based on information-theoretic arguments, and the proofs of achievability rely on novel PIR protocols based on the Generalized Reed-Solomon (GRS) codes that include a specific codeword.
II Problem Formulation
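Let $\mathbb{F}_q$ be a finite field of size $q$, and let $\mathbb{F}_{q^m}$ be an extension field of $\mathbb{F}_q$ for some integer $m\geq 1$. Let $\mathbb{F}_q^{\times}\triangleq\mathbb{F}_q\setminus\{0\}$, and let $L\triangleq m\log_2 q$. For a positive integer $i$, we denote $\{1,\dots,i\}$ by $[i]$. Let $K\geq 1$ and $0\leq M\leq K$ be two integers. We denote the set of all subsets of $[K]$ of size $M$ by $\mathcal{S}$, and the set of all sequences of length $M$ with elements from $\mathbb{F}_q^{\times}$ by $\mathcal{C}$.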
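Assume that there is a server that stores a set of $K$ messages $X=\{X_1,\dots,X_K\}$, with each message being independently and uniformly distributed over $\mathbb{F}_{q^m}$, i.e., $H(X_i)=L$ for all $i\in[K]$ and $H(X_1,\dots,X_K)=KL$. Also assume that there is a user that wishes to retrieve a message $X_W$ from the server for some $W\in[K]$, and knows a linear combination $Y^{[S,C]}\triangleq\sum_{i\in S}c_iX_i$ for some $S\in\mathcal{S}$ and $C=\{c_i\}_{i\in S}\in\mathcal{C}$. We refer to $W$ as the demand index, $X_W$ as the demand, $S$ as the side information index set, $Y^{[S,C]}$ as the side information, and $M$ as the side information size.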
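We denote by $\mathbf{W}$, $\mathbf{S}$, and $\mathbf{C}$ the random variables representing $W$, $S$, and $C$, respectively. We also denote the probability mass function (PMF) of $\mathbf{S}$ by $p_{\mathbf{S}}(\cdot)$, the PMF of $\mathbf{C}$ by $p_{\mathbf{C}}(\cdot)$, and the conditional PMF of $\mathbf{W}$ given $\mathbf{S}$ by $p_{\mathbf{W}|\mathbf{S}}(\cdot|\cdot)$. We assume that $\mathbf{S}$ is uniformly distributed over $\mathcal{S}$, i.e., $p_{\mathbf{S}}(S)=\binom{K}{M}^{-1}$ for all $S\in\mathcal{S}$; and $\mathbf{C}$ is uniformly distributed over $\mathcal{C}$, i.e., $p_{\mathbf{C}}(C)=(q-1)^{-M}$ for all $C\in\mathcal{C}$. Also, we consider two different models for the conditional PMF of $\mathbf{W}$ given $\mathbf{S}$ as follows: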
Model I
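$\mathbf{W}$ is uniformly distributed over $[K]\setminus S$, i.e.,
$$p_{\mathbf{W}|\mathbf{S}}(W|S)=\begin{cases}\frac{1}{K-M}, & W\notin S,\\ 0, & \text{otherwise}.\end{cases}$$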
Model II
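$\mathbf{W}$ is uniformly distributed over $S$, i.e.,
$$p_{\mathbf{W}|\mathbf{S}}(W|S)=\begin{cases}\frac{1}{M}, & W\in S,\\ 0, & \text{otherwise}.\end{cases}$$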
To avoid degenerate cases, we assume $0\leq M\leq K-1$ for Model I and $2\leq M\leq K$ for Model II.
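Let $I$ be an indicator function such that $I=1$ if $W\in S$, and $I=0$ if $W\notin S$. Note that, in Model I, $p_{\mathbf{W}|\mathbf{S}}(W|S)$ is equal to $\frac{1}{K-M}$ if $I=0$, and it is zero otherwise; and, in Model II, $p_{\mathbf{W}|\mathbf{S}}(W|S)$ is equal to $\frac{1}{M}$ if $I=1$, and it is zero otherwise.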
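As an illustration of the two models, the following minimal sketch enumerates the exact joint PMF of $(\mathbf{W},\mathbf{S},\mathbf{C})$ for small parameters. It assumes 1-based message indices and represents the $q-1$ non-zero field elements simply by the labels $1,\dots,q-1$; the helper names `joint_pmf` and `demand_marginal` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations, product
from math import comb

def joint_pmf(K, M, q, model):
    """Exact PMF of (W, S, C) under Model 'I' or 'II' (hypothetical helper).

    The q - 1 non-zero field elements are represented by the labels 1, ..., q - 1.
    """
    p_S = Fraction(1, comb(K, M))                    # S uniform over all M-subsets
    p_C = Fraction(1, (q - 1) ** M)                  # C uniform over (F_q^x)^M
    pmf = {}
    for S in map(frozenset, combinations(range(1, K + 1), M)):
        for C in product(range(1, q), repeat=M):
            for W in range(1, K + 1):
                I = int(W in S)                      # the indicator I
                if model == "I" and I == 0:
                    p_W_given_S = Fraction(1, K - M)
                elif model == "II" and I == 1:
                    p_W_given_S = Fraction(1, M)
                else:
                    continue                         # probability zero in this model
                pmf[(W, S, C)] = p_S * p_C * p_W_given_S
    return pmf

def demand_marginal(pmf):
    """Marginal PMF of the demand index W."""
    marg = defaultdict(Fraction)
    for (W, _, _), prob in pmf.items():
        marg[W] += prob
    return dict(marg)

pmf_I = joint_pmf(5, 2, 3, "I")
pmf_II = joint_pmf(5, 2, 3, "II")
```

In both models the demand index turns out to be uniform over $[K]$ a priori; only its relation to $S$ differs.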
We assume that $I$ is known to the server a priori. We also assume that the server knows the size of $S$ (i.e., $M$) and the PMFs $p_{\mathbf{S}}$, $p_{\mathbf{W}|\mathbf{S}}$, and $p_{\mathbf{C}}$, whereas the realizations $S$, $W$, and $C$ are unknown to the server a priori.
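For any $S\in\mathcal{S}$, $C\in\mathcal{C}$, and $W\in[K]$, in order to retrieve $X_W$, the user sends to the server a query $Q^{[W,S,C]}$, which is a (potentially stochastic) function of $W$, $S$, $C$, and $I$. The query must protect the privacy of both the user's demand index and side information index set from the server's perspective, i.e., for any given $Q^{[W,S,C]}=Q$,
$$\Pr\big(\mathbf{W}=W^*,\mathbf{S}=S^*\mid Q^{[\mathbf{W},\mathbf{S},\mathbf{C}]}=Q\big)=\Pr\big(\mathbf{W}=W^*,\mathbf{S}=S^*\big)$$
for all $W^*\in[K]$ and all $S^*\in\mathcal{S}$. We refer to this condition as the $(W,S)$-privacy condition. Note that the $(W,S)$-privacy condition is stronger than the $W$-privacy condition previously studied in [9], where the query must protect only the privacy of the user's demand index, i.e., for any given $Q$, we have $\Pr(\mathbf{W}=W^*\mid Q^{[\mathbf{W},\mathbf{S},\mathbf{C}]}=Q)=\Pr(\mathbf{W}=W^*)$ for all $W^*\in[K]$.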
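Since the posterior in the $(W,S)$-privacy condition is proportional to the prior times the likelihood of the query, the condition holds whenever the query has the same distribution for every admissible pair $(W,S)$. The sketch below checks this exactly for the query matrix of the Specialized GRS Code protocol (Section IV-B) under Model I, with tiny hypothetical parameters: a prime field $\mathbb{F}_5$ (so $m=1$), fixed public evaluation points, 0-based indices, and ignoring the reordering in Step 2. The names `p_eval` and `query_pmf` are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product

P, K, M = 5, 4, 1                        # tiny hypothetical parameters, q = P prime
OMEGA = [1, 2, 3, 4]                     # fixed, publicly known distinct points

def p_eval(roots, x):
    """Evaluate prod_r (x - r) over F_P."""
    out = 1
    for r in roots:
        out = out * (x - r) % P
    return out

def query_pmf(W, S):
    """Exact PMF of the multipliers v (which determine G) given (W, S) in Model I."""
    counts = Counter()
    roots = [OMEGA[i] for i in range(K) if i != W and i not in S]
    free = [i for i in range(K) if i not in S]          # multipliers drawn at random
    for C in product(range(1, P), repeat=M):            # C uniform over (F_P^x)^M
        coeff = dict(zip(sorted(S), C))
        for draw in product(range(1, P), repeat=len(free)):
            v = [0] * K
            for i, r in zip(free, draw):
                v[i] = r
            for i in S:                                 # v_i = c_i / p(omega_i)
                v[i] = coeff[i] * pow(p_eval(roots, OMEGA[i]), P - 2, P) % P
            counts[tuple(v)] += 1
    total = sum(counts.values())
    return {key: Fraction(c, total) for key, c in counts.items()}

pairs = [(W, frozenset(S)) for S in combinations(range(K), M)
         for W in range(K) if W not in S]
pmfs = [query_pmf(W, S) for W, S in pairs]
```

Every pair yields the uniform distribution over $(\mathbb{F}_5^{\times})^4$, because each $c_i$ is uniform and dividing by the non-zero constant $p(\omega_i)$ preserves uniformity.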
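Upon receiving $Q^{[W,S,C]}$, the server sends to the user an answer $A^{[W,S,C]}$, which is a (deterministic) function of the query $Q^{[W,S,C]}$, the indicator $I$, and the messages in $X=\{X_1,\dots,X_K\}$, i.e., $H(A^{[W,S,C]}\mid Q^{[W,S,C]},I,X_1,\dots,X_K)=0$. The answer $A^{[W,S,C]}$ along with the query $Q^{[W,S,C]}$, the indicator $I$, and the side information $Y^{[S,C]}$ must enable the user to retrieve the demand $X_W$, i.e.,
$$H\big(X_W\mid A^{[W,S,C]},Q^{[W,S,C]},I,Y^{[S,C]}\big)=0.$$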
This condition is referred to as the recoverability condition.
For each model (I or II), the problem is to design a query $Q^{[W,S,C]}$ and an answer $A^{[W,S,C]}$ for any $S$, $W$, and $C$ that satisfy the privacy and recoverability conditions. We refer to this problem as single-server single-message Private Information Retrieval (PIR) with Private Coded Side Information (PCSI), or PIR-PCSI for short. Specifically, we refer to the PIR-PCSI problem under Model I as PIR-PCSI–I, and under Model II as PIR-PCSI–II.
We refer to a collection of $Q^{[W,S,C]}$ and $A^{[W,S,C]}$ (for all $S\in\mathcal{S}$, $C\in\mathcal{C}$, and $W\in[K]$ such that $W\notin S$ or $W\in S$) that satisfy the privacy and recoverability conditions as a PIR-PCSI–I protocol or a PIR-PCSI–II protocol, respectively.
The rate of a PIR-PCSI (–I or –II) protocol is defined as the ratio of the entropy of a message, i.e., $L$, to the average entropy of the answer, i.e., $\sum p_{\mathbf{S}}(S)\,p_{\mathbf{W}|\mathbf{S}}(W|S)\,p_{\mathbf{C}}(C)\,H(A^{[W,S,C]})$, where the summation is over all $S\in\mathcal{S}$, $C\in\mathcal{C}$, and $W\in[K]$ (such that $W\notin S$ or $W\in S$). The capacity of the PIR-PCSI (–I or –II) problem is defined as the supremum of rates over all PIR-PCSI (–I or –II) protocols. The supremum of rates over all scalar-linear PIR-PCSI (–I or –II) protocols, i.e., protocols whose answer contains only scalar-linear combinations of the messages, is defined as the scalar-linear capacity of the PIR-PCSI (–I or –II) problem.
In this work, our goal is to characterize the capacity and the scalar-linear capacity of the PIR-PCSI–I and PIR-PCSI–II problems, and to design PIR-PCSI (–I and –II) protocols that are capacity-achieving.
III Main Results
We present our main results in this section. The capacity and the scalar-linear capacity of the PIR-PCSI–I problem are characterized in Theorem 1, and the scalar-linear capacity of the PIR-PCSI–II problem is characterized in Theorem 2. The proofs are given in Sections IV and V.
Theorem 1.
The capacity and the scalar-linear capacity of the PIR-PCSI–I problem with $K$ messages and side information size $0\leq M\leq K-1$ are given by $(K-M)^{-1}$.
The converse follows directly from the result of [3, Theorem 2], which was proven using an index coding argument, for single-server single-message PIR with (uncoded) side information when $(W,S)$-privacy is required. In this work, we provide an alternative proof by upper bounding the rate of any PIR-PCSI–I protocol using information-theoretic arguments (see Section IV-A). The key component of the proof is a necessary condition implied by the $(W,S)$-privacy and recoverability conditions (see Lemma 1). The achievability proof relies on a new PIR-PCSI–I protocol, termed the Specialized GRS Code protocol, based on Generalized Reed-Solomon (GRS) codes with a specific codeword, which achieves the rate $(K-M)^{-1}$ (see Section IV-B).
Remark 1.
It was shown in [3] that when there is a single server storing $K$ messages, and there is a user that knows $M$ (uncoded) messages as their side information and demands a single message not in their side information, the minimum download cost for guaranteeing the $(W,S)$-privacy condition is $K-M$. Surprisingly, this result matches the result of Theorem 1. This shows that for achieving $(W,S)$-privacy there is no loss in capacity even if only one linear combination of $M$ messages (instead of the $M$ messages separately) is known to the user a priori.
Remark 2.
When $W$-privacy, which is a weaker notion of privacy than $(W,S)$-privacy, is required (i.e., only the user's demand index, and not the user's side information index set, must be protected from the server), the result of [9, Theorem 1] shows that the capacity of single-server single-message PIR with a coded side information that does not include the demand (known as the PIR-CSI–I problem in [9]) is equal to $\lceil K/(M+1)\rceil^{-1}$. Since $\lceil K/(M+1)\rceil<K-M$ for all $1\leq M\leq K-3$, the capacity of PIR-PCSI–I is strictly smaller than that of PIR-CSI–I in this range, as expected. However, for the extremal cases $M=0$ and $M=K-1$, as well as for $M=K-2$, we have $\lceil K/(M+1)\rceil=K-M$, and hence $(W,S)$-privacy comes at no extra cost compared to $W$-privacy.
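The comparison in Remark 2 is simple integer arithmetic, so it can be checked mechanically. The sketch below assumes the two normalized download costs $K-M$ (Theorem 1) and $\lceil K/(M+1)\rceil$ (the PIR-CSI–I capacity under $W$-privacy from [9, Theorem 1], as quoted in Remark 2); the function names are hypothetical.

```python
def pcsi1_cost(K, M):
    """Normalized download cost K - M of PIR-PCSI-I ((W,S)-privacy)."""
    return K - M

def csi1_cost(K, M):
    """Normalized download cost ceil(K / (M + 1)) of PIR-CSI-I (W-privacy)."""
    return -(-K // (M + 1))            # exact integer ceiling division

def equal_cost_sizes(K):
    """Side information sizes M in [0, K-1] where (W,S)-privacy costs nothing extra."""
    return [M for M in range(K) if csi1_cost(K, M) == pcsi1_cost(K, M)]

print(equal_cost_sizes(10))            # prints [0, 8, 9]
```

Integer ceiling division avoids floating-point rounding. For every $K\geq 3$ the equal-cost set is exactly $\{0,K-2,K-1\}$, and $W$-privacy is never more expensive than $(W,S)$-privacy.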
Theorem 2.
The scalar-linear capacity of the PIR-PCSI–II problem with $K$ messages and side information size $2\leq M\leq K$ is given by $(K-M+1)^{-1}$.
The converse proof is based on a mixture of algebraic and information-theoretic arguments (see Section V-A), and the proof of achievability is based on a modified version of the Specialized GRS Code protocol, which achieves the rate $(K-M+1)^{-1}$ (see Section V-B).
Remark 3.
Interestingly, comparing the results of [3, Theorem 2] and Theorem 2, one can see that achieving $(W,S)$-privacy when the user knows $M-1$ messages (different from the demand) separately is as costly as when the user's side information is only one linear combination of $M$ messages including the demand.
Remark 4.
As shown in [9, Theorem 2], when $W$-privacy is required, the capacity of single-server single-message PIR with a coded side information to which the demand message contributes (known as the PIR-CSI–II problem in [9]) is equal to for and , and is equal to for all . The result of Theorem 2 matches this result for the cases of and , and thereby, $(W,S)$-privacy and $W$-privacy are attainable at the same cost. For other cases of , as expected, achieving $(W,S)$-privacy is more costly than achieving $W$-privacy.
IV The PIR-PCSI–I Problem
IV-A Converse for Theorem 1
The capacity of PIR-PCSI–I is upper bounded by the capacity of PIR with uncoded side information where $(W,S)$-privacy is required, since a user who knows $M$ messages separately can form a random linear combination of them and run any PIR-PCSI–I protocol. The latter capacity was shown to be $(K-M)^{-1}$ in [3] using an index-coding argument, where $M$ uncoded messages are available at the user as side information. This proves the converse for Theorem 1. We present an alternative information-theoretic proof here.
The following result gives a necessary condition for -privacy and recoverability.
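Lemma 1.
For any $S\in\mathcal{S}$, $C\in\mathcal{C}$, and $W\in[K]$ with $\mathbb{1}_{\{W\in S\}}=I$, and any $W^*\in[K]$ and $S^*\in\mathcal{S}$ with $\mathbb{1}_{\{W^*\in S^*\}}=I$, there must exist $C^*\in\mathcal{C}$ such that
$$H\big(X_{W^*}\mid A^{[W,S,C]},Q^{[W,S,C]},I,Y^{[S^*,C^*]}\big)=0.$$
Proof:
The proof is straightforward by way of contradiction, and hence omitted: if no such $C^*$ existed, then upon observing the query the server could rule out $(W^*,S^*)$ as the user's demand index and side information index set, which would violate the $(W,S)$-privacy condition. ∎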
Lemma 2.
For any $0\leq M\leq K-1$, the capacity of PIR-PCSI–I is upper bounded by $(K-M)^{-1}$.
Proof:
Fix $S\in\mathcal{S}$, $W\in[K]$, and $C\in\mathcal{C}$ (and accordingly, $Y^{[S,C]}$) such that $W\notin S$, and let $Q^{[W,S,C]}$ and $A^{[W,S,C]}$ be the user's query and the server's answer, respectively, for an arbitrary PIR-PCSI–I protocol. We need to show that $H(A^{[W,S,C]})\geq(K-M)L$. Similar to the proof of [9, Theorem 1], it can be shown that
[TABLE]
If (i.e., ), then we have , as was to be shown. If , for any there exists (and accordingly, ) such that (by Lemma 1). Let be a maximal subset of such that and are linearly independent. (Note that .) Let . Then, we have
[TABLE]
where (2) holds because for all (by assumption); and (3) holds since is independent of (noting that and are disjoint). Note also that, by the maximality of , for any , there exists (and accordingly, , which is linearly dependent on ) such that , and subsequently, . (Note that .) Thus, we can write
[TABLE]
where (4) holds since for all (by assumption); and (5) holds because and are independent (noting that and are disjoint). Putting (1), (2), (3), and (5) together, it follows that , as was to be shown. ∎
IV-B Achievability for Theorem 1
In this section, we propose a PIR-PCSI–I protocol for arbitrary $K$ and $0\leq M\leq K-1$ that achieves the rate $(K-M)^{-1}$. Throughout, we assume that $q$ is sufficiently large, particularly $q\geq K$. For arbitrary $q$, the achievability of the rate $(K-M)^{-1}$, which is not necessarily feasible, is conditional on the existence of a maximum-distance-separable (MDS) code over $\mathbb{F}_q$ of length $K$ and dimension $K-M$ that includes a codeword with support $S\cup\{W\}$ such that the $i$th codeword symbol is $c_i$ for $i\in S$, and is non-zero for $i=W$.
Assume that $S=\{i_1,\dots,i_M\}$ and $C=\{c_{i_1},\dots,c_{i_M}\}$, and let $\omega_1,\dots,\omega_K$ be $K$ distinct elements from $\mathbb{F}_q$.
Specialized GRS Code Protocol: This protocol consists of four steps as follows:
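Step 1: The user first constructs a polynomial $p(x)\triangleq\prod_{i\in[K]\setminus(S\cup\{W\})}(x-\omega_i)=\sum_{j=0}^{K-M-1}p_jx^j$, and then constructs $K-M$ sequences $Q_1,\dots,Q_{K-M}$, each of length $K$, such that $Q_j=\{v_1\omega_1^{j-1},\dots,v_K\omega_K^{j-1}\}$ for $j\in[K-M]$, where $v_i=c_i/p(\omega_i)$ for $i\in S$, and $v_i$ is a randomly chosen element from $\mathbb{F}_q^{\times}$ for $i\notin S$.
For any $j\in[K-M]$, the $i$th element, for any $i\in[K]$, in the sequence $Q_j$ can be thought of as the $(j,i)$ entry of a $(K-M)\times K$ matrix $G$, which is the generator matrix of a GRS code with distinct parameters $\omega_1,\dots,\omega_K$ and non-zero multipliers $v_1,\dots,v_K$ [13]. The construction above ensures that such a GRS code has a specific codeword with support $S\cup\{W\}$, namely $(v_1p(\omega_1),\dots,v_Kp(\omega_K))=\sum_{j=1}^{K-M}p_{j-1}\,(\text{row } j \text{ of } G)$, where the $i$th codeword symbol is $c_i$ for $i\in S$, is non-zero for $i=W$, and is zero for every other $i$, because $p(\omega_i)=0$ there.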
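To make Step 1 concrete, here is a minimal sketch over a prime field $\mathbb{F}_{11}$ (a hypothetical choice, so $m=1$ and field arithmetic is integer arithmetic mod 11), with 0-based indices and evaluation points $\omega_i=i+1$. Row $j$ of `G` (0-based) holds $v_i\omega_i^{j}$, which is the paper's $\omega_i^{j-1}$ with 1-based $j$. The helpers `poly_from_roots`, `poly_eval`, and `build_query` are illustrative names, not the authors' code.

```python
import random

P = 11  # hypothetical prime field size q (q >= K is needed for K distinct omegas)

def poly_from_roots(roots, p=P):
    """Coefficients [p_0, p_1, ...] (lowest degree first) of prod_r (x - r) over F_p."""
    coeffs = [1]
    for r in roots:
        nxt = [0] * (len(coeffs) + 1)
        for j, a in enumerate(coeffs):
            nxt[j] = (nxt[j] - r * a) % p       # the -r * a * x^j term
            nxt[j + 1] = (nxt[j + 1] + a) % p   # the a * x^(j+1) term
        coeffs = nxt
    return coeffs

def poly_eval(coeffs, x, p=P):
    return sum(a * pow(x, j, p) for j, a in enumerate(coeffs)) % p

def build_query(K, W, S, C, omega, rng, p=P):
    """Step 1 for Model I (0-based indices): returns p(x), multipliers v, and G."""
    M = len(S)
    poly = poly_from_roots([omega[i] for i in range(K) if i != W and i not in S], p)
    v = [C[i] * pow(poly_eval(poly, omega[i], p), p - 2, p) % p if i in S
         else rng.randrange(1, p)               # random non-zero multiplier
         for i in range(K)]
    G = [[v[i] * pow(omega[i], j, p) % p for i in range(K)] for j in range(K - M)]
    return poly, v, G

K, W, S, C = 6, 0, {2, 4}, {2: 3, 4: 7}
omega = list(range(1, K + 1))                  # K distinct elements of F_11
poly, v, G = build_query(K, W, S, C, omega, random.Random(0))
# The special codeword sum_j p_j * (row j of G) equals (v_i * p(omega_i))_i.
u = [sum(pj * row[i] for pj, row in zip(poly, G)) % P for i in range(K)]
```

Here `u` equals $c_i$ on $S=\{2,4\}$, is non-zero at $W=0$, and vanishes at the remaining three positions, regardless of the random multipliers.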
Step 2: The user reorders $Q_1,\dots,Q_{K-M}$ by a randomly chosen permutation, and sends the reordered sequences to the server as the query $Q^{[W,S,C]}$.
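Step 3: By using $Q^{[W,S,C]}$, the server computes $A_j\triangleq\sum_{i=1}^{K}v_i\omega_i^{j-1}X_i$, i.e., the inner product of $Q_j$ and $(X_1,\dots,X_K)$, for all $j\in[K-M]$, and it sends the answer $A^{[W,S,C]}=\{A_1,\dots,A_{K-M}\}$ to the user.
Note that the $A_j$'s are the parity check equations of a GRS code, which is the dual code of the GRS code generated by the matrix $G$ defined earlier.
Step 4: Upon receiving the answer, the user computes $\sum_{j=1}^{K-M}p_{j-1}A_j=\sum_{i=1}^{K}v_ip(\omega_i)X_i$ and retrieves $X_W$ by subtracting off the contribution of the side information $Y^{[S,C]}$, which leaves a non-zero multiple of $X_W$.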
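The following end-to-end sketch runs Steps 1, 3, and 4 for Model I and confirms that the user recovers $X_W$ from $K-M$ answer symbols. It makes the same hypothetical simplifications as before: a prime field $\mathbb{F}_{13}$ with $m=1$ (each message is one field element), 0-based indices, $\omega_i=i+1$, and the reordering of Step 2 omitted because it only relabels the answer symbols. `run_protocol` is an illustrative name.

```python
import random

P = 13  # hypothetical prime q; here m = 1, so each message is a single element of F_P

def poly_from_roots(roots, p=P):
    """Coefficients (lowest degree first) of prod_r (x - r) over F_p."""
    coeffs = [1]
    for r in roots:
        nxt = [0] * (len(coeffs) + 1)
        for j, a in enumerate(coeffs):
            nxt[j] = (nxt[j] - r * a) % p
            nxt[j + 1] = (nxt[j + 1] + a) % p
        coeffs = nxt
    return coeffs

def poly_eval(coeffs, x, p=P):
    return sum(a * pow(x, j, p) for j, a in enumerate(coeffs)) % p

def inv(a, p=P):
    return pow(a, p - 2, p)          # Fermat inverse, valid for a != 0 mod prime p

def run_protocol(X, W, S, C, rng, p=P):
    """Model I (W not in S), 0-based indices; returns (decoded X_W, #answer symbols)."""
    K, M = len(X), len(S)
    omega = list(range(1, K + 1))    # K distinct non-zero points (needs K < p)
    # Step 1: p(x) vanishes exactly at omega_i for i outside S and W
    poly = poly_from_roots([omega[i] for i in range(K) if i != W and i not in S], p)
    v = [C[i] * inv(poly_eval(poly, omega[i], p), p) % p if i in S
         else rng.randrange(1, p) for i in range(K)]
    G = [[v[i] * pow(omega[i], j, p) % p for i in range(K)] for j in range(K - M)]
    # Step 3 (server): one linear combination of all K messages per row of G
    A = [sum(g * x for g, x in zip(row, X)) % p for row in G]
    # Step 4 (user): sum_j p_j A_j = Y + v_W p(omega_W) X_W
    combo = sum(pj * aj for pj, aj in zip(poly, A)) % p
    Y = sum(C[i] * X[i] for i in S) % p
    u_W = v[W] * poly_eval(poly, omega[W], p) % p
    return (combo - Y) * inv(u_W, p) % p, len(A)

rng = random.Random(0)
X = [rng.randrange(P) for _ in range(8)]          # K = 8 messages
decoded, downloads = run_protocol(X, 4, {1, 3, 6}, {1: 2, 3: 5, 6: 11}, rng)
```

With $K=8$ and $M=3$ the user downloads $5$ symbols, matching the rate $(K-M)^{-1}$.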
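Lemma 3.
The Specialized GRS Code protocol is a PIR-PCSI–I protocol, and achieves the rate $(K-M)^{-1}$.
Proof:
Since the matrix $G$, defined in Step 1 of the protocol, generates a GRS code, which is an MDS code, the rows of $G$ are linearly independent, and accordingly, $A_1,\dots,A_{K-M}$ are linearly independent combinations of $X_1,\dots,X_K$, which are themselves independently and uniformly distributed over $\mathbb{F}_{q^m}$. Thus, $A_1,\dots,A_{K-M}$ are independently and uniformly distributed over $\mathbb{F}_{q^m}$. Since $H(X_i)=L$, we have $H(A_j)=L$ for all $j\in[K-M]$, and $H(A^{[W,S,C]})=(K-M)L$ for any $S\in\mathcal{S}$, any $W\notin S$, and any $C\in\mathcal{C}$. Since the joint distribution of $\mathbf{W}$ and $\mathbf{S}$ is uniform and $\mathbf{C}$ is uniformly distributed, the average entropy of the answer is also $(K-M)L$. Thus, the Specialized GRS Code protocol has the rate $L/((K-M)L)=(K-M)^{-1}$.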
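The rate argument rests on $G$ having full row rank $K-M$, so that the answer is a surjective linear image of the messages and hence uniform. A minimal sketch, assuming a prime field $\mathbb{F}_{13}$ and random distinct evaluation points, checks this with Gauss-Jordan elimination mod $p$, and also checks the stronger MDS property that every set of $K-M$ columns is independent. `rank_mod_p` is a hypothetical helper.

```python
import random
from itertools import combinations

def rank_mod_p(rows, p):
    """Rank of a matrix over the prime field F_p via Gauss-Jordan elimination."""
    rows = [list(r) for r in rows]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] % p), None)
        if pivot is None:
            continue                                   # no pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        scale = pow(rows[rank][col], p - 2, p)
        rows[rank] = [x * scale % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col] % p:
                f = rows[r][col]
                rows[r] = [(a - f * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
        if rank == len(rows):
            break
    return rank

P, K, M = 13, 9, 4                                     # hypothetical parameters
rng = random.Random(2)
omega = rng.sample(range(1, P), K)                     # K distinct non-zero points
v = [rng.randrange(1, P) for _ in range(K)]            # non-zero multipliers
G = [[v[i] * pow(omega[i], j, P) % P for i in range(K)] for j in range(K - M)]

full_row_rank = rank_mod_p(G, P) == K - M
# MDS property: every choice of K - M columns is already linearly independent
every_square_submatrix_invertible = all(
    rank_mod_p([[row[i] for i in cols] for row in G], P) == K - M
    for cols in combinations(range(K), K - M)
)
```

Each square submatrix is a scaled Vandermonde matrix with distinct points, which is why both checks succeed for any non-zero multipliers.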
Next, we prove that the Specialized GRS Code protocol is a PIR-PCSI–I protocol. The recoverability condition follows directly from Step 4. The $(W,S)$-privacy condition is also satisfied because the GRS code generated by the matrix $G$ is an MDS code, and thereby, the minimum (Hamming) weight of a codeword is $M+1$, and there are the same number of minimum-weight codewords for any support of size $M+1$ [13]. Thus, for any $W^*\in[K]$ and any $S^*\in\mathcal{S}$ with $W^*\notin S^*$, the dual code, whose parity check matrix is given by $G$, contains the same number of parity check equations (with support $S^*\cup\{W^*\}$), from each of which, given $Y^{[S^*,C^*]}$ for some $C^*\in\mathcal{C}$, $X_{W^*}$ can be recovered. ∎
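The counting fact used in the privacy argument can be checked by brute force on a small example. The sketch assumes hypothetical parameters $q=7$ (prime), $K=5$, $M=2$, enumerates all codewords of the $[5,3]$ GRS code generated by $G$, and tallies the minimum-weight codewords by support. Every support $T$ of size $M+1$ should carry exactly $q-1$ such codewords. Each one gives, for any $W^*\in T$, a non-zero coefficient sequence on $S^*=T\setminus\{W^*\}$, i.e., a valid candidate $C^*$.

```python
import itertools
import random
from collections import Counter
from math import comb

P, K, M = 7, 5, 2                                   # hypothetical small parameters
rng = random.Random(1)
omega = [1, 2, 3, 4, 5]                             # distinct points of F_7
v = [rng.randrange(1, P) for _ in range(K)]         # non-zero multipliers
G = [[v[i] * pow(omega[i], j, P) % P for i in range(K)] for j in range(K - M)]

support_counts = Counter()                          # support -> #codewords with it
min_weight = K
for msg in itertools.product(range(P), repeat=K - M):
    if not any(msg):
        continue                                    # skip the all-zero codeword
    cw = [sum(m * G[j][i] for j, m in enumerate(msg)) % P for i in range(K)]
    supp = frozenset(i for i, c in enumerate(cw) if c)
    min_weight = min(min_weight, len(supp))
    if len(supp) == M + 1:
        support_counts[supp] += 1

n_supports = comb(K, M + 1)
```

The minimum weight is $M+1=3$, all $\binom{5}{3}=10$ supports occur, and each occurs exactly $q-1=6$ times.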
V The PIR-PCSI–II Problem
V-A Converse for Theorem 2
In this section, we give an information-theoretic proof of the converse for Theorem 2.
Lemma 4.
For any $2\leq M\leq K$, the scalar-linear capacity of PIR-PCSI–II is upper bounded by $(K-M+1)^{-1}$.
Proof:
Fix , , and (and ) such that . Let and be the query and the answer of an arbitrary scalar-linear PIR-PCSI–II protocol. We need to show that . Let be the set of all such that , i.e., is recoverable from (and ) directly. Let . There are two cases: (i) , and (ii) .
Case (i): Since and are independent and (by assumption), then
[TABLE]
If , then , and subsequently, , as was to be shown. If , can be further lower bounded as follows. Let . Assume, w.l.o.g., that . Let , and for . (Note that .) By Lemma 1, for any , there exists (and accordingly, ) such that . Let where is the coefficient of in . By the scalar-linearity of , it is easy to see that either or for some . (Otherwise, the server learns that the user’s demand index and side information index set cannot be and , respectively. This obviously violates the -privacy condition.) Thus, . Let . Then, we have
[TABLE]
where (7) holds since for all (by assumption); and (8) follows because is independent of , noting that , , and are linearly independent (by construction). By the linear independence of ’s for all , it follows that . By (6) and (8), we get .
Case (ii): Assume, w.l.o.g., that and . Let , and for . (Note that .) Similarly as in the case (i), define (and accordingly ) for all , where is replaced by . By using a similar argument as before, it can be shown that for all . Let . Then, we can write
[TABLE]
where (9) follows since (by the recoverability condition); (10) holds because , and subsequently, , for all ; and (11) follows because is independent of (due to the linear independence of , , and ). Since , we have (noting that ’s are linearly independent), and thereby, . ∎
V-B Achievability for Theorem 2
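In this section, we propose a PIR-PCSI–II protocol, which is a slightly modified version of the Specialized GRS Code protocol, that achieves the rate $(K-M+1)^{-1}$ for arbitrary $K$ and $2\leq M\leq K$.
Modified Specialized GRS Code Protocol: This protocol consists of four steps, where Steps 2–4 are the same as those in the Specialized GRS Code protocol (Section IV-B), except that $K-M$ is replaced with $K-M+1$ everywhere. Step 1 of the proposed protocol is as follows:
Step 1: The user first constructs a polynomial $p(x)\triangleq\prod_{i\in[K]\setminus S}(x-\omega_i)=\sum_{j=0}^{K-M}p_jx^j$, and then constructs $K-M+1$ sequences $Q_1,\dots,Q_{K-M+1}$, each of length $K$, such that $Q_j=\{v_1\omega_1^{j-1},\dots,v_K\omega_K^{j-1}\}$ for $j\in[K-M+1]$, where $v_i=c_i/p(\omega_i)$ for $i\in S\setminus\{W\}$; where $v_W$ is chosen uniformly at random from $\mathbb{F}_q^{\times}\setminus\{c_W/p(\omega_W)\}$; and $v_i$ is a randomly chosen element from $\mathbb{F}_q^{\times}$ for $i\notin S$. In Step 4, the side information is then subtracted from $\sum_{j}p_{j-1}A_j$, leaving $(v_Wp(\omega_W)-c_W)X_W$, whose coefficient is non-zero by the choice of $v_W$.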
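The modified Step 1 can be exercised the same way. This sketch makes the same hypothetical simplifications as the Model I sketch (prime field $\mathbb{F}_{13}$, $m=1$, 0-based indices, $\omega_i=i+1$, no reordering). It confirms that, with $W\in S$, the user recovers $X_W$ from $K-M+1$ answer symbols because the residual coefficient $v_Wp(\omega_W)-c_W$ is non-zero. `run_modified` is an illustrative name.

```python
import random

P = 13  # hypothetical prime q (needs q >= K and q >= 3); m = 1 as before

def poly_from_roots(roots, p=P):
    """Coefficients (lowest degree first) of prod_r (x - r) over F_p."""
    coeffs = [1]
    for r in roots:
        nxt = [0] * (len(coeffs) + 1)
        for j, a in enumerate(coeffs):
            nxt[j] = (nxt[j] - r * a) % p
            nxt[j + 1] = (nxt[j + 1] + a) % p
        coeffs = nxt
    return coeffs

def poly_eval(coeffs, x, p=P):
    return sum(a * pow(x, j, p) for j, a in enumerate(coeffs)) % p

def inv(a, p=P):
    return pow(a, p - 2, p)

def run_modified(X, W, S, C, rng, p=P):
    """Model II (W in S), 0-based indices; returns (decoded X_W, #answer symbols)."""
    K, M = len(X), len(S)
    omega = list(range(1, K + 1))
    poly = poly_from_roots([omega[i] for i in range(K) if i not in S], p)  # degree K - M
    v = []
    for i in range(K):
        if i == W:   # any non-zero value except c_W / p(omega_W)
            banned = C[W] * inv(poly_eval(poly, omega[W], p), p) % p
            v.append(rng.choice([a for a in range(1, p) if a != banned]))
        elif i in S:
            v.append(C[i] * inv(poly_eval(poly, omega[i], p), p) % p)
        else:
            v.append(rng.randrange(1, p))
    G = [[v[i] * pow(omega[i], j, p) % p for i in range(K)] for j in range(K - M + 1)]
    A = [sum(g * x for g, x in zip(row, X)) % p for row in G]     # server's answer
    combo = sum(pj * aj for pj, aj in zip(poly, A)) % p           # sum_i v_i p(omega_i) X_i
    Y = sum(C[i] * X[i] for i in S) % p                           # side information
    gap = (v[W] * poly_eval(poly, omega[W], p) - C[W]) % p        # non-zero by choice of v_W
    return (combo - Y) * inv(gap, p) % p, len(A)

rng = random.Random(3)
X = [rng.randrange(P) for _ in range(7)]                          # K = 7 messages
decoded, downloads = run_modified(X, 2, {0, 2, 5}, {0: 4, 2: 9, 5: 1}, rng)
```

With $K=7$ and $M=3$ the user downloads $5$ symbols, one more than in Model I, consistent with the rate $(K-M+1)^{-1}$.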
Lemma 5.
The Modified Specialized GRS Code protocol is a PIR-PCSI–II protocol, and achieves the rate $(K-M+1)^{-1}$.
Proof:
The proof, omitted to avoid repetition, follows along the same lines as the proof of Lemma 3, where $K-M$ is replaced by $K-M+1$, and $M+1$ is replaced by $M$. ∎
- [1] H. Sun and S. A. Jafar, “The capacity of private information retrieval,” IEEE Trans. on Info. Theory, vol. 63, no. 7, pp. 4075–4088, July 2017.
- [2] ——, “The capacity of robust private information retrieval with colluding databases,” IEEE Trans. on Info. Theory, vol. 64, no. 4, pp. 2361–2370, April 2018.
- [3] S. Kadhe, B. Garcia, A. Heidarzadeh, S. E. Rouayheb, and A. Sprintson, “Private information retrieval with side information: The single server case,” in 2017 55th Annual Allerton Conf. on Commun., Control, and Computing, Oct 2017, pp. 1099–1106.
- [4] R. Tandon, “The capacity of cache aided private information retrieval,” in 55th Annual Allerton Conf. on Commun., Control, and Computing, Oct 2017, pp. 1078–1082.
- [5] Y. Wei, K. Banawan, and S. Ulukus, “Cache-aided private information retrieval with partially known uncoded prefetching: Fundamental limits,” IEEE Journal on Selected Areas in Communications, vol. 36, no. 6, pp. 1126–1139, June 2018.
- [6] ——, “Fundamental limits of cache-aided private information retrieval with unknown and uncoded prefetching,” IEEE Trans. on Info. Theory, pp. 1–1, 2018.
- [7] S. Kadhe, B. Garcia, A. Heidarzadeh, S. E. Rouayheb, and A. Sprintson, “Private information retrieval with side information,” CoRR, vol. abs/1709.00112, 2017. [Online]. Available: http://arxiv.org/abs/1709.00112
- [8] Z. Chen, Z. Wang, and S. Jafar, “The capacity of private information retrieval with private side information,” CoRR, vol. abs/1709.03022, 2017. [Online]. Available: http://arxiv.org/abs/1709.03022
