Private Information Retrieval with Private Coded Side Information: The Multi-Server Case
Fatemeh Kazemi, Esmaeil Karimi, Anoosheh Heidarzadeh, and Alex Sprintson

TL;DR
This paper characterizes the maximum download rate for multi-server private information retrieval with private coded side information, considering whether the demand message is part of the side information or not.
Contribution
It derives the capacity bounds for multi-server PIR with private coded side information, extending previous single-server results and introducing new achievable schemes.
Findings
Capacity for side information excluding demand message: (1+1/N+...+1/N^{K-M-1})^{-1}
Lower bound on capacity when demand message is part of side information: (1+1/N+...+1/N^{K-M})^{-1}
Utilizes techniques from single-server PIR-PCSI and multi-server private computation literature.
The authors are with the Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843 USA (E-mail: {fatemeh.kazemi, esmaeil.karimi, anoosheh, spalex}@tamu.edu).
Abstract
In this paper, we consider the multi-server setting of the Private Information Retrieval with Private Coded Side Information (PIR-PCSI) problem. In this problem, there is a database of $K$ messages whose copies are replicated across $N$ servers, and there is a user who knows a random linear combination of a random subset of $M$ messages in the database as side information. The user wishes to download one message from the servers, while protecting the identities of both the demand message and the messages forming the side information. We assume that the servers know the number of messages forming the user's side information in advance, whereas the indices of these messages and their coefficients in the side information are not known to any of the servers a priori.
Our goal is to characterize (or derive a lower bound on) the capacity, i.e., the maximum achievable download rate, for the following two settings. In the first setting, the set of messages forming the linear combination available to the user as side information does not include the user's demanded message. For this setting, we show that the capacity is equal to $\left(1+\frac{1}{N}+\dots+\frac{1}{N^{K-M-1}}\right)^{-1}$. In the second setting, the demand message contributes to the linear combination available to the user as side information, i.e., the demand message is one of the messages that form the user's side information. For this setting, we show that the capacity is lower-bounded by $\left(1+\frac{1}{N}+\dots+\frac{1}{N^{K-M}}\right)^{-1}$. The proposed achievability schemes and proof techniques leverage ideas from both our recent methods proposed for the single-server PIR-PCSI problem and the techniques proposed by Sun and Jafar for the multi-server private computation problem.
I Introduction
In the Private Information Retrieval (PIR) problem, a database of $K$ messages is replicated at $N$ servers. There is a user who wishes to retrieve a single message or multiple messages belonging to the database while protecting the identity of the demanded message(s) from any individual server [1, 2, 3, 4]. In order to retrieve the desired message(s), the user generates one query for each server. Upon receiving the user's query, each server returns an answer to the user, which depends on the stored messages and the received query. To ensure that each server learns nothing, in an information-theoretic sense, about the identity of the message(s) being retrieved by the user, each query must be marginally independent of the index (or indices) of the desired message(s).
In a single-server setting, or in a multi-server setting where all servers can fully collude, the user must download the whole database to achieve privacy in the information-theoretic sense [1]. However, when the user has some side information about the messages in the database [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18] or when the servers do not fully collude [2, 3, 4], privacy can be achieved more efficiently in terms of the download cost (i.e., the amount of information downloaded from the server(s)).
For the PIR problem in the presence of side information, two different types of privacy can be considered: (i) $W$-privacy, which requires that the identity of the user's demanded message(s) be protected, and (ii) $(W,S)$-privacy, which requires that the identities of both the user's demanded message(s) and the message(s) in the user's side information be protected. When the side information is a random subset of messages, the problem is referred to as PIR with Side Information (PIR-SI) or PIR with Private Side Information (PIR-PSI) when $W$-privacy or $(W,S)$-privacy is required, respectively. The single-server settings of these problems were studied in [5, 6, 7], and their multi-server settings were studied in [9, 8, 10]. In [11] and [12], we studied the single-server setting of a related problem in which the side information is a random linear combination of a random subset of messages. This problem is referred to as PIR with Coded Side Information (PIR-CSI) or PIR with Private Coded Side Information (PIR-PCSI) when $W$-privacy or $(W,S)$-privacy is required, respectively. Also, in [13], we recently studied the multi-server setting of the PIR-CSI problem.
In this work, we consider the multi-server setting of the PIR-PCSI problem. In this setting, a database of $K$ messages is replicated across $N$ servers, and a user, who knows a random linear combination of a random subset of $M$ messages in the database, wishes to obtain a message by sending queries to the servers. The goal is to design a scheme that protects the identities of both the user's demanded message and the messages forming the user's side information while minimizing the download cost. The servers are assumed to know the number of messages contributing to the user's side information beforehand. However, the indices and the coefficients of the messages in the user's side information are not known to the servers in advance. The motivation for this type of side information comes from several practical scenarios. For instance, the side information could have been obtained in advance from a trusted server with limited knowledge about the database, through overhearing in a wireless network, or from the information locally stored in the user's cache.
I-A Main Contributions
We consider two settings of the PIR-PCSI problem depending on whether or not the user's demanded message is one of the messages forming the user's side information. We characterize (or derive a lower bound on) the capacity of each setting, where the capacity is defined as the supremum of all achievable rates (i.e., the inverse of the normalized download cost). In the first setting, the message demanded by the user is not one of the messages forming the user's side information. For this setting, we prove that the capacity is equal to $\left(1+\frac{1}{N}+\dots+\frac{1}{N^{K-M-1}}\right)^{-1}$. Interestingly, the capacity in this setting is equal to the capacity of the multi-server PIR-PSI problem [8], in which $M$ uncoded messages are available at the user as side information. This result shows that there is no loss in capacity due to restricting the user's side information to one random linear combination of $M$ messages instead of $M$ uncoded messages.
The converse proof readily follows from the fact that the capacity of this setting is upper-bounded by the capacity of multi-server PIR-PSI, which is given by $\left(1+\frac{1}{N}+\dots+\frac{1}{N^{K-M-1}}\right)^{-1}$ (see [8, Theorem 1]).
For the achievability proof, we devise a new protocol that builds upon two existing achievability schemes for two different problems: (i) the Private Computation (PC) scheme of [19] for multi-server private computation, where a user wishes to privately retrieve one arbitrary linear combination of the messages replicated at multiple servers, and (ii) our Specialized GRS Code scheme proposed in [12] for single-server PIR-PCSI.
The main ideas of our achievability scheme are as follows. First, the user utilizes the Specialized GRS Code scheme of [12] for single-server PIR-PCSI to construct $K-M$ independent super-messages, which are linearly independent combinations of the original messages and play the role of the original messages in a multi-server private computation problem. Then, the user and the servers apply the PC scheme of [19] to the constructed super-messages in such a way that the user can privately download one of $\binom{K}{M+1}$ linear combinations of the super-messages, where the support of each linear combination (in terms of the original messages) is a distinct subset of $[K]$ of size $M+1$.
Additionally, for the setting in which the demanded message is one of the messages forming the user's side information, we show that the capacity is lower-bounded by $\left(1+\frac{1}{N}+\dots+\frac{1}{N^{K-M}}\right)^{-1}$. The proof is based on a new achievability scheme that leverages the PC scheme of [19] for multi-server private computation, combined with our Modified Specialized GRS Code scheme proposed in [12] for single-server PIR-PCSI.
II Problem Formulation
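We denote random variables by bold letters and their realizations by non-bold letters. For a positive integer $i$, let $[i] := \{1,\dots,i\}$. Let $\mathbb{F}_q$ be a finite field for some prime $q$, and let $\mathbb{F}_q^{\times}$ be the multiplicative group of $\mathbb{F}_q$. Let $\mathbb{F}_{q^m}$ be an extension field of $\mathbb{F}_q$ for some integer $m\geq 1$.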
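Consider $N$ non-colluding identical servers, each of which stores $K$ messages $X_1,\dots,X_K$, where $X_i$ is independently and uniformly distributed over $\mathbb{F}_{q^m}$, i.e., for all $\mathcal{T}\subseteq[K]$, it holds that

$$H(\mathbf{X}_{\mathcal{T}}) = \sum_{i\in\mathcal{T}} H(\mathbf{X}_i) = |\mathcal{T}|\, m\log_2 q,$$

where $\mathbf{X}_{\mathcal{T}} := \{\mathbf{X}_i\}_{i\in\mathcal{T}}$.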
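Suppose that there is a user that wishes to retrieve a message $X_W$ from the servers for some $W\in[K]$, and has a linear combination $Y^{[S,C]} := \sum_{i\in S} c_i X_i$ for some $S\in\mathcal{S}$ and some $C := \{c_i\}_{i\in S}\in\mathcal{C}$, where $\mathcal{S}$ is the set of all $M$-subsets of $[K]$, and $\mathcal{C}$ is the set of all length-$M$ sequences with elements from $\mathbb{F}_q^{\times}$. We call $W$ the demand index, $X_W$ the demand, $Y^{[S,C]}$ the side information, $S$ the side information index set, and $M$ the side information size.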
We assume that $\mathbf{S}$ is uniformly distributed over $\mathcal{S}$, and that $\mathbf{C}$ is uniformly distributed over $\mathcal{C}$. Also, two different models for the conditional distribution of $\mathbf{W}$ given $\mathbf{S}=S$ are considered:
- Model I: $\mathbf{W}$ is uniformly distributed over $[K]\setminus S$;
- Model II: $\mathbf{W}$ is uniformly distributed over $S$.
It is assumed that and for Model I and Model II, respectively. Note that for both models it holds that $\mathbf{W}$ is uniformly distributed over $[K]$. We assume that no server knows the realizations of $\mathbf{W}$, $\mathbf{S}$, and $\mathbf{C}$ in advance. In contrast, we assume that all servers know the considered model (i.e., whether $W\notin S$ or $W\in S$), the side information size $M$, the distributions of $\mathbf{S}$ and $\mathbf{C}$, and the conditional distribution of $\mathbf{W}$ given $\mathbf{S}$.
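The claim that the demand index is marginally uniform under both models can be checked by exact enumeration. The following minimal Python sketch computes the joint distribution of $(\mathbf{W},\mathbf{S})$ with rational arithmetic; it assumes, as a simplification of ours, that $\mathbf{C}$ is independent of $(\mathbf{W},\mathbf{S})$ and therefore leaves it out. The function names `joint_ws` and `marginal_w` are hypothetical and used only for illustration.

```python
from fractions import Fraction
from itertools import combinations


def joint_ws(K, M, model):
    """Exact pmf P(W=w, S=s): S uniform over the M-subsets of {1..K};
    W uniform on [K] minus S (Model I) or on S (Model II)."""
    subsets = list(combinations(range(1, K + 1), M))
    joint = {}
    for s in subsets:
        if model == "I":
            cands = [w for w in range(1, K + 1) if w not in s]
        elif model == "II":
            cands = list(s)
        else:
            raise ValueError("model must be 'I' or 'II'")
        if not cands:  # W needs at least one admissible value
            raise ValueError("no admissible demand index for this (K, M, model)")
        for w in cands:
            joint[(w, s)] = Fraction(1, len(subsets)) / len(cands)
    return joint


def marginal_w(joint, K):
    """Marginal pmf of W, obtained by summing the joint pmf over S."""
    pw = {w: Fraction(0) for w in range(1, K + 1)}
    for (w, _), p in joint.items():
        pw[w] += p
    return pw


for model in ("I", "II"):
    print(model, marginal_w(joint_ws(5, 2, model), 5))
```

In both models every value of $W$ receives probability $1/K$, by symmetry over the message indices.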
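For any $W$, $S$, $C$, in order to retrieve $X_W$, the user generates $N$ queries $Q_1^{[W,S,C]},\dots,Q_N^{[W,S,C]}$, and sends to the $n$-th server the query $Q_n^{[W,S,C]}$. Each query is assumed to be a (potentially stochastic) function of $W$, $S$, $C$, and $Y^{[S,C]}$. Upon receiving the query $Q_n^{[W,S,C]}$, the $n$-th server responds to the user with an answer $A_n^{[W,S,C]}$. The answer $A_n^{[W,S,C]}$ is a (deterministic) function of the query $Q_n^{[W,S,C]}$ and the messages $X_1,\dots,X_K$. Note that for all $n\in[N]$, it holds that

$$(\mathbf{W},\mathbf{S},\mathbf{C}) \rightarrow \big(\mathbf{Q}_n^{[\mathbf{W},\mathbf{S},\mathbf{C}]},\mathbf{X}_1,\dots,\mathbf{X}_K\big) \rightarrow \mathbf{A}_n^{[\mathbf{W},\mathbf{S},\mathbf{C}]}$$

forms a Markov chain, and

$$H\big(\mathbf{A}_n^{[\mathbf{W},\mathbf{S},\mathbf{C}]} \,\big|\, \mathbf{Q}_n^{[\mathbf{W},\mathbf{S},\mathbf{C}]},\mathbf{X}_1,\dots,\mathbf{X}_K\big) = 0.$$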
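The answers from all servers, along with the side information and the queries, must enable the user to retrieve the demand $X_W$, i.e.,

$$H\big(\mathbf{X}_{\mathbf{W}} \,\big|\, \mathbf{A}^{[\mathbf{W},\mathbf{S},\mathbf{C}]}, \mathbf{Q}^{[\mathbf{W},\mathbf{S},\mathbf{C}]}, \mathbf{Y}^{[\mathbf{S},\mathbf{C}]}\big) = 0,$$

where $\mathbf{A}^{[\mathbf{W},\mathbf{S},\mathbf{C}]} := \{\mathbf{A}_1^{[\mathbf{W},\mathbf{S},\mathbf{C}]},\dots,\mathbf{A}_N^{[\mathbf{W},\mathbf{S},\mathbf{C}]}\}$ and $\mathbf{Q}^{[\mathbf{W},\mathbf{S},\mathbf{C}]} := \{\mathbf{Q}_1^{[\mathbf{W},\mathbf{S},\mathbf{C}]},\dots,\mathbf{Q}_N^{[\mathbf{W},\mathbf{S},\mathbf{C}]}\}$. This condition is referred to as the recoverability condition.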
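In addition, the queries must not reveal any information about the user's demand index and side information index set to any server, i.e., for all $n\in[N]$,

$$I\big(\mathbf{W},\mathbf{S};\, \mathbf{Q}_n^{[\mathbf{W},\mathbf{S},\mathbf{C}]}, \mathbf{A}_n^{[\mathbf{W},\mathbf{S},\mathbf{C}]}, \mathbf{X}_1,\dots,\mathbf{X}_K\big) = 0.$$

This condition is referred to as the $(W,S)$-privacy condition.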
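For both models (Model I and Model II), we would like to design a protocol for generating the queries $Q_1^{[W,S,C]},\dots,Q_N^{[W,S,C]}$ for any given $W$, $S$, $C$. The protocol also prescribes, for all $n\in[N]$, how the $n$-th server generates the answer $A_n^{[W,S,C]}$, given $Q_n^{[W,S,C]}$ and $X_1,\dots,X_K$.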
A protocol that satisfies both the $(W,S)$-privacy and recoverability conditions for all $W$, $S$, $C$ with $W\notin S$ (or $W\in S$) is referred to as a PIR-PCSI-I (or PIR-PCSI-II) protocol. The problem of designing a PIR-PCSI-I (or PIR-PCSI-II) protocol is referred to as the PIR-PCSI-I (or PIR-PCSI-II) problem.
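The rate of a PIR-PCSI-I or PIR-PCSI-II protocol is defined as the ratio of the entropy of a message, i.e., $H(\mathbf{X}_i) = m\log_2 q$ bits, to the total entropy of the answers from all servers, i.e., $H\big(\mathbf{A}_1^{[\mathbf{W},\mathbf{S},\mathbf{C}]},\dots,\mathbf{A}_N^{[\mathbf{W},\mathbf{S},\mathbf{C}]}\big)$.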
The capacity of the PIR-PCSI-I (or PIR-PCSI-II) problem is defined as the supremum of rates over all PIR-PCSI-I (or PIR-PCSI-II) protocols.
In this work, our goal is to characterize (or derive lower bounds on) the capacities of the PIR-PCSI-I and PIR-PCSI-II problems, and to design PIR-PCSI-I and PIR-PCSI-II protocols that achieve the capacity (or the derived lower bound on the capacity).
III Main Results
In this section, we present our main results. Theorem 1 characterizes the capacity of the PIR-PCSI-I problem, and Theorem 2 presents a lower bound on the capacity of the PIR-PCSI-II problem. The proofs of Theorems 1 and 2 are given in Sections IV and V, respectively.
Theorem 1.
The capacity of the PIR-PCSI-I problem with $N$ servers, $K$ messages, and side information size $M$ is given by

$$\left(1+\frac{1}{N}+\frac{1}{N^{2}}+\dots+\frac{1}{N^{K-M-1}}\right)^{-1}.$$
Interestingly, this result indicates that the capacity of multi-server PIR-PCSI-I is equal to the capacity of multi-server PIR-PSI [8], where $M$ uncoded messages are available at the user as side information. Note that having only a random linear combination of $M$ messages as side information, instead of $M$ uncoded messages, cannot increase the capacity, which implies the converse. Thus, to complete the proof of Theorem 1, we only need to prove the achievability, which is presented in Section IV. Notably, our results show that having only one random linear combination of $M$ messages instead of $M$ uncoded messages does not decrease the capacity either.
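A minimal Python sketch for evaluating the expression in Theorem 1 exactly with rational arithmetic. It also checks the geometric-series closed form $N^{K-M-1}(N-1)/(N^{K-M}-1)$, valid for $N\geq 2$. The function names are hypothetical. Setting $M=0$ recovers $\left(1+\frac{1}{N}+\dots+\frac{1}{N^{K-1}}\right)^{-1}$, the capacity of multi-server PIR without side information [2].

```python
from fractions import Fraction


def pcsi1_capacity(N, K, M):
    """Theorem 1 expression (1 + 1/N + ... + 1/N**(K-M-1))**-1, computed exactly."""
    if N < 1 or not 0 <= M <= K - 1:
        raise ValueError("need N >= 1 and 0 <= M <= K-1 so the sum is non-empty")
    return 1 / sum(Fraction(1, N ** i) for i in range(K - M))


def pcsi1_closed_form(N, K, M):
    """Geometric-series closed form N**(L-1) * (N-1) / (N**L - 1) with L = K-M."""
    if N < 2:
        raise ValueError("closed form requires N >= 2")
    L = K - M
    return Fraction(N ** (L - 1) * (N - 1), N ** L - 1)


print(pcsi1_capacity(2, 4, 2), pcsi1_closed_form(2, 4, 2))
```

For $N=2$, $K=4$, and $M=2$, both functions return $2/3$; for $M=K-1$ the expression equals $1$.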
Theorem 2.
The capacity of the PIR-PCSI-II problem with $N$ servers, $K$ messages, and side information size $M$ is lower-bounded by

$$\left(1+\frac{1}{N}+\frac{1}{N^{2}}+\dots+\frac{1}{N^{K-M}}\right)^{-1}.$$
This result is interesting because it shows that the lower bound on the capacity of multi-server PIR-PCSI-II is the same as the capacity of multi-server PIR-SI when the size of the side information is $M-1$. That is, having side information that is only a random linear combination of $M$ messages, including the demanded message, is at least as effective as knowing $M-1$ messages separately in terms of minimizing the download cost. For the proof, we construct a PIR-PCSI-II protocol that achieves the capacity lower bound of Theorem 2. It should be noted that the tightness of this lower bound remains open in general.
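Summing the geometric series gives closed forms that make the comparison between Theorem 1 and Theorem 2 explicit. For $N\geq 2$ and any integer $L\geq 1$,

$$\left(\sum_{i=0}^{L-1}\frac{1}{N^{i}}\right)^{-1} = \frac{N^{L-1}(N-1)}{N^{L}-1},$$

with $L=K-M$ for the capacity in Theorem 1 and $L=K-M+1$ for the lower bound in Theorem 2. The bound of Theorem 2 is therefore the expression of Theorem 1 with $M$ replaced by $M-1$; since its sum contains the additional positive term $1/N^{K-M}$, it is strictly smaller for the same $(N,K,M)$. For example, with $N=2$, $K=4$, and $M=2$, the two expressions evaluate to $2/3$ and $4/7$, respectively.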
IV The PIR-PCSI-I Problem
In this section, we complete the proof of Theorem 1 by proposing an achievability scheme for arbitrary $N$, $K$, and $M$ that achieves the rate $\left(1+\frac{1}{N}+\dots+\frac{1}{N^{K-M-1}}\right)^{-1}$. The proposed protocol, referred to as the Multi-Server PIR-PCSI-I protocol, is a non-trivial combination of the Specialized GRS Code scheme of [12] for the single-server PIR-PCSI problem and the Private Computation (PC) scheme of [19] for the multi-server private computation problem.
For the proposed protocol, we assume that , and each message consists of symbols over .
Multi-Server PIR-PCSI-I protocol: The protocol consists of the following four steps:
Step 1: The user utilizes the Specialized GRS Code protocol proposed in [12] to first construct a polynomial where are arbitrarily chosen distinct elements from , and then construct vectors , each of length , such that for , where for , and is a randomly chosen element from for .
Step 2: Let for . Each is referred to as a super-message. Note that the vector (constructed in Step 1) is the vector of coefficients of the messages in the super-message . Let , and let be the collection of all $(M+1)$-subsets of $[K]$ in lexicographical order. The structure of the Specialized GRS Code protocol [12] ensures that for each , , there exist exactly linear combinations of the messages with (non-zero) coefficients from , such that for every , can be written as a linear combination of the super-messages . Let be a vector of length such that . Note that, for each , are the same up to a scalar multiple, i.e., for each , , or equivalently, , for some distinct . For each , let . Note also that for every , there exists a unique such that the coefficient of the message in the linear combination is equal to . The user then constructs vectors , each of length , such that . (Note that the above procedure dictates a specific choice of the coefficient vectors . However, for each , the vector can be chosen arbitrarily from the set of vectors .) Let for . Each is referred to as a (linear) function. Note that is the vector of coefficients of the super-messages in the function .
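The following Python sketch illustrates Steps 1 and 2 on a toy instance over a small prime field. It encodes our own reading of the Specialized GRS Code construction, which is an assumption and not taken verbatim from [12]: distinct evaluation points $\omega_j$, the polynomial $p(x)=\prod_{j\notin S\cup\{W\}}(x-\omega_j)$ of degree $K-M-1$, multipliers $v_j=c_j/p(\omega_j)$ for $j\in S$ and random non-zero $v_j$ otherwise, and rows $(v_1\omega_1^{i-1},\dots,v_K\omega_K^{i-1})$ for $i\in[K-M]$. The row space is then a generalized Reed-Solomon (MDS) code of dimension $K-M$. By brute force, the sketch checks that (a) every $(M+1)$-subset of $[K]$ supports exactly $q-1$ combinations, all scalar multiples of each other, and (b) the combination supported on $S\cup\{W\}$ whose $S$-coefficients equal the side-information coefficients is $Y$ plus a non-zero multiple of $X_W$, so subtracting $Y$ recovers $X_W$. Function names are hypothetical, indices are 0-based, and each message is a single $\mathbb{F}_q$ symbol for brevity.

```python
import itertools
import random


def build_rows(K, M, S, W, c, q, rng):
    """Hypothetical Step-1 sketch: (K-M) x K matrix over GF(q), q prime > K."""
    omegas = list(range(1, K + 1))  # K distinct non-zero evaluation points
    outside = [j for j in range(K) if j not in S and j != W]  # K-M-1 indices

    def p(x):  # p(x) = prod over j outside S and W of (x - omega_j), mod q
        out = 1
        for j in outside:
            out = out * (x - omegas[j]) % q
        return out

    # v_j = c_j / p(omega_j) on S, random non-zero elsewhere
    v = [c[j] * pow(p(omegas[j]), -1, q) % q if j in S else rng.randrange(1, q)
         for j in range(K)]
    # row i (0-based) is (v_j * omega_j**i mod q)_j
    return [[v[j] * pow(omegas[j], i, q) % q for j in range(K)] for i in range(K - M)]


def row_space(rows, q):
    """All F_q-linear combinations of the rows (brute force)."""
    K = len(rows[0])
    for a in itertools.product(range(q), repeat=len(rows)):
        yield tuple(sum(ai * r[j] for ai, r in zip(a, rows)) % q for j in range(K))


def combos_by_support(rows, q, size):
    """Group row-space vectors by support, keeping supports of the given size."""
    groups = {}
    for vec in row_space(rows, q):
        supp = frozenset(j for j, x in enumerate(vec) if x)
        if len(supp) == size:
            groups.setdefault(supp, []).append(vec)
    return groups


q, K, M = 5, 4, 2
S, W = {1, 2}, 0          # demand X_0, side information on X_1 and X_2
c = {1: 3, 2: 4}          # side-information coefficients
rows = build_rows(K, M, S, W, c, q, random.Random(0))
groups = combos_by_support(rows, q, M + 1)

# (b) the combination on S | {W} whose S-coefficients equal c
f = next(vec for vec in groups[frozenset(S | {W})] if all(vec[j] == c[j] for j in S))
X = [2, 0, 4, 1]          # toy one-symbol messages
Y = sum(c[j] * X[j] for j in S) % q
f_val = sum(f[j] * X[j] for j in range(K)) % q   # the function the user retrieves
recovered = (f_val - Y) * pow(f[W], -1, q) % q
print(len(groups), [len(g) for g in groups.values()], recovered)
```

`pow(x, -1, q)` computes a modular inverse and requires Python 3.8 or later. Brute-forcing the row space costs $q^{K-M}$ vectors, so this is only meant for toy parameters.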
Step 3: The user sends to all servers the vectors (associated with the super-messages ), and the vectors (associated with the functions ). It is noteworthy that the user only needs to send the vectors to all servers, and each server can construct the vectors by using (according to the procedure described in Step 2).
Step 4: The user and the servers leverage the PC scheme of [19] with (independent) messages and (linear) functions of these messages in order for the user to privately retrieve one of these functions. In particular, the super-messages and the functions play the role of the original messages and the functions in the PC scheme, respectively, and the user is interested in retrieving the function privately, where is an -linear combination (i.e., a linear combination with non-zero coefficients only) of the messages . (By the construction, there exists one (and only one) function among such that is an -linear combination of the messages .) To be more specific, each server first constructs the super-messages by using the coefficient vectors (defined in Step 3), and then constructs the functions by using the super-messages and the coefficient vectors (defined in Step 3). Note that each function for consists of -symbols where is the number of servers. Then, each server sends to the user carefully designed linear combinations of all -symbols associated with all functions . The details of the design of the user's query to each server as well as the linear combinations transmitted by each server (which also depend on the query of the user) can be found in [19, Section 4].
Example 1. (Multi-Server PIR-PCSI-I protocol) Assume that there are servers, messages from , and . Note that each message consists of symbols from . Suppose that the user demands the message and has a coded side information , i.e., , , and (i.e., ).
First, the user picks distinct elements from . Suppose that the user chooses , , , . Then, the user constructs the polynomial . The user then computes for , i.e., and , by setting and , and chooses for , i.e., and , at random (from ). Assume that the user chooses and . Then, the user constructs vectors and , each of length , such that for . That is, the user constructs and . For set , there exist exactly vectors for such that .
It should be noted that there exists no other vector such that the support of the vector is . Note that the coefficient of the message (i.e., ) in the function is equal to when . Thus, the user constructs the vector . Similarly, the user constructs the vectors , and . Then, the user sends to all servers the vectors and (associated with the super-messages and ), and the vectors (associated with the functions ). Using the coefficient vectors and , each server first constructs the two super-messages and . Then, it constructs the functions using the super-messages and and the coefficient vectors as follows:
[TABLE]
Finally, the user and the servers apply the PC scheme of [19] for two super-messages , in order for the user to privately retrieve the function . (Note that among the functions , only is an -linear combination of the messages .) The details of the PC scheme for this example are as follows. Let be a randomly chosen permutation. Let for and , where is the -th -symbol of , and is a randomly chosen element from . To simplify the notation, let for all . The user then queries carefully designed linear combinations of the symbols , as given in Table I [19], from each of the servers (S1 and S2).
As shown in [19], among the symbols queried from S1 (or S2), based on the information obtained from S2 (or S1), symbols are redundant. For instance, consider the symbols queried from S1. (Similar observations can be made regarding the queries from S2.) Among the symbols , any symbols suffice to recover the other symbols. For example, and can be obtained from and . (Note that and can be written as linear combinations of and .) Thus, the server S1 needs to send two arbitrary symbols from . In addition, given any symbols from , any symbols among the symbols queried from S1 suffice to recover the remaining symbol. For example, can be obtained from the symbols (for details, see [19, Section 5.1]). Thus, each of the servers S1 and S2 needs to send to the user only symbols. In particular, S1 transmits arbitrary symbols from , arbitrary symbols from , and the symbols , and the symbol ; and S2 transmits arbitrary symbols from , arbitrary symbols from , and the symbols , and the symbol .
From the answers by the servers, the user obtains all symbols , and accordingly, all symbols of . (Note that for .) From (), the user can decode the desired message by subtracting off the contribution of their side information .
In order to retrieve which consists of symbols (over ), according to the proposed protocol, the user downloads symbols (over ) from both servers, and hence the rate of the proposed protocol is .
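As an arithmetic cross-check of rate statements like this one, the sketch below computes the download implied by the PC rate $\left(1+\frac{1}{N}+\dots+\frac{1}{N^{k-1}}\right)^{-1}$ for $k=K-M$ independent super-messages. It assumes (our assumption for illustration) that each message consists of $N^{k}$ symbols over the base field; the total download is then $N+N^{2}+\dots+N^{k}$ symbols. The function name `pc_download` is hypothetical.

```python
from fractions import Fraction


def pc_download(N, k):
    """Message length and total download for k independent messages of N**k
    symbols served at rate (1 + 1/N + ... + 1/N**(k-1))**-1."""
    msg_len = N ** k                                   # assumed message length
    rate = 1 / sum(Fraction(1, N ** i) for i in range(k))
    total = msg_len / rate                             # equals N + N**2 + ... + N**k
    if total.denominator != 1:
        raise ArithmeticError("download is not an integer number of symbols")
    return msg_len, int(total)


msg_len, total = pc_download(2, 2)
print(msg_len, total, Fraction(msg_len, total))
```

Under this assumption, $N=2$ servers and $k=2$ super-messages give 6 downloaded symbols (3 per server when the load is split evenly) for a 4-symbol message, i.e., rate $2/3$.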
Note that for every -subset of the messages , in the proposed protocol there exists one (and only one) linear combination for some of the messages . On the other hand, the PC scheme guarantees that no server can obtain any information about the index () of the linear combination being requested by the user. Thus, the proposed scheme satisfies the $(W,S)$-privacy condition, as desired.
Lemma 1.
The Multi-Server PIR-PCSI-I protocol satisfies the recoverability and $(W,S)$-privacy conditions, and achieves the rate $\left(1+\frac{1}{N}+\dots+\frac{1}{N^{K-M-1}}\right)^{-1}$.
Proof:
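Since the messages $X_1,\dots,X_K$ are uniformly and independently distributed over $\mathbb{F}_{q^m}$, and the $K-M$ super-messages are linearly independent combinations of these messages, the super-messages are also uniformly and independently distributed over $\mathbb{F}_{q^m}$, i.e., their joint entropy equals $K-M$ times the entropy of a single message. Hence, the rate of the Multi-Server PIR-PCSI-I protocol is the same as the rate of the PC protocol for $N$ servers and $K-M$ messages, which is given by $\left(1+\frac{1}{N}+\dots+\frac{1}{N^{K-M-1}}\right)^{-1}$ (see [19, Theorem 1]).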
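From the construction of the Multi-Server PIR-PCSI-I protocol, it is evident that the recoverability condition is satisfied. The proof of the $(W,S)$-privacy of the proposed protocol is as follows. The PC protocol protects the privacy of the function (linear combination) requested by the user. That is, given the query, no server can obtain any information about the index of the function requested by the user. Let $f_1,\dots,f_L$ denote the functions, where $L=\binom{K}{M+1}$, and let $\boldsymbol{\ell}$ denote the index of the function requested by the user. Consider an arbitrary server $n\in[N]$, and an arbitrary query $Q_n$ to server $n$ generated by the proposed protocol. Given $Q_n$, from the perspective of server $n$, every function $f_\ell$ for $\ell\in[L]$ is equally likely to be the requested function, i.e., the one that includes the demanded message. We denote the support of $f_\ell$ by $\mathcal{T}_\ell$, i.e., $\mathcal{T}_\ell$ is the set of all indices $i\in[K]$ such that $X_i$ has a non-zero coefficient in the linear combination $f_\ell$. Thus, for all $\ell\in[L]$, we have

$$\mathbb{P}(\boldsymbol{\ell}=\ell \mid \mathbf{Q}_n=Q_n) = \frac{1}{\binom{K}{M+1}}, \tag{1}$$

noting that the supports $\mathcal{T}_1,\dots,\mathcal{T}_L$ are exactly the $\binom{K}{M+1}$ distinct $(M+1)$-subsets of $[K]$. Note that any given index $i\in[K]$ is in the support of exactly $\binom{K-1}{M}$ functions $f_\ell$, $\ell\in[L]$. For any given $\ell\in[L]$, given $Q_n$ and $\ell$, from the perspective of server $n$, every index $i\in\mathcal{T}_\ell$ is equally likely to be the demand index. That is, for all $i\in\mathcal{T}_\ell$, we have

$$\mathbb{P}(\mathbf{W}=i \mid \boldsymbol{\ell}=\ell, \mathbf{Q}_n=Q_n) = \frac{1}{M+1}. \tag{2}$$

Furthermore, for any given $\ell\in[L]$ and $i\in\mathcal{T}_\ell$, we have

$$\mathbb{P}(\mathbf{S}=\mathcal{T}_\ell\setminus\{i\} \mid \mathbf{W}=i, \boldsymbol{\ell}=\ell, \mathbf{Q}_n=Q_n) = 1. \tag{3}$$

Consider an arbitrary $W\in[K]$ and an arbitrary $M$-subset $S\subseteq[K]$ with $W\notin S$. Let $\ell^*$ be the (unique) index such that $\mathcal{T}_{\ell^*}=S\cup\{W\}$. It is easy to see that $\mathbb{P}(\mathbf{W}=W,\mathbf{S}=S \mid \boldsymbol{\ell}=\ell,\mathbf{Q}_n=Q_n)=0$ for all $\ell\neq\ell^*$. Thus, by using (1)-(3), we can write

$$\begin{aligned}\mathbb{P}(\mathbf{W}=W,\mathbf{S}=S\mid\mathbf{Q}_n=Q_n) &= \mathbb{P}(\boldsymbol{\ell}=\ell^*\mid\mathbf{Q}_n=Q_n)\,\mathbb{P}(\mathbf{W}=W\mid\boldsymbol{\ell}=\ell^*,\mathbf{Q}_n=Q_n)\\ &\quad\times\mathbb{P}(\mathbf{S}=S\mid\mathbf{W}=W,\boldsymbol{\ell}=\ell^*,\mathbf{Q}_n=Q_n)\\ &= \frac{1}{(M+1)\binom{K}{M+1}}.\end{aligned}\tag{4}$$

On the other hand, we have

$$\mathbb{P}(\mathbf{W}=W,\mathbf{S}=S) = \mathbb{P}(\mathbf{S}=S)\,\mathbb{P}(\mathbf{W}=W\mid\mathbf{S}=S) = \frac{1}{(K-M)\binom{K}{M}}. \tag{5}$$

Since $(M+1)\binom{K}{M+1} = (K-M)\binom{K}{M} = \frac{K!}{M!\,(K-M-1)!}$, from (4) and (5), for any $W\in[K]$ and any $M$-subset $S\subseteq[K]$ with $W\notin S$, we have

$$\mathbb{P}(\mathbf{W}=W,\mathbf{S}=S\mid\mathbf{Q}_n=Q_n) = \mathbb{P}(\mathbf{W}=W,\mathbf{S}=S). \tag{6}$$

This completes the proof of $(W,S)$-privacy of the proposed protocol. ∎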
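The counting behind the privacy argument for Model I can be checked by enumeration. This minimal Python sketch builds the posterior of $(\mathbf{W},\mathbf{S})$ as the argument describes it (function index uniform over the $(M+1)$-subsets, the demand index uniform over the support, the side information index set equal to the rest of the support) and compares it with the prior. The function name `posterior_vs_prior` is hypothetical, and indices are 0-based.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def posterior_vs_prior(K, M):
    """Model I: posterior P(W=w, S=s | query) versus prior P(W=w, S=s)."""
    supports = list(combinations(range(K), M + 1))  # one function per (M+1)-subset
    posterior = {}
    for T in supports:
        for w in T:                                  # demand index uniform on the support
            s = tuple(j for j in T if j != w)        # side info is the rest of the support
            posterior[(w, s)] = Fraction(1, len(supports)) * Fraction(1, M + 1)
    prior = {}
    for s in combinations(range(K), M):              # S uniform over M-subsets
        for w in range(K):
            if w not in s:                           # W uniform on [K] minus S
                prior[(w, s)] = Fraction(1, comb(K, M)) * Fraction(1, K - M)
    return posterior, prior


post, prior = posterior_vs_prior(5, 2)
print(post == prior)
```

Equality holds for every admissible $(K,M)$ because $(M+1)\binom{K}{M+1}=(K-M)\binom{K}{M}$.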
V The Ā PIR-PCSI-II Ā Problem
In this section, we prove Theorem 2 by constructing a PIR-PCSI-II protocol, referred to as the Multi-Server PIR-PCSI-II protocol, for arbitrary $N$, $K$, and $M$, that achieves the rate $\left(1+\frac{1}{N}+\dots+\frac{1}{N^{K-M}}\right)^{-1}$.
For the proposed protocol, we assume that , and each message is comprised of symbols over .
Multi-Server PIR-PCSI-II protocol: The protocol consists of four steps, where Steps 2-4 are the same as Steps 2-4 in the Multi-Server PIR-PCSI-I protocol, except that is replaced with everywhere. Step 1 of the proposed protocol is as follows:
Step 1: The user utilizes the Modified Specialized GRS Code protocol proposed in [12] to first construct a polynomial where are arbitrarily chosen distinct elements from , and then construct vectors , each of length , such that for , where for , where is chosen uniformly at random from , and is a randomly chosen element from for .
Lemma 2.
The Multi-Server PIR-PCSI-II protocol satisfies the recoverability and $(W,S)$-privacy conditions, and achieves the rate $\left(1+\frac{1}{N}+\dots+\frac{1}{N^{K-M}}\right)^{-1}$.
Proof:
The proof is similar to the proof of Lemma 1 and is hence omitted to avoid repetition. ∎
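For Model II, a counting check in the spirit of the proof of Lemma 1 goes through under an assumption we make here and do not take from the text: that the Multi-Server PIR-PCSI-II protocol uses one function per $M$-subset of $[K]$, that the requested function is supported on $S$, and that the PC scheme hides its index. Then, for any $M$-subset $S\subseteq[K]$ and any $W\in S$, the function index contributes $1/\binom{K}{M}$ and the demand index is uniform over the support, so

$$\mathbb{P}(\mathbf{W}=W,\mathbf{S}=S\mid\mathbf{Q}_n=Q_n) = \frac{1}{\binom{K}{M}}\cdot\frac{1}{M} = \mathbb{P}(\mathbf{S}=S)\,\mathbb{P}(\mathbf{W}=W\mid\mathbf{S}=S) = \mathbb{P}(\mathbf{W}=W,\mathbf{S}=S),$$

where the middle equality uses that $\mathbf{S}$ is uniform over the $\binom{K}{M}$ subsets and, in Model II, $\mathbf{W}$ is uniform over $S$.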
VI Conclusion
In this paper, we studied the multi-server setting of the Private Information Retrieval with Private Coded Side Information (PIR-PCSI) problem. In this problem, there is a database of $K$ messages replicated across $N$ servers, and there is a user who initially has a random linear combination of a random subset of $M$ messages in the database as side information. The goal of the user is to retrieve one message from the servers while jointly protecting the identities of both the demand message and the side information messages. We considered two different models for this problem depending on whether or not the side information is a function of the demand message. First, we focused on the setting in which the side information is not a function of the demand message. For this setting, we proved that the capacity is given by $\left(1+\frac{1}{N}+\dots+\frac{1}{N^{K-M-1}}\right)^{-1}$. Then, we considered the setting in which the side information is a function of the demand message. For this setting, we showed that the capacity is lower-bounded by $\left(1+\frac{1}{N}+\dots+\frac{1}{N^{K-M}}\right)^{-1}$. Our proposed achievability schemes are inspired by our recently proposed scheme for the single-server PIR-PCSI problem in conjunction with the scheme proposed by Sun and Jafar for the multi-server private computation problem.
- [1] B. Chor, O. Goldreich, E. Kushilevitz, and M. Sudan, "Private information retrieval," in Proc. IEEE 36th Annual Foundations of Computer Science, 1995, pp. 41-50.
- [2] H. Sun and S. A. Jafar, "The capacity of private information retrieval," IEEE Transactions on Information Theory, vol. 63, no. 7, pp. 4075-4088, July 2017.
- [3] K. Banawan and S. Ulukus, "The capacity of private information retrieval from coded databases," IEEE Transactions on Information Theory, vol. 64, no. 3, pp. 1945-1956, March 2018.
- [4] H. Sun and S. A. Jafar, "The capacity of robust private information retrieval with colluding databases," IEEE Transactions on Information Theory, vol. 64, no. 4, pp. 2361-2370, April 2018.
- [5] S. Kadhe, B. Garcia, A. Heidarzadeh, S. El Rouayheb, and A. Sprintson, "Private information retrieval with side information: The single server case," in 2017 55th Annual Allerton Conference on Communication, Control, and Computing, Oct. 2017, pp. 1099-1106.
- [6] A. Heidarzadeh, B. Garcia, S. Kadhe, S. El Rouayheb, and A. Sprintson, "On the capacity of single-server multi-message private information retrieval with side information," in 2018 56th Annual Allerton Conference on Communication, Control, and Computing, Oct. 2018, pp. 180-187.
- [7] S. Li and M. Gastpar, "Single-server multi-message private information retrieval with side information," in 2018 56th Annual Allerton Conference on Communication, Control, and Computing, Oct. 2018, pp. 173-179.
- [8] Z. Chen, Z. Wang, and S. Jafar, "The capacity of private information retrieval with private side information," Sep. 2017. [Online]. Available: http://arxiv.org/abs/1709.03022
