On an Equivalence Between Single-Server PIR with Side Information and Locally Recoverable Codes
Swanand Kadhe, Anoosheh Heidarzadeh, Alex Sprintson, and O. Ozan Koyluoglu

TL;DR
This paper reveals a fundamental equivalence between single-server PIR with side information and locally recoverable codes, enabling new bounds and insights into both areas.
Contribution
It establishes a novel equivalence between PIR schemes with side information and locally recoverable codes, including cooperative variants, providing new bounds and theoretical insights.
Findings
PIR schemes for single message retrieval are equivalent to classical LRCs.
PIR schemes for multiple message retrieval are equivalent to cooperative LRCs.
Derived new upper bounds on download rates for PIR-SI and cooperative LRCs.
Abstract
The Private Information Retrieval (PIR) problem has recently attracted significant interest in the information-theory community. In this problem, a user wants to privately download one or more messages belonging to a database with copies stored on a single or multiple remote servers. In the single-server scenario, the user must have prior side information, i.e., a subset of messages unknown to the server, to be able to privately retrieve the required messages in an efficient way.
In the last decade, there has also been significant interest in Locally Recoverable Codes (LRC), a class of storage codes in which each symbol can be recovered from a limited number of other symbols. More recently, there has been interest in cooperative locally recoverable codes, i.e., codes in which multiple symbols can be recovered from a small set of other code symbols.
In this paper, we establish a relationship between coding schemes for the single-server PIR problem and LRCs. In particular, we show the following results: (i) PIR schemes designed for retrieving a single message are ‘equivalent’ to classical LRCs; and (ii) PIR schemes for retrieving multiple messages are equivalent to cooperative LRCs. These equivalence results allow us to recover upper bounds on the download rate for PIR-SI schemes, and to obtain a novel rate upper bound on cooperative LRCs. We show results for both linear and non-linear codes.
S. Kadhe and O. O. Koyluoglu are with the Department of Electrical Engineering and Computer Sciences at University of California Berkeley, USA; emails: {swanand.kadhe, ozan.koyluoglu}@berkeley.edu. A. Heidarzadeh and A. Sprintson are with the Department of Electrical and Computer Engineering at Texas A&M University, USA; emails: {anoosheh, spalex}@tamu.edu. This work is supported in part by National Science Foundation grants CCF-1748585 and CNS-1748692. This material is based upon work supported while Alex Sprintson was serving at the National Science Foundation. Any opinion, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.
I Introduction
The Private Information Retrieval (PIR) problem is one of the important problems in theoretical computer science [1]. The setting of the problem includes a client that needs to retrieve a message belonging to a database with copies stored on a single or multiple remote servers. The message needs to be retrieved by satisfying the privacy condition, which prevents the server from identifying the index of the retrieved message. The theoretical computer science community has primarily focused on the settings with small message sizes with the objective to minimize the total number of bits uploaded to and downloaded from the server (see [2]).
Starting with the seminal work of Sun and Jafar [3], the multiple-server PIR problem has received significant attention from the information and coding theory community, with breakthrough results in the past few years (see, e.g., [4, 5, 6, 7], and references therein). The information-theoretic approach has focused on a practical setting with large message sizes, with the goal of minimizing the ratio of the total number of downloaded bits to the message size.
Recently, Kadhe et al. [8, 9] considered the single-server PIR with Side Information (PIR-SI) problem, wherein the user knows a random subset of messages that is unknown to the server. It was shown that the side information enables the user to substantially reduce the download cost and still achieve information-theoretic privacy for the requested message. The multi-message extension of PIR-SI, which enables a user to privately download multiple messages from the server, is considered by Heidarzadeh et al. [10] as well as Li and Gastpar [11].
It is well-known in the theoretical computer science community that there is a strong relationship between PIR schemes and a class of error-correcting codes called locally decodable codes (LDCs) (see, e.g., the surveys [2, 12]). LDCs allow one to locally decode an arbitrary message symbol from only a small subset of randomly chosen codeword symbols, even after a fraction of codeword symbols are corrupted by an adversary.
Continuing with this theme, in this paper, we show that single-server PIR-SI schemes are closely related to another class of codes with locality called locally recoverable codes (LRCs) [13]. LRCs are a class of erasure codes that enable one to recover an erased codeword symbol from only a small subset of other codeword symbols.
In particular, in an LRC with block-length $n$ and locality $r$, every codeword symbol can be reconstructed from at most $r$ other codeword symbols [13]. Rawat et al. [14, 15] extended the notion of local recovery to cooperative local recovery. Specifically, in an LRC with block-length $n$ and $(r, \ell)$-cooperative locality, every subset of $\ell$ codeword symbols can be reconstructed from at most $r$ other codeword symbols.
In this paper, we show that single-message PIR-SI schemes are related to LRCs, whereas multi-message PIR-SI schemes are related to cooperative LRCs. Detailed contributions are outlined in the following.
Our Contributions: We focus our attention on the single-server PIR-SI problem in which a user wishes to download $D$ messages from a database of $K$ messages (over a finite field $\mathbb{F}_q$), stored on a single remote server. The user has a random subset of $M$ messages, referred to as side information, whose identities are unknown to the server.
First, we focus on the scalar-linear case wherein the answer from the server is of the form $HX$, where $X = [X_1, \dots, X_K]^{\mathsf{T}}$ denotes the set of messages, and $H$ is a matrix with entries over $\mathbb{F}_q$. When the user wishes to protect only the identities of the requested messages, we show the following results:
- Equivalence between single-message PIR with Side Information (SM-PIR-SI) schemes and LRCs:
  1. Any solution $H$ to an SM-PIR-SI problem is a parity check matrix of an LRC with block-length $K$ and locality $M$ (Theorem 1).
  2. Given a parity check matrix $H'$ of an LRC with block-length $K$ and locality $M$, it is possible to construct an SM-PIR-SI scheme where $H$ is a column-permutation of $H'$ (Theorem 2).
- Equivalence between multi-message PIR with Side Information (MM-PIR-SI) schemes and cooperative LRCs:
  1. Any solution $H$ to an MM-PIR-SI problem is a parity check matrix of an LRC with block-length $K$ and $(M, D)$-cooperative locality (Theorem 3).
  2. Given a parity check matrix $H'$ of an LRC with block-length $K$ and $(M, D)$-cooperative locality, it is possible to construct an MM-PIR-SI scheme where $H$ is a column-permutation of $H'$ (Theorem 4).
- As corollaries to Theorems 1 and 3, we derive upper bounds on the download rates for the SM-PIR-SI problem (Corollary 1) and the MM-PIR-SI problem (Corollary 3), respectively. In addition, as a corollary to Theorem 4, we derive a novel tight upper bound on the rate of a cooperative LRC in a specific parameter regime (see Corollary 4 and Remark 2).
Next, we consider the case when the user wants to protect both the identities of the requested messages and those of the side information, referred to as $(W,S)$-PIR-SI. [Footnote 1: Here, $W$ denotes the demand index set and $S$ denotes the side information index set. We use the term $(W,S)$-PIR-SI to reflect the fact that the user wants to protect $(W,S)$ jointly.] We show the following equivalence result:
- Equivalence between $(W,S)$-PIR-SI schemes and maximum distance separable (MDS) codes [Footnote 2: An MDS code can be considered as an LRC with locality equal to its dimension.]:
  1. Any solution $H$ to a $(W,S)$-PIR-SI problem is a parity check matrix of an MDS code with block-length $K$ and dimension $M$ (Theorem 5).
  2. Given a parity check matrix $H'$ of an MDS code with block-length $K$ and dimension $M$, it is possible to construct a $(W,S)$-PIR-SI scheme where $H = H'$ (Theorem 6).
Finally, we lift the restriction of scalar-linear solutions, and consider generic (non-linear) SM-PIR-SI schemes. We show the following equivalence result:
- Equivalence between SM-PIR-SI schemes and LRCs with maximum possible size [Footnote 3: It is possible to upper bound the number of codewords in any LRC over $\mathbb{F}_q$ with a given block-length and locality (see Proposition 2). Any LRC that attains this bound is said to be an LRC with maximum possible size.]:
  1. Given a solution to an SM-PIR-SI problem, it is possible to construct an LRC with block-length $K$ and locality $M$ (Theorem 7).
  2. Given an LRC with block-length $K$ and locality $M$ with the maximum possible size, it is possible to construct an SM-PIR-SI scheme (Theorem 8).
II Preliminaries
Notation: For a positive integer $n$, denote $\{1, \dots, n\}$ by $[n]$. Let $\mathbb{F}_q$ denote the finite field of order $q$, where $q$ is a power of a prime. For a set $\{X_1, \dots, X_K\}$ and a subset $S \subseteq [K]$, let $X_S = \{X_j : j \in S\}$. For a positive integer $n$, let $\mathbf{1}_n$ and $\mathbf{0}_n$, respectively, denote the all-one and all-zero row vectors of length $n$. Let $e_i$ be a unit vector of length $n$ such that its $i$-th entry is $1$ and the other entries are $0$. For a set , let
[TABLE]
For a matrix , let denote the row-space of . For a subset , let denote the submatrix consisting of columns of indexed by . For a vector , let denote the support of . For a subspace , let be its dual subspace.
II-A Single-Server PIR with Side Information
We briefly overview the single-server PIR with side information problem [8, 16] (see also [9]). Consider a server containing a database that consists of a set of $K$ messages $X_1, \dots, X_K$, with each message being independently and uniformly distributed over $\mathbb{F}_q$. A user is interested in privately downloading $D$ ($D \ge 1$) messages $X_W$ from the server for some $W \subseteq [K]$, $|W| = D$. We refer to $W$ as the demand index set and $X_W$ as the demand. The user has the knowledge of a subset $X_S$ of the messages for some $S \subseteq [K]$, $|S| = M$, $S \cap W = \emptyset$. We refer to $S$ as the side information index set and $X_S$ as the side information.
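Let $\mathbf{W}$ and $\mathbf{S}$ denote the random variables corresponding to the demand and side information index sets, respectively. We assume that the side information index set is distributed uniformly over all subsets of $[K]$ of size $M$, i.e.,
$$p_{\mathbf{S}}(S) = \begin{cases} \binom{K}{M}^{-1}, & S \subseteq [K],\ |S| = M, \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$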
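Further, we assume that the demand index set $\mathbf{W}$ has the following conditional distribution given $\mathbf{S} = S$:
$$p_{\mathbf{W}|\mathbf{S}}(W \mid S) = \begin{cases} \binom{K-M}{D}^{-1}, & W \subseteq [K] \setminus S,\ |W| = D, \\ 0, & \text{otherwise.} \end{cases} \tag{2}$$
We assume that the server does not know the side information realization at the user and only knows the a priori distributions $p_{\mathbf{S}}$ and $p_{\mathbf{W}|\mathbf{S}}$.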
To download the set of messages given the side information , the user sends a query to the server. The server responds to the query it receives with an answer over . Let and be the corresponding random variables.
Definition 1**.**
[PIR-SI] Any scheme consisting of a query and an answer is referred to as a PIR with side information (PIR-SI) scheme if the query and answer satisfy the following conditions (one of the two privacy conditions, together with recoverability).
1. $W$-privacy: The server cannot infer any information about the demand index set from the query it receives, i.e.,
[TABLE]
2. $(W,S)$-privacy: The server cannot infer any information about the demand index set as well as the side information index set from the query it receives, i.e.,
[TABLE]
3. Recoverability: From the answer and the side information $X_S$, the user should be able to decode the desired set of messages $X_W$ for any admissible $W$ and $S$, i.e.,
[TABLE]
We refer to the case of $D = 1$ as single-message PIR-SI, and the case of $D \ge 2$ as multi-message PIR-SI.
The rate of a PIR-SI scheme is defined as the ratio of the message length (in bits) to the total length of the answers (in bits). [Footnote 4: We focus our attention on the download rate, similar to [3], because the download rate dominates the total communication rate when the message size is sufficiently large compared to the size of a query.] Formally:
[TABLE]
The capacity of -PIR-SI, denoted by , is defined as the supremum of rates over all -PIR-SI schemes for a given and .
II-B Locally Recoverable Codes
Let $\mathcal{C}$ denote a linear $[n, k, d]$ code over $\mathbb{F}_q$ with block-length $n$, dimension $k$, and minimum distance $d$. For any codeword $c = (c_1, \dots, c_n) \in \mathcal{C}$, $c_i$ is said to be the $i$-th symbol of the codeword $c$.
We say that the $i$-th symbol of a code $\mathcal{C}$ has locality $r$ if its value can be recovered from some other $r$ symbols of $\mathcal{C}$. The formal definition of locality is as follows (see [13]).
Definition 2**.**
[Locality] We say that the -th coordinate of a code has locality if there exists a set , , such that, for every codeword , , where . We say that is a repair group of the -th coordinate and define .
We say that an $[n, k, d]$ code has (all-symbol) locality $r$ if each of its coordinates has locality $r$. An LRC with these parameters is referred to as an $(n, k, r)$ LRC.
Equivalently, we say that the coordinate $i$ has locality $r$ if the dual code $\mathcal{C}^{\perp}$ contains a codeword $c'$ of Hamming weight at most $r + 1$ such that the $i$-th coordinate is in the support of $c'$.
Example 1**.**
Let us consider a $[7, 3]$ Simplex code, which is a dual of a Hamming code. In particular, it encodes three information symbols $(x_1, x_2, x_3)$ into the seven symbols given by their nonzero binary linear combinations, e.g., $(x_1, x_2, x_3, x_1 + x_2, x_1 + x_3, x_2 + x_3, x_1 + x_2 + x_3)$. It is easy to see that any symbol can be recovered from two other symbols. For instance, $x_1$ can be recovered from $x_2$ and $x_1 + x_2$. [Footnote 5: In fact, every symbol of the simplex code has three disjoint repair groups [17]. Further, note that, even though the simplex code is not optimal with respect to the distance upper bound in (7), it is optimal with respect to a field size dependent rate upper bound established in [17].]
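The locality claim in Example 1 can be checked by brute force. The following is a minimal sketch, assuming the $[7,3]$ binary simplex code is generated by a matrix whose columns are all nonzero vectors of length 3; the column ordering and the helper names (`simplex_columns`, `repair_pairs`, `encode`) are choices made here for illustration, not taken from the paper.

```python
from itertools import combinations, product

def simplex_columns(m=3):
    """Generator-matrix columns: all nonzero binary vectors of length m."""
    return [v for v in product([0, 1], repeat=m) if any(v)]

def xor(u, v):
    return tuple(a ^ b for a, b in zip(u, v))

def repair_pairs(cols, i):
    """Pairs (a, b) of other coordinates whose columns sum to column i over GF(2)."""
    return [(a, b) for a, b in combinations(range(len(cols)), 2)
            if i not in (a, b) and xor(cols[a], cols[b]) == cols[i]]

def encode(x, cols):
    """Codeword symbol j is the GF(2) inner product of message x with column j."""
    return [sum(xi & cj for xi, cj in zip(x, col)) % 2 for col in cols]

cols = simplex_columns(3)
pairs = [repair_pairs(cols, i) for i in range(len(cols))]

# Check on all 8 codewords that each symbol is the XOR of each of its repair pairs.
locality_two_holds = all(
    encode(x, cols)[i] == encode(x, cols)[a] ^ encode(x, cols)[b]
    for x in product([0, 1], repeat=3)
    for i in range(len(cols))
    for (a, b) in pairs[i]
)
print(len(cols), [len(p) for p in pairs], locality_two_holds)
```

Each entry of `pairs` lists the size-2 repair groups of one coordinate, so the same script also illustrates the remark that every symbol has several disjoint repair groups.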
In [13], it is shown that the minimum distance of an $(n, k, r)$ LRC is upper bounded as
$$d \le n - k - \left\lceil \frac{k}{r} \right\rceil + 2. \tag{7}$$
Further, the authors of [13] prove that any systematic code with locality $r$ for information symbols that achieves equality in (7) must follow a specific structure. We state below the structure theorem [13, Theorem 9], adapted to the form useful for our setup.
Proposition 1**.**
[13]** Let be an code, where , , and . Then, for any , , we have either or .
II-C Cooperative Locally Recoverable Codes
Let $\mathcal{C}$ denote a linear $[n, k, d]$ code over $\mathbb{F}_q$ with block-length $n$, dimension $k$, and minimum distance $d$. We say that the code has $(r, \ell)$-cooperative locality if, for every codeword, it is possible to repair any $\ell$ symbols from at most $r$ other symbols. The formal definition is as follows (see [14]).
Definition 3**.**
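We say that an $[n, k, d]$ code $\mathcal{C}$ has $(r, \ell)$-cooperative locality if, for any subset of coordinates $\Delta \subseteq [n]$, $|\Delta| = \ell$, there exists a set $\Gamma \subseteq [n]$ satisfying $|\Gamma| \le r$, $\Gamma \cap \Delta = \emptyset$, such that, for every codeword $c \in \mathcal{C}$, the symbols $c_\Delta$ can be recovered using the symbols $c_\Gamma$.
An LRC with these parameters is referred to as an $(n, k, r, \ell)$ cooperative LRC. Note that when $r = k$ and $\ell = n - k$, the above definition coincides with that of an MDS code.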
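For small linear codes, Definition 3 can be tested exhaustively. The sketch below is illustrative only: it assumes a binary linear code given by its generator-matrix columns (encoded as integer bitmasks), and it uses the fact that, for a linear code, a coordinate is a function of a set of other coordinates exactly when its generator column lies in the span of their columns. The names `span`, `has_cooperative_locality`, `r`, and `ell` are hypothetical labels chosen here.

```python
from itertools import combinations

def span(vectors):
    """All GF(2) linear combinations of the given bitmask vectors."""
    s = {0}
    for v in vectors:
        s |= {x ^ v for x in s}
    return s

def has_cooperative_locality(cols, r, ell):
    """True if every ell-subset of coordinates is recoverable from at most r others."""
    n = len(cols)
    for delta in combinations(range(n), ell):
        rest = [j for j in range(n) if j not in delta]
        ok = any(
            all(cols[i] in span([cols[j] for j in gamma]) for i in delta)
            for size in range(1, r + 1)
            for gamma in combinations(rest, size)
        )
        if not ok:
            return False
    return True

# Columns of a [7,3] binary simplex code, written as 3-bit masks 1..7.
simplex_cols = list(range(1, 8))
for r, ell in [(2, 1), (2, 2), (3, 2)]:
    print((r, ell), has_cooperative_locality(simplex_cols, r, ell))
```

The search is exponential in the block-length, so it is only meant for toy parameters such as the simplex code of Example 1.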
In [15], it is shown that the minimum distance of an cooperative LRC for is upper bounded as
[TABLE]
III Equivalence Results for Scalar-Linear Schemes
In this section, we consider non-interactive (single round), scalar-linear PIR-SI schemes. In particular, for any given query $Q^{[W,S]}$, the answer $A^{[W,S]}$ can be specified as
$$A^{[W,S]} = H X, \tag{9}$$
where $X = [X_1, \dots, X_K]^{\mathsf{T}}$ and the $T \times K$ matrix $H$ over $\mathbb{F}_q$ depends on $Q^{[W,S]}$. We refer to $H$ as a solution to the PIR-SI problem. Note that $T$, the number of rows of $H$, denotes the number of symbols downloaded from the server.
III-A Single-Message PIR-SI Schemes and LRCs
In this section, we show that a single-message PIR-SI scheme is equivalent to a locally recoverable code (LRC). In particular, we show that any solution to the single-message PIR-SI problem (SM-PIR-SI) must be a parity check matrix of an LRC. Furthermore, we show that it is possible to construct a solution to the SM-PIR-SI problem using a parity check matrix of an LRC.
First, we establish the relation from a solution of the SM-PIR-SI problem to a parity check matrix of an LRC.
Theorem 1**.**
Any scalar-linear solution $H$ to the single-message PIR-SI problem must be a parity check matrix of an LRC with block length $K$ and locality $M$.
Proof:
First, we note that the following necessary condition is imposed by the privacy and recoverability conditions. For any query , the answer should satisfy the following necessary condition: for any candidate demand index , there must exist a potential side information index set , such that it is possible to recover from and . In other words, the following condition must hold:
[TABLE]
If the aforementioned necessary condition does not hold, then the server will learn from that is not the user’s demand index. Indeed, since is the solution corresponding to the query , we have
[TABLE]
which, in turn, implies that . This violates the -privacy condition (3).
The above condition (10) implies that for every , must contain a vector of Hamming weight at most such that . According to Definition 2, is an LRC with block-length and all-symbol locality . ∎
Theorem 1 has the following two immediate implications. First, it allows us to construct a class of LRCs using solutions to the SM-PIR-SI problem. More specifically, given a solution to the SM-PIR-SI problem with messages and side information size , one can easily obtain an LRC with block-length and locality as .
Now, consider the Partition-and-Code scheme proposed in [9] for the SM-PIR-SI problem. Let for some and . In the P&C scheme, the user first randomly partitions the messages into subsets, each of size at most , such that one of the subsets is for some . The user then asks the server to send the sum of messages in each subset, resulting in the download cost of symbols.
Note that the Partition-and-Code scheme yields a solution of size with the following form (up to column permutation):
[TABLE]
It is easy to verify that the corresponding LRC is a direct-sum of single-parity check codes, each of length at most . In other words, is a simple LRC that partitions the message symbols into subsets each of size at most , and adds a parity check symbol for each subset.
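The block structure described above can be made concrete with a short sketch. It assumes the messages are already ordered so that the partition consists of consecutive blocks (i.e., the form "up to column permutation"), and it works with 0/1 coefficients; the function names and the example parameters `K=6`, `M=2` are illustrative assumptions. The locality test is only a sufficient check: it looks for a row of the matrix itself (rather than an arbitrary vector in its row space) of weight at most $M+1$ covering each coordinate.

```python
import math

def partition_and_code_matrix(K, M):
    """Block-diagonal 0/1 matrix: row t sums the messages in block t (size <= M+1)."""
    T = math.ceil(K / (M + 1))
    H = [[0] * K for _ in range(T)]
    for col in range(K):
        H[col // (M + 1)][col] = 1
    return H

def has_locality(H, M):
    """Sufficient check: every coordinate is covered by a row of weight <= M+1."""
    K = len(H[0])
    return all(
        any(row[i] != 0 and sum(1 for x in row if x != 0) <= M + 1 for row in H)
        for i in range(K)
    )

H = partition_and_code_matrix(K=6, M=2)
for row in H:
    print(row)
print("locality M satisfied:", has_locality(H, 2))
```

In the actual scheme the partition is random and one block is built around the demand and side information; the sketch only shows the resulting matrix shape and why the associated code is a direct sum of single-parity check codes.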
Second, Theorem 1 enables us to use (7) to obtain an upper bound on the capacity of a (scalar-linear) single-message PIR-SI scheme. As we show next, the bound coincides with the upper bound derived in [8, 9].
Corollary 1**.**
The scalar-linear capacity of the single-message PIR-SI problem is upper bounded by $\lceil K/(M+1) \rceil^{-1}$.
Proof:
Let be a scalar-linear solution to the SM-PIR-SI problem. Let . Suppose the minimum distance of is . Note that we must have . For, if , must contain a column of all zeros. Let denote the index of this all-zero column. However, this implies that cannot be the demand, and this will violate the privacy.666Note that here we are using the same argument as in the proof of Theorem 1 (cf. (16)). Now, since is an LRC with block-length , dimension , and locality from Theorem 1, we have from (7) that
[TABLE]
After re-arranging, and noting that and is an integer, we get
[TABLE]
As the messages are independent and uniformly distributed over , we have . The result then follows from (6). ∎
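The rearrangement step in this proof can be spelled out as follows. This is a sketch under assumptions stated here rather than in the text: $H$ is taken to have $T$ rows and rank $\rho \le T$, so the code with parity check matrix $H$ has block-length $K$, dimension $K - \rho$, and locality $M$; the symbols $T$ and $\rho$ are labels chosen for this illustration. Substituting these parameters into the distance bound $d \le n - k - \lceil k/r \rceil + 2$ and using $d \ge 2$ gives:

```latex
\begin{align*}
2 \le d &\le K - (K - \rho) - \left\lceil \frac{K - \rho}{M} \right\rceil + 2
  \;\Longrightarrow\; \left\lceil \frac{K - \rho}{M} \right\rceil \le \rho \\
&\Longrightarrow\; K - \rho \le M \rho
  \;\Longrightarrow\; \rho \ge \frac{K}{M + 1}
  \;\Longrightarrow\; T \ge \rho \ge \left\lceil \frac{K}{M + 1} \right\rceil .
\end{align*}
```

The last implication uses that $\rho$ is an integer. A lower bound of $\lceil K/(M+1) \rceil$ on the number of downloaded symbols is what yields the rate bound of Corollary 1.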
Remark 1**.**
The above result can be directly proved using an upper bound on the rate of an LRC with locality given as (see [18, Theorem 1]). It is interesting to note that [18, Theorem 1] uses an argument based on acyclic induced subgraphs similar to [8, 9].
We say that a scalar-linear solution to SM-PIR-SI problem is an optimal solution, if . Then, Proposition 1 implies the following structure on any optimal scalar-linear solution.
Corollary 2**.**
When , any optimal scalar-linear solution to the PIR-SI problem can be converted to the following form using elementary row operations and column permutations:
[TABLE]
where can be any non-zero element in , i.e., , and the number of non-zero entries in each row is exactly .
Since the solution obtained using the partition-and-code scheme (cf. (12)) has the same form as (13), this shows the uniqueness of the solution obtained by the partition-and-code scheme. In other words, any optimal scalar-linear solution can be obtained from the partition-and-code solution using elementary row operations and column permutations.
Next, we establish the relation from a parity check matrix of an LRC to a solution of the SM-PIR-SI problem.
Theorem 2**.**
Let $H'$ be a parity check matrix of an LRC with block length $K$ and locality $M$. Then, it is possible to construct a single-message PIR-SI scheme, such that the solution $H$ is a column-permutation of $H'$.
Proof:
We present a constructive proof. In the rest of the proof, we consider all sets as ordered sets (with a natural ascending order). For a given and , the user first finds a permutation on as follows. Choose an index uniformly at random from , independent of and . Let be a repair group of . If a coordinate has multiple repair groups, arbitrarily choose one repair group. [Footnote 7: This arbitrary choice of a repair group for each coordinate is made a priori, and the chosen repair groups are known to the server as a part of the scheme.] By the definition of locality, we have . For simplicity, we assume that every repair group of any symbol is of size . [Footnote 8: The arguments can be easily generalized to the case when some repair groups are smaller than .] Let be a random permutation of . Let , and be a random permutation of . Let be the permutation that maps to , to , and to . The user sends as its query . The server then applies to the columns of to obtain , i.e., for each , where is the th column of . Then, the server computes the answer as .
Next, we show that the above scheme satisfies the recoverability and -privacy conditions. Indeed, by the definition of locality for , contains a vector whose support is . Therefore, by the construction of , contains a vector whose support is . Hence, the recoverability condition in (5) is satisfied.
For the -privacy, it suffices to show that, for any and any permutation ,
[TABLE]
This is because using (14), it is easy to show that , from which the privacy condition (3) follows.
Now, we give a proof of (14). Observe that the query generation process first maps the demand index to a random index in . Let denote that random index. Let and be random variables corresponding to (independent) uniform random permutations of and , respectively. Now, given a permutation on as a query, define the following events:
[TABLE]
Then, for any and a permutation on , the probability of choosing as a query can be written as
[TABLE]
where (a) follows from the query generation procedure, and (b) uses (1) and (2) to compute . This completes the proof of (14), and concludes the proof. ∎
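The query generation in this proof can be sketched in a few lines. The sketch makes several assumptions that are not fixed by the text: indices are 0-based, every repair group has the same size as the side information set, the permutation is represented as a list `src` in which column `c` of the permuted matrix is column `src[c]` of the original parity check matrix, and the toy matrix `H_code` is a direct sum of two single-parity check codes. All names are hypothetical.

```python
import random

def build_permutation(K, W, S, repair_group, rng):
    """Map the demand W onto a random index j and the side information S onto
    the repair group of j; the remaining columns are matched at random."""
    j = rng.randrange(K)                       # chosen independently of W and S
    R = list(repair_group[j])
    rng.shuffle(R)                             # random order inside the repair group
    rest_src = [c for c in range(K) if c != j and c not in R]
    rng.shuffle(rest_src)                      # random order of the other columns
    rest_dst = [c for c in range(K) if c != W and c not in S]
    src = [None] * K
    src[W] = j
    for s, r in zip(sorted(S), R):
        src[s] = r
    for d, r in zip(rest_dst, rest_src):
        src[d] = r
    return src

def permute_columns(H, src):
    return [[row[src[c]] for c in range(len(src))] for row in H]

H_code = [[1, 1, 1, 0, 0, 0],
          [0, 0, 0, 1, 1, 1]]
repair_group = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
W, S = 4, {0, 3}
src = build_permutation(6, W, S, repair_group, random.Random(0))
H_query = permute_columns(H_code, src)
supports = [{c for c, x in enumerate(row) if x} for row in H_query]
print(src, supports)
```

Whatever index `j` is drawn, one row of `H_query` has support exactly `{W} | S`, which is the recoverability property used in the proof; the privacy argument relies on the uniform choice of `j` and of the two random orderings.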
III-B Multi-Message PIR-SI and Cooperative LRCs
In this section, we show that a multi-message PIR-SI scheme is a dual of a cooperative LRC, introduced in [14].
First, we show that any solution to the multi-message PIR-SI problem should be a parity check matrix of a code with cooperative locality.
Theorem 3**.**
Any scalar-linear solution to the multi-message PIR-SI problem with a demand set of size $D$ and a side information set of size $M$ must be a parity check matrix of an LRC with block length $K$ and $(M, D)$-cooperative locality.
Proof:
First, we note that the following necessary condition is imposed by the privacy and recoverability conditions. For any query , the answer should satisfy the following necessary condition: for every candidate demand index set , , there must exist a potential side information index set , such that it is possible to recover from and . In other words, the following condition must hold:
[TABLE]
If the aforementioned necessary condition does not hold, then the server will learn from that is not the user’s demand index. Since is the solution corresponding to the query , we have
[TABLE]
which, in turn, implies that . This violates the $W$-privacy condition (3).
The above condition (15) implies that for every subset of size , must contain vectors such that , and for each , . It is easy to verify from Definition 3 that is an cooperative LRC with block-length . ∎
Corollary 3**.**
For , the scalar-linear capacity of the multi-message PIR-SI problem is upper bounded by .
Proof:
Let . Note that from Theorem 3, must be a code with blocklength and -cooperative locality. Using (8), it is shown in [15, Corollary 1] that the rate of a code with -cooperative locality for is upper bounded as . Therefore, we have . This yields , which gives the capacity upper bound. ∎
Next, we show that it is possible to construct a solution to the multi-message PIR-SI problem using a parity check matrix of a cooperative locality code.
Theorem 4**.**
Let $H'$ be a parity check matrix of an LRC with block-length $K$ and $(M, D)$-cooperative locality. Then, it is possible to construct a multi-message PIR-SI scheme, such that the solution $H$ is a column-permutation of $H'$.
Proof:
The query generation process and the rest of the proof are similar to the proof of Theorem 2. ∎
Corollary 4**.**
For , the rate of a linear cooperative LRC is upper bounded by .
Proof:
Let be a parity check matrix of an cooperative LRC. From Theorem 4, is a solution (up to a column-permutation) of a multi-message PIR-SI problem such that , , and . Now, in [16, Lemma 1], it is shown that, when , the number of transmissions in any multi-message PIR-SI scheme is at least . Therefore, we have , from which the result follows. ∎
Remark 2**.**
Corollary 4 yields a better bound on the rate of a cooperative LRC for than [15, Corollary 1] given as . In fact, the rate bound is tight for . This is because an MDS code trivially has -cooperative locality for any .
Theorem 3 also enables us to obtain computationally efficient multi-message PIR-SI solutions. In particular, for , the schemes in [16] (see also [19]) rely on generalized Reed-Solomon codes, and thus, require a finite field size at least . On the other hand, it is possible to use constructions of cooperative LRCs to obtain PIR-SI schemes over smaller field size.999Note that small field size schemes obtained from cooperative LRCs may have smaller download rate than those in [16, 19]. As an example, an simplex code has -cooperative locality for any (see [15]). Thus, it is possible to obtain multi-message PIR-SI solutions over the binary field when for a positive integer , , and .
III-C $(W,S)$-Private PIR-SI Schemes and MDS Codes
In this section, we show an equivalence between a solution to the $(W,S)$-PIR-SI problem and a maximum distance separable (MDS) code.
First, we establish the relation from a solution of the $(W,S)$-PIR-SI problem to a parity check matrix of an MDS code.
Theorem 5**.**
Any scalar-linear solution to the $(W,S)$-PIR-SI problem must be a parity check matrix of a $[K, M]$ MDS code.
Proof:
First, we note that the -privacy condition implies the following necessary condition: for each message and every set of size , it is possible to recover from and . If this is not the case, then the server learns that the user cannot possess and demand any such that . Indeed, since is the solution corresponding to the query , we have
[TABLE]
which, in turn, implies that . This violates the -privacy condition (4).
The aforementioned necessary condition implies that, for any set of size , for every , we should have
[TABLE]
Equation (17), in turn, implies that the columns of $H$ indexed by $[K] \setminus S$ must be linearly independent. Since this should hold for each subset $S$ of size $M$, we have that every subset of $K - M$ columns of $H$ is linearly independent. Thus, $H$ must be a parity check matrix of a $[K, M]$ MDS code. ∎
Next, we establish a relation from a parity check matrix of an MDS code to a solution of the $(W,S)$-PIR-SI problem. It is worth noting that the achievability schemes in [9, 16] for $(W,S)$-privacy are based on MDS codes.
Theorem 6**.**
Let $H$ be a parity check matrix of a $[K, M]$-MDS code. Then, $H$ is a solution to the $(W,S)$-PIR-SI problem.
Proof:
First, note that the scheme with $H$ is $(W,S)$-private, since the solution $H$ is independent of the particular realization of $W$ and $S$. As the server already knows the size of the side information index set, it does not get any other information about $W$ and $S$ from $H$.
To see the recoverability, note that any $K - M$ columns of $H$ are linearly independent. Thus, given the side information $X_S$ for any $S$ of size $M$, the user can recover all the messages $X_i$, $i \in [K] \setminus S$, including the demand message(s) $X_W$. ∎
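The column-independence property used in this proof is easy to verify numerically for small parameters. The sketch below assumes a prime field GF(p) and uses a Vandermonde matrix as one familiar example of an MDS parity check matrix; the parameters `K = 5`, `M = 2`, `p = 7` and all function names are illustrative choices, not taken from the paper.

```python
from itertools import combinations

def det_mod_p(A, p):
    """Determinant of a square matrix over GF(p) by Gaussian elimination."""
    A = [row[:] for row in A]
    n, det = len(A), 1
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] % p), None)
        if pivot is None:
            return 0
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            det = -det
        det = det * A[c][c] % p
        inv = pow(A[c][c], -1, p)              # modular inverse (Python 3.8+)
        for r in range(c + 1, n):
            f = A[r][c] * inv % p
            A[r] = [(x - f * y) % p for x, y in zip(A[r], A[c])]
    return det % p

def vandermonde(T, K, p):
    """T x K matrix with entry (t, j) = (j+1)^t mod p; requires K < p."""
    return [[pow(j + 1, t, p) for j in range(K)] for t in range(T)]

def every_T_columns_independent(H, p):
    """True if every set of T columns of the T x K matrix H is independent."""
    T, K = len(H), len(H[0])
    return all(
        det_mod_p([[H[t][j] for j in cols] for t in range(T)], p) != 0
        for cols in combinations(range(K), T)
    )

K, M, p = 5, 2, 7
H = vandermonde(K - M, K, p)
print(every_T_columns_independent(H, p))
```

When the check succeeds, any $M$ known messages leave a square, invertible system in the remaining $K - M$ unknowns, which is the recoverability argument above.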
IV Equivalence Results for Non-Linear Schemes
In this section, we consider generic PIR-SI schemes and LRCs, which encompass scalar-linear, vector-linear, and non-linear schemes. We begin with the definition of a generic LRC.
Definition 4**.**
An LRC is a set of vectors in of size , referred to as codewords, together with
an encoding function , which is a bijection between vectors in and codewords in , and 2. 2.
*a set of deterministic repair functions , , such that, for every coordinate , there exists a set of coordinates , satisfying for every codeword . We say that is a repair group of the -th coordinate. *
Next, for the SM-PIR-SI problem, we define a PIR-SI code. Towards this end, we introduce the following notation:
[TABLE]
That is, is the set of all possible combinations of the demand index and the side information index set.
Definition 5**.**
A PIR-SI code for is a set of vectors in , referred to as codewords, together with
a class of deterministic answer functions , where each function maps vectors from to the codewords, i.e., , 2. 2.
a class of deterministic recovery functions , where each function is from to , and 3. 3.
a stochastic query function that maps to an answer function (independently of the value of ) such that:
- (i)
for every , , , and for each ,
[TABLE]
and
- (ii)
there exists a decoding function satisfying
[TABLE]
We refer to as the length of the PIR code.
It is straightforward to show that the -privacy condition (19) implies the following necessary condition on a PIR code.
Lemma 1**.**
In a PIR-SI code, for any , for every , there must exist a decoding function and a set , , such that .
Now, we show a relation from a PIR-SI code to an LRC. It is worth noting that the proof technique is similar to [20, Lemma 3].
Theorem 7**.**
Given a PIR-SI code of length over , it is possible to construct an LRC of size (at least) .
Proof:
First, note that, for any , there must exist a vector such that . This is because every maps to . Next, for an arbitrary and the corresponding , let us define . Now, from Lemma 1, for every , there must exist a deterministic decoding function and a set , , such that . Using this, define, for every , , and . It is easy to verify that the set along with an arbitrary bijection and repair functions is an LRC of size at least . ∎
Next, from [18, Theorem 2.1], we have the following upper bound on the size of an LRC.
Proposition 2**.**
[18]** For any LRC , the size .
We refer to an LRC satisfying the equality to be an optimal LRC.
To complete the equivalence, we establish a relation from an optimal LRC to a PIR-SI code.
Theorem 8**.**
Given an optimal LRC, it is possible to construct a PIR-SI code of length over .
In order to prove Theorem 8, we need two other lemmas. To simplify the presentation, we define . Also, for a code of block-length and a set , let denote the code obtained by puncturing on the coordinates outside of .
First, we show that any optimal LRC must contain coordinates such that the values on these coordinates determine the values of the remaining coordinates. Note that for an arbitrary non-linear code, there may not exist any subset of coordinates that determines the values of the remaining coordinates.
Lemma 2**.**
For an optimal LRC , there exists a partition of coordinates into sets and such that , , and for any codeword , the symbols can be recovered from the symbols .
Proof:
We iteratively construct and as follows.
1. Initialize
2. While :
   - 2.1 Choose a coordinate
   - 2.2 Set , for a repair group of
   - 2.3 Set .
By the construction of and , the coordinates in can be recovered from the coordinates in .
Note that, in each step, grows by one, and grows by at most as the locality of the code is . In other words, in each step, grows by at most . Therefore, the number of steps for which the while loop runs is at least . This gives .
Next, we show that . Since there is a bijection between and , and since the coordinates in are a function of those in , there must be a bijection between and . This implies that , and thus, .
We conclude that , which completes the proof. ∎
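The iterative construction in this proof can be sketched in Python. This is a minimal illustration rather than the authors' procedure verbatim: the names `partition_coordinates`, `repair_groups`, `S`, `R` and `order` are hypothetical, the update rule (move the chosen coordinate into the recoverable set and add its not-yet-recoverable repair-group members to the retained set) is an assumed reading of steps 2.1 to 2.3, and the toy code (two disjoint single-parity groups of length 3, so locality 2) is chosen only for illustration.

```python
def partition_coordinates(n, repair_groups):
    """Greedily split coordinates 0..n-1 into a retained set S and a recoverable set R.

    Assumption: repair_groups[i] is one fixed repair group of coordinate i
    and never contains i itself.
    """
    S, R = set(), set()
    order = []  # order in which coordinates enter R; also a valid recovery order
    while len(S | R) < n:
        # Step 2.1: pick a coordinate not yet placed (smallest one, for determinism).
        i = min(set(range(n)) - S - R)
        # Step 2.2: retain the repair group of i, except members already recoverable.
        S |= repair_groups[i] - R
        # Step 2.3: mark i as recoverable.
        R.add(i)
        order.append(i)
    return S, R, order


# Toy code with locality 2: coordinates {0, 1, 2} and {3, 4, 5} each sum to zero.
toy_groups = [{0, 1, 2}, {3, 4, 5}]
toy_repair_groups = {i: g - {i} for g in toy_groups for i in g}

S, R, order = partition_coordinates(6, toy_repair_groups)
print(sorted(S), sorted(R))  # [1, 2, 4, 5] [0, 3]
```

On this toy input each pass of the loop adds one coordinate to `R` and at most two coordinates to `S`, which is the counting step used in the proof.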
Given a vector , we define a translation of an LRC as
[TABLE]
Now, using Lemma 2, we show that there exist translations of an optimal LRC that partition .
Lemma 3**.**
For an optimal LRC , there exist distinct vectors , , such that the translations partition the space . That is,
[TABLE]
and
[TABLE]
Proof:
We give a constructive proof. Let and be the sets of coordinates of as described in Lemma 2. Without loss of generality, let be the first coordinates. Let denote the set of vectors in in a lexicographic order. For each , define , where is the all-zero vector of length .
Note that any translation of has the same size as . Thus, to prove (23), it suffices to show (22). We prove this by contradiction. Suppose that there exists a pair of codewords such that . This implies that
[TABLE]
Therefore, . Further, since the coordinates in can be recovered from those in (Lemma 2), we must have . However, as , we have a contradiction to (24). ∎
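The partition property can be checked by brute force on a small example. The following is a minimal sketch under stated assumptions: the toy code is the binary single-parity-check code of length 3 (locality 2), the retained and recoverable coordinate sets `S` and `R` are fixed by hand, and the shift vectors are taken to be zero on `S` and to range over all values on `R`, as in the constructive proof. The names `translate`, `shift_vectors` and `translates` are hypothetical.

```python
from itertools import product

q, n = 2, 3
# Toy code: all binary words of length 3 with even weight.
code = {c for c in product(range(q), repeat=n) if sum(c) % q == 0}
S, R = (0, 1), (2,)  # retained / recoverable coordinates (assumed partition)


def translate(code, v):
    """Return the translation {c + v : c in code}, with arithmetic mod q."""
    return {tuple((a + b) % q for a, b in zip(c, v)) for c in code}


def shift_vectors():
    """Vectors that are zero on S and take every possible value on R."""
    for e in product(range(q), repeat=len(R)):
        v = [0] * n
        for pos, val in zip(R, e):
            v[pos] = val
        yield tuple(v)


translates = [translate(code, v) for v in shift_vectors()]
space = set(product(range(q), repeat=n))

pairwise_disjoint = all(
    translates[a].isdisjoint(translates[b])
    for a in range(len(translates))
    for b in range(a + 1, len(translates))
)
covers_space = set().union(*translates) == space
print(pairwise_disjoint, covers_space)  # True True
```

Here the two translations are the even-weight and odd-weight words, which are disjoint and together exhaust the whole space.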
Proof of Theorem 8: Lemma 3 enables us to construct a PIR-SI code of length over using an optimal LRC as follows.
Answer functions: We construct a set of answer functions, and associate every answer function with a permutation on . Towards this end, we need the following additional notation. For , let denote the length- -ary expansion of . For a permutation on and a vector , let
Let be a set of vectors as described in Lemma 3. For a given and a permutation on , let be such that . Note that, by Lemma 3, the translations partition the space . Hence, there exists a unique such for every and any permutation on . Define the answer functions for every and every permutation on as
[TABLE]
Query function: We are given an index and a set . First, choose an index uniformly at random independent of and . Choose an arbitrary repair group of , say . (If a coordinate has multiple repair groups, arbitrarily choose one repair group. This arbitrary choice of a repair group for each coordinate is made a priori, and are known to the server as a part of the scheme.) Let . Let and be random permutations of sets and , respectively. Let be a permutation on the set that maps to , to , and to . Then, the query function maps to in . Note that it suffices for the user to send as their query.
Recovery functions: For a set , let denote the length- vector obtained by deleting the coordinates of outside . Now, given and , define the recovery function as
[TABLE]
where is the repair function of for the coordinate (see Definition 4).
Recoverability and Privacy: It is straightforward to verify that (cf. (26)). The -privacy condition (19) can be proven in the same way as in the proof of Theorem 2, and thus, the proof is omitted.
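The construction can be exercised end to end on a toy instance. The sketch below is an assumed reading of the answer, query and recovery functions, not the authors' exact definitions: it uses a code over the integers mod 3 with two zero-sum groups of three coordinates (locality 2), takes the shift vectors to be supported on the coordinates `R = (2, 5)`, and lets the server return the label `e` of the translation that contains the permuted message vector. All function and variable names are hypothetical, and only recoverability is checked; the privacy condition is not tested here.

```python
import random
from itertools import product

q, n = 3, 6                                  # toy field size and number of messages
groups = [(0, 1, 2), (3, 4, 5)]              # each group of a codeword sums to zero mod q
code = {c for c in product(range(q), repeat=n)
        if all(sum(c[j] for j in g) % q == 0 for g in groups)}
repair_group = {i: tuple(j for j in g if j != i) for g in groups for i in g}
R = (2, 5)                                   # coordinates on which shift vectors are nonzero


def shift(e):
    """Shift vector that equals e on R and zero elsewhere."""
    v = [0] * n
    for pos, val in zip(R, e):
        v[pos] = val
    return v


def repair(values):
    """Repair function of the toy code: negated sum of the repair-group symbols."""
    return (-sum(values)) % q


def query(W, side, rng):
    """Pick a random target coordinate i and a permutation (message index -> coordinate)
    sending W to i and the side-information indices onto the repair group of i."""
    i = rng.randrange(n)
    grp = list(repair_group[i])
    rng.shuffle(grp)
    rest_src = [k for k in range(n) if k != W and k not in side]
    rest_dst = [k for k in range(n) if k != i and k not in repair_group[i]]
    rng.shuffle(rest_dst)
    perm = {W: i, **dict(zip(sorted(side), grp)), **dict(zip(rest_src, rest_dst))}
    return perm, i


def answer(perm, x):
    """Server side: label of the unique translation containing the permuted messages."""
    y = [0] * n
    for k, dst in perm.items():
        y[dst] = x[k]
    for e in product(range(q), repeat=len(R)):
        if tuple((a - b) % q for a, b in zip(y, shift(e))) in code:
            return e


def recover(perm, i, e, side_values):
    """User side: repair coordinate i of the underlying codeword, then undo the shift."""
    v = shift(e)
    inv = {dst: src for src, dst in perm.items()}
    c_group = [(side_values[inv[j]] - v[j]) % q for j in repair_group[i]]
    return (repair(c_group) + v[i]) % q


rng = random.Random(0)
x = [rng.randrange(q) for _ in range(n)]     # the server's messages
W, side = 4, {0, 3}                          # demand index and side-information indices
perm, i = query(W, side, rng)
e = answer(perm, x)
decoded = recover(perm, i, e, {k: x[k] for k in side})
print(decoded == x[W])  # True
```

In this toy instance the answer consists of two symbols for six stored messages, and the user only ever needs the permutation, the target coordinate and the side-information values to decode.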
V Conclusion
The theoretical computer science community has established a strong relationship between PIR schemes and locally decodable codes. This paper extends this theme by establishing a strong relationship between PIR schemes for the recently proposed single-server PIR with side information problem and locally recoverable codes. As corollaries to these results, we obtain upper bounds on the download rate for PIR-SI schemes, and a novel rate upper bound on cooperative LRCs.
Acknowledgement
S. Kadhe would like to thank Kannan Ramchandran for helpful discussions.
References
- [1] B. Chor, E. Kushilevitz, O. Goldreich, and M. Sudan, “Private information retrieval,” Journal of the ACM, vol. 45, no. 6, pp. 965–981, 1998.
- [2] S. Yekhanin, “Private information retrieval,” Communications of the ACM, vol. 53, no. 4, pp. 68–73, 2010.
- [3] H. Sun and S. A. Jafar, “The capacity of private information retrieval,” CoRR, vol. abs/1602.09134, 2016. [Online]. Available: http://arxiv.org/abs/1602.09134
- [4] ——, “The capacity of robust private information retrieval with colluding databases,” IEEE Trans. on Info. Theory, vol. 64, no. 4, pp. 2361–2370, April 2018.
- [5] R. Tajeddine and S. El Rouayheb, “Robust private information retrieval on coded data,” in 2017 IEEE International Symposium on Information Theory (ISIT). IEEE, 2017.
- [6] K. Banawan and S. Ulukus, “Multi-message private information retrieval: Capacity results and near-optimal schemes,” CoRR, vol. abs/1702.01739, 2017. [Online]. Available: http://arxiv.org/abs/1702.01739
- [7] ——, “The capacity of private information retrieval from coded databases,” IEEE Trans. on Info. Theory, vol. 64, no. 3, pp. 1945–1956, March 2018.
- [8] S. Kadhe, B. Garcia, A. Heidarzadeh, S. E. Rouayheb, and A. Sprintson, “Private information retrieval with side information: The single server case,” in 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Oct 2017, pp. 1099–1106.
