The Capacity of Private Information Retrieval from Heterogeneous Uncoded Caching Databases

Karim Banawan; Batuhan Arasli; Yi-Peng Wei; Sennur Ulukus

arXiv:1902.09512·cs.IT·February 26, 2019


TL;DR

This paper investigates the optimal private information retrieval strategy from multiple databases with different storage capacities, revealing that heterogeneity does not reduce the maximum achievable download efficiency.

Contribution

It characterizes the optimal PIR download cost for heterogeneous storage databases and shows it equals the homogeneous case, providing explicit placement for three databases.

Findings

01. Optimal PIR download cost matches the homogeneous case.

02. Heterogeneity in storage does not reduce PIR capacity.

03. Explicit content placement for three databases is provided.

Abstract

We consider private information retrieval (PIR) of a single file out of $K$ files from $N$ non-colluding databases with heterogeneous storage constraints $m = (m_{1}, \dots, m_{N})$ . The aim of this work is to jointly design the content placement phase and the information retrieval phase in order to minimize the download cost in the PIR phase. We characterize the optimal PIR download cost as a linear program. By analyzing the structure of the optimal solution of this linear program, we show that, surprisingly, the optimal download cost in our heterogeneous case matches its homogeneous counterpart where all databases have the same average storage constraint $μ = \frac{1}{N} \sum_{n = 1}^{N} m_{n}$ . Thus, we show that there is no loss in the PIR capacity due to heterogeneity of storage spaces of the databases. We provide the optimum content placement explicitly for $N = 3$ .

Tables

Table 1: Explicit content assignment for $N=3$ ( $m_{1}\geq m_{2}\geq m_{3}$ without loss of generality).

| Case | Assignment |
| --- | --- |
| $1\leq m_{s}\leq 2$ , $m_{1}+m_{2}\geq 1$ , $m_{1}+m_{3}\geq 1$ , $m_{2}+m_{3}\geq 1$ | $\alpha_{1}=2-m_{s}$ , $\alpha_{2}=\alpha_{3}=0$ , $\alpha_{12}=m_{1}+m_{2}-1$ , $\alpha_{13}=m_{1}+m_{3}-1$ , $\alpha_{23}=1-m_{1}$ , $\alpha_{123}=0$ |
| $1\leq m_{s}\leq 2$ , $m_{1}+m_{2}\geq 1$ , $m_{1}+m_{3}\geq 1$ , $m_{2}+m_{3}\leq 1$ | $\alpha_{1}=2-m_{s}$ , $\alpha_{2}=\alpha_{3}=0$ , $\alpha_{12}=m_{1}+m_{2}-1$ , $\alpha_{13}=m_{1}+m_{3}-1$ , $\alpha_{23}=1-m_{1}$ , $\alpha_{123}=0$ |
| $1\leq m_{s}\leq 2$ , $m_{1}+m_{2}\geq 1$ , $m_{1}+m_{3}\leq 1$ , $m_{2}+m_{3}\leq 1$ | $\alpha_{1}=1-(m_{2}+m_{3})$ , $\alpha_{2}=1-(m_{1}+m_{3})$ , $\alpha_{3}=m_{3}$ , $\alpha_{12}=m_{s}-1$ , $\alpha_{13}=\alpha_{23}=0$ , $\alpha_{123}=0$ |
| $1\leq m_{s}\leq 2$ , $m_{1}+m_{2}\leq 1$ , $m_{1}+m_{3}\leq 1$ , $m_{2}+m_{3}\leq 1$ | $\alpha_{1}=1-(m_{2}+m_{3})$ , $\alpha_{2}=1-(m_{1}+m_{3})$ , $\alpha_{3}=m_{3}$ , $\alpha_{12}=m_{s}-1$ , $\alpha_{13}=\alpha_{23}=0$ , $\alpha_{123}=0$ |
| $2\leq m_{s}\leq 3$ | $\alpha_{1}=\alpha_{2}=\alpha_{3}=0$ , $\alpha_{12}=1-m_{3}$ , $\alpha_{13}=1-m_{2}$ , $\alpha_{23}=1-m_{1}$ , $\alpha_{123}=m_{s}-2$ |

Equations

$$H(W_{1},\dots,W_{K})=KL,\qquad H(W_{k})=L,\quad k\in[K]$$

$$H(Z_{n})\leq m_{n}KL,\quad n\in[N]$$

$$W_{k}=\bigcup_{\mathcal{S}\subseteq[N]}W_{k,\mathcal{S}}$$

$$1=\frac{1}{KL}\sum_{k=1}^{K}H(W_{k})=\frac{1}{KL}\sum_{k=1}^{K}\sum_{\mathcal{S}\subseteq[N]}H(W_{k,\mathcal{S}})=\sum_{\mathcal{S}\subseteq[N]}\alpha_{\mathcal{S}}$$

$$m_{n}\geq\frac{1}{KL}H(Z_{n})=\sum_{\mathcal{S}\subseteq[N]:\,n\in\mathcal{S}}\alpha_{\mathcal{S}},\quad n\in[N]$$

$$I(W_{1:K};Q_{1:N}^{[\theta]})=0$$

$$H(A_{n}^{[\theta]}\mid Q_{n}^{[\theta]},Z_{n})=0,\quad n\in[N]$$

$$(Q_{n}^{[\theta]},A_{n}^{[\theta]},W_{1:K})\sim(Q_{n}^{[\theta^{\prime}]},A_{n}^{[\theta^{\prime}]},W_{1:K}),\quad\theta,\theta^{\prime}\in[K]$$

$$H(W_{\theta}\mid Q_{1:N}^{[\theta]},A_{1:N}^{[\theta]})=o(L)$$

$$D=\sum_{n=1}^{N}H(A_{n}^{[\theta]})$$

$$\begin{aligned}
\min_{\alpha_{\mathcal{S}}\geq 0}\quad&\sum_{\ell=1}^{N}\sum_{\mathcal{S}:|\mathcal{S}|=\ell}\alpha_{\mathcal{S}}\left(1+\frac{1}{\ell}+\cdots+\frac{1}{\ell^{K-1}}\right)\\
\text{s.t.}\quad&\sum_{\mathcal{S}:|\mathcal{S}|\geq 1}\alpha_{\mathcal{S}}=1\\
&\sum_{\mathcal{S}:n\in\mathcal{S}}\alpha_{\mathcal{S}}\leq m_{n},\quad n\in[N]
\end{aligned}$$

$$\begin{aligned}
D&\geq\cdots+\frac{17}{54}\sum_{i=1}^{3}\sum_{k=1}^{3}|W_{k,\{i\}}|L+o(L)\\
&=\cdots+\frac{4}{27}\sum_{\mathcal{S}\subseteq[1:3],\,|\mathcal{S}|=3}\ \sum_{k=1}^{3}|W_{k,\mathcal{S}}|L+o(L)
\end{aligned}$$

$$D^{*}\geq\cdots=3(\alpha_{1}+\alpha_{2}+\alpha_{3})+\frac{7}{4}(\alpha_{12}+\alpha_{13}+\alpha_{23})+\frac{13}{9}\alpha_{123}$$

$$\begin{aligned}
\min_{\alpha_{\mathcal{S}}\geq 0}\quad&3(\alpha_{1}+\alpha_{2}+\alpha_{3})+\frac{7}{4}(\alpha_{12}+\alpha_{13}+\alpha_{23})+\frac{13}{9}\alpha_{123}\\
\text{s.t.}\quad&\alpha_{1}+\alpha_{2}+\alpha_{3}+\alpha_{12}+\alpha_{13}+\alpha_{23}+\alpha_{123}=1\\
&\alpha_{1}+\alpha_{12}+\alpha_{13}+\alpha_{123}\leq m_{1}\\
&\alpha_{2}+\alpha_{12}+\alpha_{23}+\alpha_{123}\leq m_{2}\\
&\alpha_{3}+\alpha_{13}+\alpha_{23}+\alpha_{123}\leq m_{3}
\end{aligned}$$

$$K\left(|W_{k,1}|+|W_{k,2}|+|W_{k,3}|\right)L=3(\alpha_{1}+\alpha_{2}+\alpha_{3})L$$

$$\left(1+\frac{1}{2}+\frac{1}{2^{2}}\right)\left(|W_{k,12}|+|W_{k,13}|+|W_{k,23}|\right)L=\frac{7}{4}(\alpha_{12}+\alpha_{13}+\alpha_{23})L$$

$$\left(1+\frac{1}{3}+\frac{1}{3^{2}}\right)|W_{k,123}|L=\frac{13}{9}\alpha_{123}L$$

$$\frac{D}{L}=3(\alpha_{1}+\alpha_{2}+\alpha_{3})+\frac{7}{4}(\alpha_{12}+\alpha_{13}+\alpha_{23})+\frac{13}{9}\alpha_{123}$$

$$\beta_{1}=\alpha_{1}+\alpha_{2}+\alpha_{3},\qquad\beta_{2}=\alpha_{12}+\alpha_{13}+\alpha_{23},\qquad\beta_{3}=\alpha_{123}$$

$$\begin{aligned}
\min_{\beta_{i}\geq 0}\quad&3\beta_{1}+\frac{7}{4}\beta_{2}+\frac{13}{9}\beta_{3}\\
\text{s.t.}\quad&\beta_{1}+\beta_{2}+\beta_{3}=1\\
&\beta_{1}+2\beta_{2}+3\beta_{3}\leq m_{s}
\end{aligned}$$

$$\begin{aligned}
\min_{\beta_{2},\beta_{3}\geq 0}\quad&3-\frac{5}{4}\beta_{2}-\frac{14}{9}\beta_{3}\\
\text{s.t.}\quad&\beta_{2}+\beta_{3}\leq 1\\
&\beta_{2}+2\beta_{3}\leq m_{s}-1
\end{aligned}$$


Full text

The Capacity of Private Information Retrieval from Heterogeneous Uncoded Caching Databases

This work was supported by NSF Grants CNS 15-26608, CCF 17-13977 and ECCS 18-07348. A shorter version is submitted to IEEE ISIT 2019.

Karim Banawan Batuhan Arasli Yi-Peng Wei Sennur Ulukus

Department of Electrical and Computer Engineering

University of Maryland

College Park

MD 20742


Abstract

We consider private information retrieval (PIR) of a single file out of $K$ files from $N$ non-colluding databases with heterogeneous storage constraints $\bm{m}=(m_{1},\cdots,m_{N})$ . The aim of this work is to jointly design the content placement phase and the information retrieval phase in order to minimize the download cost in the PIR phase. We characterize the optimal PIR download cost as a linear program. By analyzing the structure of the optimal solution of this linear program, we show that, surprisingly, the optimal download cost in our heterogeneous case matches its homogeneous counterpart where all databases have the same average storage constraint $\mu=\frac{1}{N}\sum_{n=1}^{N}m_{n}$ . Thus, we show that there is no loss in the PIR capacity due to heterogeneity of storage spaces of the databases. We provide the optimum content placement explicitly for $N=3$ .

1 Introduction

The problem of private information retrieval (PIR), introduced in [1], has attracted much interest in the information theory community with leading efforts [2, 3, 4, 5, 6]. In the classical setting of PIR, a user wants to retrieve a file out of $K$ files from $N$ databases, each storing all $K$ files, such that no individual database can learn the identity of the desired file. Sun and Jafar [7] characterized the optimal normalized download cost of the classical setting to be $D^{*}=1+\frac{1}{N}+\cdots+\frac{1}{N^{K-1}}$ . Fundamental limits of many interesting variants of the PIR problem have been investigated in [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53].

A common assumption in most of these works is that the databases have sufficiently large storage space that can accommodate all $K$ files in a replicated manner. This may not be the case for peer-to-peer (P2P) and device-to-device (D2D) networks, where information retrieval takes place directly between the users. Here, the user devices (databases) will have limited and heterogeneous sizes. This motivates the investigation of PIR from databases with heterogeneous storage constraints. In this work, we aim to jointly design the storage mechanism (content placement) and the information retrieval scheme such that the normalized PIR download cost is minimized in the retrieval phase.

Reference [36] studies PIR from homogeneous storage-limited databases. In [36], each database has the same limited storage space of $\mu KL$ bits with $0\leq\mu\leq 1$ , where $L$ is the message size (note that perfect replication would require $\mu=1$ ). The goal of [36] is to find the optimal centralized uncoded caching scheme (content placement) that minimizes the PIR download cost. Reference [36] shows that the symmetric batch caching scheme of [54] for content placement, together with the Sun-Jafar scheme of [7] for information retrieval, results in the lowest normalized download cost. It characterizes the optimal storage-download cost trade-off as the lower convex hull of the $N$ pairs $(\frac{t}{N},1+\frac{1}{t}+\cdots+\frac{1}{t^{K-1}})$ , $t=1,\cdots,N$ .
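The homogeneous trade-off described above can be evaluated numerically. The following is a minimal sketch, not code from [36]: the function names `sun_jafar_cost` and `homogeneous_cost` are illustrative, and the sketch assumes that the lower convex hull is obtained by joining consecutive corner points, which holds when the corner costs are convex in $t$ (as Section 6 states for the Sun-Jafar cost).

```python
from fractions import Fraction


def sun_jafar_cost(t: int, K: int) -> Fraction:
    """Normalized download cost 1 + 1/t + ... + 1/t^(K-1) with t replicated databases."""
    return sum((Fraction(1, t**j) for j in range(K)), Fraction(0))


def homogeneous_cost(mu, N: int, K: int) -> Fraction:
    """Linearly interpolate between the corner points (t/N, sun_jafar_cost(t, K))."""
    mu = Fraction(mu)
    if not Fraction(1, N) <= mu <= 1:
        raise ValueError("mu must lie in [1/N, 1]")
    scaled = mu * N
    t = scaled.numerator // scaled.denominator  # floor(mu * N), index of the left corner
    if scaled == t:
        return sun_jafar_cost(t, K)  # mu sits exactly on a corner point
    weight = scaled - t  # fraction of the way from corner t to corner t + 1
    return (1 - weight) * sun_jafar_cost(t, K) + weight * sun_jafar_cost(t + 1, K)


print(homogeneous_cost(Fraction(1, 2), N=3, K=3))  # 19/8
```

Exact rationals are used so that the corner values ( $3$ , $\frac{7}{4}$ , $\frac{13}{9}$ for $N=K=3$ ) can be compared without rounding.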

Meanwhile, the content assignment problem for heterogeneous databases (caches) is investigated in the context of coded caching in [55]. In the coded caching problem [54], the aim is to jointly design the placement and delivery phases in order to minimize the traffic load in the delivery phase during peak hours. Reference [55] proposes an optimization framework where placement and delivery schemes are optimized by solving a linear program. Using this optimization framework, [55] investigates the effects of heterogeneity in cache sizes on the delivery load memory trade-off with uncoded placement.

In this paper, we investigate PIR from databases with heterogeneous storage sizes (see Fig. 1). The $n$ th database can accommodate $m_{n}KL$ bits, i.e., the storage system is constrained by the storage size vector $\bm{m}=(m_{1},\cdots,m_{N})$ . We aim to characterize the optimal normalized PIR download cost of this problem, and the corresponding optimal placement and optimal retrieval schemes. We focus on uncoded placement as in [36] and [55].

Motivated by [55], we first show that the optimal normalized download cost is characterized by a linear program. For the achievability, each message is partitioned into $2^{N}-1$ partitions (the size of the power set of $[N]$ , denoted $\mathcal{P}([N])$ ). For every partition, we apply the Sun-Jafar scheme [7]. The linear program arises as a consequence of optimizing the achievable download cost with respect to the partition sizes subject to the storage constraints. For the converse, we slightly modify the converse in [36] to be valid for the heterogeneous case. These achievability and converse proofs result in exactly the same linear program, yielding the exact capacity for this PIR problem for all $K$ , $N$ , $\bm{m}$ . Interestingly, this is unlike the caching problem in [55] with no privacy requirements, where the linear program is only an achievability, and is shown to be the exact capacity only in special cases.

By studying the properties of the solution of the linear program, we show that, surprisingly, the optimal normalized download cost for the heterogeneous problem is identical to the optimal normalized download cost for the corresponding homogeneous problem, where the homogeneous storage constraint is $\mu=\frac{1}{N}\sum_{n=1}^{N}m_{n}$ for all databases. This implies that there is no loss in the PIR capacity due to heterogeneity of storage spaces of the databases. In fact, the PIR capacity depends only on the sum of the storage spaces and does not depend on how the storage spaces are distributed among the databases. The general proof for this intriguing result is a consequence of an existence proof for a positive linear combination using the theory of positive linear dependence in [56] (and using Farkas’ lemma [57] as a special case) for the constraint set of the linear program. As a byproduct of the structural results, we show that, for the optimal content assignment, at most two consecutive types of message partitioning exist, i.e., message $W_{k}$ should be partitioned such that some partitions are repeated over $i$ databases and the remaining partitions, if any, are repeated over $i+1$ databases, for some $i\in\{1,\cdots,N\}$ . While for general $N$ we show the existence of an optimal content placement that attains the homogeneous PIR capacity, for $N=3$ , we provide an explicit (parametric in $\bm{m}$ ) optimal content placement.

2 System Model

We consider PIR from databases with heterogeneous sizes; see Fig. 1. We consider a storage system with $K$ i.i.d. messages (files). The $k$ th message is of length $L$ bits, i.e.,

$$H(W_{1},\cdots,W_{K})=KL,\qquad H(W_{k})=L,\quad k\in[K]. \tag{1}$$

The storage system consists of $N$ non-colluding databases. The storage size of the $n$ th database is limited to $m_{n}KL$ bits, for some $0\leq m_{n}\leq 1$ . Specifically, we denote the contents of the $n$ th database by $Z_{n}$ , such that,

$$H(Z_{n})\leq m_{n}KL,\quad n\in[N]. \tag{2}$$

The system operates in two phases: In the placement phase, the data center (content generator) stores the message set in the $N$ databases, in such a way to minimize the download cost in the retrieval phase subject to the heterogeneous storage constraints. The placement is done in a centralized fashion [54]. The user (retriever) has no access to the data center. Here, we focus on uncoded placement as in [36, 55], i.e., file $W_{k}$ can be partitioned as,

$$W_{k}=\bigcup_{\mathcal{S}\subseteq[N]}W_{k,\mathcal{S}}, \tag{3}$$

where $W_{k,{\mathcal{S}}}$ is the set of $W_{k}$ bits that appear in the database set ${\mathcal{S}}\in\mathcal{P}([N])$ , where $\mathcal{P}(\cdot)$ is the power set. We have $H(W_{k,{\mathcal{S}}})=|W_{k,{\mathcal{S}}}|L$ , where $0\leq|W_{k,{\mathcal{S}}}|\leq 1$ . Under an uncoded placement, we have the following message size constraint,

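$$1=\frac{1}{KL}\sum_{k=1}^{K}H(W_{k})=\frac{1}{KL}\sum_{k=1}^{K}\sum_{\mathcal{S}\subseteq[N]}H(W_{k,\mathcal{S}})=\sum_{\mathcal{S}\subseteq[N]}\alpha_{\mathcal{S}}, \tag{4}$$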

where $\alpha_{\mathcal{S}}=\frac{1}{K}\sum_{k=1}^{K}|W_{k,{\mathcal{S}}}|$ . In addition, we have the individual database storage constraints,

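$$m_{n}\geq\frac{1}{KL}H(Z_{n})=\sum_{\mathcal{S}\subseteq[N]:\,n\in\mathcal{S}}\alpha_{\mathcal{S}},\quad n\in[N]. \tag{5}$$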

In the retrieval phase, the user is interested in retrieving $W_{\theta}$ , $\theta\in[K]$ privately. The user submits a query $Q_{n}^{[\theta]}$ to the $n$ th database. Since the user has no information about the files, the messages and queries are statistically independent, i.e.,

$$I(W_{1:K};Q_{1:N}^{[\theta]})=0. \tag{6}$$

The $n$ th database responds with an answer string, which is a function of the received query and the stored content, i.e.,

$$H(A_{n}^{[\theta]}\mid Q_{n}^{[\theta]},Z_{n})=0,\quad n\in[N]. \tag{7}$$

To ensure privacy, the query submitted to the $n$ th database when intended to retrieve $W_{\theta}$ should be statistically indistinguishable from the one when intended to retrieve $W_{\theta^{\prime}}$ , i.e.,

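$$(Q_{n}^{[\theta]},A_{n}^{[\theta]},W_{1:K})\sim(Q_{n}^{[\theta^{\prime}]},A_{n}^{[\theta^{\prime}]},W_{1:K}),\quad\theta,\theta^{\prime}\in[K], \tag{8}$$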

where $\sim$ denotes statistical equivalence.

The user needs to decode the desired message $W_{\theta}$ reliably from the received answer strings, consequently,

$$H(W_{\theta}\mid Q_{1:N}^{[\theta]},A_{1:N}^{[\theta]})=o(L), \tag{9}$$

where $\frac{o(L)}{L}\rightarrow 0$ as $L\rightarrow\infty$ .

An achievable PIR scheme satisfies constraints (8) and (9) for some file size $L$ . The download cost $D$ is the size of the total downloaded bits from all databases,

$$D=\sum_{n=1}^{N}H(A_{n}^{[\theta]}). \tag{10}$$

For a given storage constraint vector $\bm{m}$ , we aim to jointly design the placement phase (i.e., $Z_{n}$ , $n\in[N]$ ) and the retrieval scheme to minimize the normalized download cost $D^{*}=\frac{D}{L}$ in the retrieval phase.

3 Main Results

Theorem 1 characterizes the optimal download cost under heterogeneous storage constraints in terms of a linear program. The main ingredients of the proof of Theorem 1 are introduced in Section 4 for $N=3$ , and the complete proof is given in Section 5 for general $N$ .

Theorem 1

For PIR from databases with heterogeneous storage sizes $\bm{m}=(m_{1},\cdots,m_{N})$ , the optimal normalized download cost is the solution of the following linear program,

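$$\begin{aligned}
\min_{\alpha_{\mathcal{S}}\geq 0}\quad&\sum_{\ell=1}^{N}\sum_{\mathcal{S}:|\mathcal{S}|=\ell}\alpha_{\mathcal{S}}\left(1+\frac{1}{\ell}+\cdots+\frac{1}{\ell^{K-1}}\right)\\
\text{s.t.}\quad&\sum_{\mathcal{S}:|\mathcal{S}|\geq 1}\alpha_{\mathcal{S}}=1\\
&\sum_{\mathcal{S}:n\in\mathcal{S}}\alpha_{\mathcal{S}}\leq m_{n},\quad n\in[N],
\end{aligned}$$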

where ${\mathcal{S}}\in\mathcal{P}([N])$ .
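To make the ingredients of the linear program concrete, the sketch below checks a candidate placement against the message size constraint (4) and the storage constraints (5), and evaluates its download cost using the per-partition Sun-Jafar cost described in Section 5.1. It is only an evaluator under stated assumptions, not a solver: the names `subsets`, `cost_coefficient` and `evaluate` are hypothetical, and `alpha` is assumed to be a dictionary of exact rationals keyed by `frozenset` database sets.

```python
from fractions import Fraction
from itertools import combinations


def subsets(N: int):
    """Yield all non-empty subsets of databases {1, ..., N} as frozensets."""
    for size in range(1, N + 1):
        for combo in combinations(range(1, N + 1), size):
            yield frozenset(combo)


def cost_coefficient(size: int, K: int) -> Fraction:
    """Objective weight of alpha_S when |S| = size: 1 + 1/size + ... + 1/size^(K-1)."""
    return sum(Fraction(1, size**j) for j in range(K))


def evaluate(alpha: dict, m: list, K: int):
    """Return (feasible, cost) for a candidate assignment alpha[S]; missing sets count as 0."""
    N = len(m)
    values = {S: Fraction(alpha.get(S, 0)) for S in subsets(N)}
    nonneg = all(v >= 0 for v in values.values())
    sums_to_one = sum(values.values()) == 1  # message size constraint (4)
    storage_ok = all(  # per-database storage constraints (5)
        sum(v for S, v in values.items() if n in S) <= m[n - 1]
        for n in range(1, N + 1)
    )
    cost = sum(cost_coefficient(len(S), K) * v for S, v in values.items())
    return nonneg and sums_to_one and storage_ok, cost


full_replication = {frozenset({1, 2}): 1}
print(evaluate(full_replication, m=[1, 1], K=2))  # (True, Fraction(3, 2))
```

With two fully replicated databases and $K=2$ , the evaluator returns the classical cost $1+\frac{1}{2}$ ; shrinking either storage budget below 1 makes the same placement infeasible.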

Theorem 2 shows the equivalence between the optimum download costs of the heterogeneous and homogeneous problems. The proof of Theorem 2 is given in Section 6.

Theorem 2

The normalized download cost of the PIR problem with heterogeneous storage sizes $\bm{m}=(m_{1},\cdots,m_{N})$ is equal to the normalized download cost of the PIR problem with homogeneous storage sizes $\mu=\frac{1}{N}\sum_{n=1}^{N}m_{n}$ for all databases, i.e., $D^{*}(\bm{m})=D^{*}(\bar{\bm{m}})$ , where $\bar{\bm{m}}$ is such that $\bar{m}_{n}=\mu$ , for $n=1,\cdots,N$ .

Remark 1

Theorem 2 implies that the storage size asymmetry does not hurt the PIR capacity, so long as the placement phase is optimized. This is unlike, for instance, access asymmetry in the case of replicated databases [37]. This is also unlike, as another instance, non-optimized content placement even for symmetric database sizes [53].

Remark 2

Stronger than what is stated, i.e., the equivalence between heterogeneous and homogeneous storage cases, Theorem 2 in fact implies that the optimal download cost in (1) depends only on the sum storage space $\sum_{n=1}^{N}m_{n}$ . Thus, any distribution of storage space within the given sum storage space yields the same PIR capacity. In particular, a uniform distribution (the corresponding homogeneous case) has the same PIR capacity. Hence, there is no loss in the PIR capacity due to heterogeneity of storage spaces of the databases.

4 Representative Example: $N=3$

We introduce the main ingredients of the achievability and converse proofs using the example of $N=3$ databases. Without loss of generality, we take $K=3$ in this section.

4.1 Converse Proof

We note that [36, Theorem 1] can be applied to any storage constrained PIR problem with arbitrary storage $Z_{1:N}$ . Hence, specializing to the case of $N=3$ (and $K=3$ ) with i.i.d. messages and uncoded content leads to [36, eqn. (39)],

[TABLE]

Using the uncoded storage assumption in (3), we can further lower bound (12) as,

[TABLE]

Normalizing with $L$ , taking the limit $L\rightarrow\infty$ , and using the definition $\alpha_{\mathcal{S}}=\frac{1}{K}\sum_{k=1}^{K}|W_{k,{\mathcal{S}}}|$ lead to the following lower bound on the normalized download cost $D^{*}$ ,

[TABLE]

where (16) follows from the message size constraint (4).

We further lower bound (16) by minimizing the right hand side with respect to $\{\alpha_{\mathcal{S}}\}_{{\mathcal{S}}\subseteq[3]}$ under storage constraints. Thus, the solution of the following linear program serves as a lower bound (converse) for the normalized download cost,

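$$\begin{aligned}
\min_{\alpha_{\mathcal{S}}\geq 0}\quad&3(\alpha_{1}+\alpha_{2}+\alpha_{3})+\frac{7}{4}(\alpha_{12}+\alpha_{13}+\alpha_{23})+\frac{13}{9}\alpha_{123}\\
\text{s.t.}\quad&\alpha_{1}+\alpha_{2}+\alpha_{3}+\alpha_{12}+\alpha_{13}+\alpha_{23}+\alpha_{123}=1\\
&\alpha_{1}+\alpha_{12}+\alpha_{13}+\alpha_{123}\leq m_{1}\\
&\alpha_{2}+\alpha_{12}+\alpha_{23}+\alpha_{123}\leq m_{2}\\
&\alpha_{3}+\alpha_{13}+\alpha_{23}+\alpha_{123}\leq m_{3},
\end{aligned}$$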

where variables $\{\alpha_{\mathcal{S}}\}_{|{\mathcal{S}}|=1}$ are $\{\alpha_{1},\alpha_{2},\alpha_{3}\}$ , which represent the content stored in databases 1, 2 and 3 exclusively; variables $\{\alpha_{\mathcal{S}}\}_{|{\mathcal{S}}|=2}$ are $\{\alpha_{12},\alpha_{13},\alpha_{23}\}$ , which represent the content stored in databases 1 and 2, 1 and 3, and 2 and 3, respectively; and variable $\{\alpha_{\mathcal{S}}\}_{|{\mathcal{S}}|=3}$ is $\{\alpha_{123}\}$ , which represents the content stored in all three databases simultaneously.

Next, we show that the lower bound expressed as a linear program in (4.1) can be achieved.

4.2 Achievability Proof

In the placement phase, let $|W_{k,{\mathcal{S}}}|=\alpha_{\mathcal{S}}$ for all $k\in[K]$ . Assign the partition $W_{k,{\mathcal{S}}}$ to the set ${\mathcal{S}}$ of the databases for all $k\in[K]$ . To retrieve $W_{\theta}$ privately, $\theta\in[K]$ , the user applies the Sun-Jafar scheme [7] over the partitions of the files.

The partitions $W_{k,1}$ , $W_{k,2}$ , $W_{k,3}$ are placed in a single database each. Thus, we apply [7] with $N=1$ , and download

$$K\left(|W_{k,1}|+|W_{k,2}|+|W_{k,3}|\right)L=3(\alpha_{1}+\alpha_{2}+\alpha_{3})L. \tag{18}$$

The partitions $W_{k,12}$ , $W_{k,13}$ , $W_{k,23}$ are placed in two databases each. Thus, we apply [7] with $N=2$ , and download

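$$\left(1+\frac{1}{2}+\frac{1}{2^{2}}\right)\left(|W_{k,12}|+|W_{k,13}|+|W_{k,23}|\right)L=\frac{7}{4}(\alpha_{12}+\alpha_{13}+\alpha_{23})L. \tag{19}$$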

Finally, the partition $W_{k,123}$ is placed in all three databases. Thus, we apply [7] with $N=3$ , and download

$$\left(1+\frac{1}{3}+\frac{1}{3^{2}}\right)|W_{k,123}|L=\frac{13}{9}\alpha_{123}L. \tag{20}$$

Concatenating the downloads, file $W_{\theta}$ is reliably decodable. Hence, by summing up the download costs in (18), (19) and (20), we have the following normalized download cost,

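$$\frac{D}{L}=3(\alpha_{1}+\alpha_{2}+\alpha_{3})+\frac{7}{4}(\alpha_{12}+\alpha_{13}+\alpha_{23})+\frac{13}{9}\alpha_{123}, \tag{21}$$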

which matches the lower bound in (4.1) and is subject to the same constraints. Hence, the solution to the linear program in (4.1) is achievable, and gives the exact PIR capacity.
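The three per-partition costs used in this example follow from the Sun-Jafar expression $1+\frac{1}{t}+\cdots+\frac{1}{t^{K-1}}$ with $K=3$ and $t=1,2,3$ . A short sketch of that arithmetic is given below; the helper names and the string labels for database sets (such as `"13"` for databases 1 and 3) are illustrative conventions, not notation from the paper.

```python
from fractions import Fraction

K = 3  # number of messages in this example


def per_bit_cost(num_databases: int) -> Fraction:
    """Sun-Jafar normalized cost for content replicated on num_databases databases."""
    return sum((Fraction(1, num_databases**j) for j in range(K)), Fraction(0))


def normalized_cost(alpha: dict) -> Fraction:
    """alpha maps a label such as "1", "13" or "123" to the share stored on exactly that set."""
    return sum(
        (per_bit_cost(len(label)) * Fraction(share) for label, share in alpha.items()),
        Fraction(0),
    )


print([per_bit_cost(t) for t in (1, 2, 3)])  # coefficients 3, 7/4 and 13/9
print(normalized_cost({"1": Fraction(1, 2), "23": Fraction(1, 2)}))  # 19/8
```

The second call splits every message evenly between a part held only by database 1 and a part replicated on databases 2 and 3, which costs $\frac{1}{2}\cdot 3+\frac{1}{2}\cdot\frac{7}{4}=\frac{19}{8}$ .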

4.3 Explicit Storage Assignment

In this section, we solve the linear program in (4.1) to find the optimal storage assignment explicitly for $N=3$ . To that end, we denote $\beta_{\ell}=\sum_{{\mathcal{S}}:|{\mathcal{S}}|=\ell}\alpha_{\mathcal{S}}$ , i.e.,

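$$\beta_{1}=\alpha_{1}+\alpha_{2}+\alpha_{3}, \tag{22}$$

$$\beta_{2}=\alpha_{12}+\alpha_{13}+\alpha_{23}, \tag{23}$$

$$\beta_{3}=\alpha_{123}. \tag{24}$$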

We first construct a relaxed optimization problem by summing up the three individual storage constraints in (4.1) into a single constraint. The relaxed problem is,

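$$\begin{aligned}
\min_{\beta_{i}\geq 0}\quad&3\beta_{1}+\frac{7}{4}\beta_{2}+\frac{13}{9}\beta_{3}\\
\text{s.t.}\quad&\beta_{1}+\beta_{2}+\beta_{3}=1\\
&\beta_{1}+2\beta_{2}+3\beta_{3}\leq m_{s},
\end{aligned}$$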

where we define the sum storage space $m_{s}=m_{1}+m_{2}+m_{3}$ . Plugging $\beta_{1}=1-\beta_{2}-\beta_{3}$ ,

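$$\begin{aligned}
\min_{\beta_{2},\beta_{3}\geq 0}\quad&3-\frac{5}{4}\beta_{2}-\frac{14}{9}\beta_{3}\\
\text{s.t.}\quad&\beta_{2}+\beta_{3}\leq 1\\
&\beta_{2}+2\beta_{3}\leq m_{s}-1.
\end{aligned}$$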

Since (4.3) is a linear program, the solution lies at the boundary of the feasible set. We have three cases depending on the sum storage space $m_{s}$ .

Regime 1:

When $m_{s}<1$ : In this case, the second constraint in (4.3) requires $\beta_{2}+2\beta_{3}<0$ , while we must have $\beta_{2},\beta_{3}\geq 0$ . Hence, there is no feasible solution for the relaxed problem and thus the original problem (4.1) is infeasible as well.

Regime 2:

When $1\leq m_{s}\leq 2$ : In this case, the constraint $\beta_{2}+\beta_{3}\leq 1$ is not binding. Hence, the solution satisfies the second constraint with equality, $\beta_{2}+2\beta_{3}=m_{s}-1$ , which is non-negative in this regime. Thus, (4.3) can be written in an unconstrained manner as,

[TABLE]

The optimal solution for (27) is $\beta_{3}^{*}=0$ and therefore $\beta_{2}^{*}=m_{s}-1$ . From the equality constraint $\beta_{1}+\beta_{2}+\beta_{3}=1$ , we have $\beta_{1}^{*}=2-m_{s}$ . Next, we map the solution of the relaxed problem in (4.3) to a feasible solution in the original problem in (4.1). From (24), $\alpha_{123}^{*}=\beta_{3}^{*}=0$ . Thus, at the boundary of the inequality set of (4.1), we have,

[TABLE]

Depending on the sign of $1-(m_{j}+m_{k})$ , where $j,k\in\{1,2,3\}$ , we have different content assignments. The common structure of (28)-(30) is $\alpha_{i}-\alpha_{jk}=1-(m_{j}+m_{k})$ . We assign $\alpha_{i}=\alpha_{jk}+1-(m_{j}+m_{k})$ if $m_{j}+m_{k}\leq 1$ and $\alpha_{jk}=\alpha_{i}-1+(m_{j}+m_{k})$ otherwise. This ensures that $\alpha_{\mathcal{S}}\geq 0$ for all ${\mathcal{S}}\subseteq[1:3]$ . Using these assignments, we have sub-cases depending on the sign of $1-(m_{j}+m_{k})$ . We summarize explicit content assignment for these cases in Table 1, where we take $m_{1}\geq m_{2}\geq m_{3}$ without loss of generality, to reduce the number of cases to enumerate. With these solutions, the optimal normalized download cost in this regime is,

[TABLE]

where $\mu=\frac{m_{1}+m_{2}+m_{3}}{3}=\frac{m_{s}}{3}$ corresponds to the average storage size.
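As a hedged cross-check of this regime (a worked substitution, not necessarily the form of the original display), one can plug $\beta_{1}^{*}=2-m_{s}$ , $\beta_{2}^{*}=m_{s}-1$ and $\beta_{3}^{*}=0$ into the $K=3$ cost with per-partition coefficients $3$ , $\frac{7}{4}$ and $\frac{13}{9}$ :

```latex
\begin{align*}
D^{*} &= 3\beta_{1}^{*} + \tfrac{7}{4}\beta_{2}^{*} + \tfrac{13}{9}\beta_{3}^{*} \\
      &= 3(2-m_{s}) + \tfrac{7}{4}(m_{s}-1) \\
      &= \tfrac{17}{4} - \tfrac{5}{4}\,m_{s}
       = \tfrac{17}{4} - \tfrac{15}{4}\,\mu .
\end{align*}
```

The endpoints agree with the corner points of the homogeneous trade-off: $m_{s}=1$ gives $3$ and $m_{s}=2$ gives $\frac{7}{4}$ .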

Regime 3:

When $2\leq m_{s}\leq 3$ : In this case, the solution of (4.3) is at the intersection of the constraints $\beta_{2}+\beta_{3}=1$ and $\beta_{2}+2\beta_{3}=m_{s}-1$ . Hence, we have $\beta_{2}^{*}=3-m_{s}$ and $\beta_{3}^{*}=m_{s}-2$ , which are both non-negative in this regime. From the equality constraint $\beta_{1}+\beta_{2}+\beta_{3}=1$ , we have $\beta_{1}^{*}=0$ . Next, we map the solution of the relaxed problem in (4.3) to a feasible solution in the original problem in (4.1). From (22), $\beta_{1}^{*}=0$ implies $\alpha_{1}^{*}=\alpha_{2}^{*}=\alpha_{3}^{*}=0$ . From (24), $\beta_{3}^{*}=m_{s}-2$ implies $\alpha_{123}^{*}=m_{s}-2$ . At the boundary of the feasible set of (4.1), we have,

[TABLE]

Plugging $\beta_{2}^{*}+\beta_{3}^{*}=1$ and $\alpha_{i}^{*}=0$ for $i\in\{1,2,3\}$ leads to the following content assignment,

[TABLE]

With these solutions, the optimal normalized download cost in this regime is,

[TABLE]

This solution is also shown in Table 1.
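The assignment rules of Table 1 can be exercised numerically. The sketch below is an illustration under stated assumptions rather than the authors' code: it fixes $K=3$ (so the cost coefficients are $3$ , $\frac{7}{4}$ , $\frac{13}{9}$ ), expects $m_{1}\geq m_{2}\geq m_{3}$ , uses hypothetical function names, and merges the table rows that share the same assignment.

```python
from fractions import Fraction

COST = {1: Fraction(3), 2: Fraction(7, 4), 3: Fraction(13, 9)}  # K = 3 coefficients


def assign_three(m1, m2, m3) -> dict:
    """Content assignment following Table 1; keys name the database set, e.g. "13"."""
    m1, m2, m3 = Fraction(m1), Fraction(m2), Fraction(m3)
    if not 1 >= m1 >= m2 >= m3 >= 0:
        raise ValueError("expected 1 >= m1 >= m2 >= m3 >= 0")
    ms = m1 + m2 + m3
    if ms < 1:
        raise ValueError("infeasible: sum storage below 1")
    zero = Fraction(0)
    if ms >= 2:  # Regime 3
        return {"1": zero, "2": zero, "3": zero,
                "12": 1 - m3, "13": 1 - m2, "23": 1 - m1, "123": ms - 2}
    if m1 + m3 >= 1:  # Regime 2, first two rows of Table 1
        return {"1": 2 - ms, "2": zero, "3": zero,
                "12": m1 + m2 - 1, "13": m1 + m3 - 1, "23": 1 - m1, "123": zero}
    # Regime 2, remaining rows (m1 + m3 <= 1, hence also m2 + m3 <= 1)
    return {"1": 1 - (m2 + m3), "2": 1 - (m1 + m3), "3": m3,
            "12": ms - 1, "13": zero, "23": zero, "123": zero}


def verify(alpha: dict, m) -> tuple:
    """Check non-negativity, unit sum and tight storage use; also return the cost."""
    loads = [sum(v for label, v in alpha.items() if str(n) in label) for n in (1, 2, 3)]
    feasible = (
        all(v >= 0 for v in alpha.values())
        and sum(alpha.values()) == 1
        and loads == [Fraction(x) for x in m]
    )
    cost = sum(COST[len(label)] * v for label, v in alpha.items())
    return feasible, cost


m = (Fraction(3, 5), Fraction(1, 2), Fraction(2, 5))
print(verify(assign_three(*m), m))  # (True, Fraction(19, 8))
```

For this storage vector the sum is $m_{s}=\frac{3}{2}$ , and the resulting cost $\frac{19}{8}$ equals the homogeneous cost at $\mu=\frac{1}{2}$ , as Theorem 2 predicts.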

5 Optimal Download Cost for the General Problem

In this section, we give the proof of Theorem 1, i.e., show the achievability and the converse proofs for the PIR problem with heterogeneous databases, for general $N$ , $K$ , $\bm{m}$ .

5.1 General Achievability Proof

In this section, we show the achievability for general $N$ databases and $K$ messages. Let $\tilde{D}_{\ell}$ denote the optimal normalized download cost for the PIR problem with $\ell$ replicated databases [7] storing the same $K$ messages, which is achieved using the Sun-Jafar scheme [7],

$$\tilde{D}_{\ell}=1+\frac{1}{\ell}+\cdots+\frac{1}{\ell^{K-1}}. \tag{37}$$

We partition the messages over all subsets of $[1:N]$ , such that $|W_{k,{\mathcal{S}}}|=\alpha_{\mathcal{S}}$ for all $k\in[1:K]$ . Using this partitioning, the subsets ${\mathcal{S}}$ such that $|{\mathcal{S}}|=1$ correspond to a PIR problem with 1 database and $K$ messages. Hence, by applying the trivial scheme of downloading all these partitions, we download $\tilde{D}_{1}|W_{k,{\mathcal{S}}}|L=K\alpha_{\mathcal{S}}L$ bits. For the subsets ${\mathcal{S}}$ such that $|{\mathcal{S}}|=2$ , we have a PIR problem with $2$ databases and $K$ messages. Therefore, by applying the Sun-Jafar scheme [7], we download $\tilde{D}_{2}|W_{k,{\mathcal{S}}}|L=(1+\frac{1}{2}+\cdots+\frac{1}{2^{K-1}})\alpha_{\mathcal{S}}L$ bits, and so on. This results in a total normalized download cost of $\sum_{\ell=1}^{N}\sum_{{\mathcal{S}}:|{\mathcal{S}}|=\ell}\alpha_{\mathcal{S}}\tilde{D}_{\ell}$ . The optimal content assignment is obtained by optimizing over $\{\alpha_{\mathcal{S}}\}_{{\mathcal{S}}:|{\mathcal{S}}|\geq 1}$ subject to the message size constraint (4), and the individual storage constraints (5). Thus, the achievable normalized download cost can be written as the following linear program,

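$$\begin{aligned}
\min_{\alpha_{\mathcal{S}}\geq 0}\quad&\sum_{\ell=1}^{N}\sum_{\mathcal{S}:|\mathcal{S}|=\ell}\alpha_{\mathcal{S}}\tilde{D}_{\ell}\\
\text{s.t.}\quad&\sum_{\mathcal{S}:|\mathcal{S}|\geq 1}\alpha_{\mathcal{S}}=1\\
&\sum_{\mathcal{S}:n\in\mathcal{S}}\alpha_{\mathcal{S}}\leq m_{n},\quad n\in[N],
\end{aligned}$$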

where ${\mathcal{S}}\in\mathcal{P}([1:N])$ .

5.2 General Converse Proof

In this section, we show the converse for general $N$ databases and $K$ messages. The result in [36, Theorem 1] gives a general lower bound for a PIR system with $N$ databases and $K$ messages and arbitrary storage contents $Z_{1:N}$ as

[TABLE]

where $\lambda(n,k)$ is given by,

[TABLE]

For uncoded placement, we have,

[TABLE]

The simplifications in [36], which are intended to deal with the nested harmonic sum, can be applied to the heterogeneous storage as well. Thus, the following lower bound in [36, (77)] is a valid lower bound for the normalized download cost for the heterogeneous problem,

[TABLE]

where

[TABLE]

Substituting (43) in (42) leads to,

[TABLE]

where the last step follows from the message size constraint.

This settles Theorem 1 by having shown that both achievability and converse proofs result in the same linear program which is given in (1).

6 Equivalence to the Homogeneous Problem

We prove Theorem 2, which implies an equivalence between the solution of (1) with heterogeneous storage constraints $\bm{m}$ and the solution of (1) with homogeneous storage constraint $\mu=\frac{1}{N}\sum_{n=1}^{N}m_{n}$ for all databases. To that end, let $\beta_{n}=\sum_{{\mathcal{S}}:|{\mathcal{S}}|=n}\alpha_{\mathcal{S}}$ as before. By adding the individual storage size constraints in (1), we write the following relaxed problem,

[TABLE]

where $m_{s}=\sum_{n=1}^{N}m_{n}$ , as before, is the sum storage space and $\tilde{D}_{n}$ is defined in (37). The solution of the relaxed problem is potentially lower than (1), since the optimal solution of (1) is feasible in (6). Note that the relaxed problem (6) depends only on the sum storage space $m_{s}$ and the number of databases $N$ . Therefore, the corresponding relaxed problem is the same for all distributions of the storage space among databases under the same $m_{s}$ , including the uniform distribution which results in the homogeneous problem. Thus, in order to show the equivalence of the heterogeneous and homogeneous problems, it suffices to prove that the optimal solution of (6) can be mapped back to a feasible solution of (1).
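The single relaxed storage constraint can be spelled out as follows. This is a short worked step added for clarity; the index $\ell$ is used here for the subset size so that it is not confused with the database index $n$ , i.e., $\beta_{\ell}=\sum_{\mathcal{S}:|\mathcal{S}|=\ell}\alpha_{\mathcal{S}}$ . Summing the $N$ individual storage constraints counts each $\alpha_{\mathcal{S}}$ once for every database in $\mathcal{S}$ :

```latex
\begin{align*}
\sum_{n=1}^{N} \sum_{\mathcal{S}:\, n \in \mathcal{S}} \alpha_{\mathcal{S}}
  &= \sum_{\mathcal{S}} |\mathcal{S}|\, \alpha_{\mathcal{S}}
   = \sum_{\ell=1}^{N} \ell \sum_{\mathcal{S}:\, |\mathcal{S}| = \ell} \alpha_{\mathcal{S}}
   = \sum_{\ell=1}^{N} \ell\, \beta_{\ell}
   \;\leq\; \sum_{n=1}^{N} m_{n} = m_{s} .
\end{align*}
```

For $N=3$ this reduces to $\beta_{1}+2\beta_{2}+3\beta_{3}\leq m_{s}$ , the constraint used in Section 4.3.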

We write the Lagrangian function corresponding to (6) as,

[TABLE]

The optimality conditions are,

[TABLE]

We have the following structural insights about the relaxed problem. The first lemma states that, in the optimal solution, there are at most two non-zero $\beta$ s.

Lemma 1

There does not exist a subset $\mathcal{N}$ , such that $|\mathcal{N}|\geq 3$ and $\beta_{n}>0$ for all $n\in\mathcal{N}$ .

**Proof:** Assume for the sake of contradiction that there exists $\mathcal{N}$ such that $|\mathcal{N}|\geq 3$ and $\beta_{n}>0$ for all $n\in\mathcal{N}$ . Then, by complementary slackness, $\mu_{n}=0$ for all $n\in\mathcal{N}$ . From the optimality conditions in (49), we have,

[TABLE]

This results in $|\mathcal{N}|$ independent equations in 2 unknowns ( $\gamma$ and $\lambda$ ), which is an inconsistent linear system if $|\mathcal{N}|\geq 3$ . Thus, we have a contradiction, and $|\mathcal{N}|$ can be at most 2. $\blacksquare$

The second lemma states that if two $\beta$ s are positive, then they must be consecutive.

Lemma 2

If $\beta_{n_{1}}>0$ , and $\beta_{n_{2}}>0$ , then $n_{2}=n_{1}+1$ .

**Proof:** Assume for the sake of contradiction that $\beta_{n_{1}}>0$ , $\beta_{n_{2}}>0$ , such that $n_{2}=n_{1}+2$ , and that $\beta_{n_{0}}=0$ where $n_{0}=n_{1}+1$ . Then, from the optimality conditions, we have,

[TABLE]

Solving for $\mu_{n_{0}}$ leads to,

[TABLE]

Since $\tilde{D}_{n}$ is convex in $n$ , we have $\tilde{D}_{n_{0}}\leq\frac{1}{2}(\tilde{D}_{n_{1}}+\tilde{D}_{n_{2}})$ , which implies $\mu_{n_{0}}\leq 0$ . This is impossible since the Lagrange multiplier satisfies $\mu_{n_{0}}\geq 0$ and, from Lemma 1, $\mu_{n_{0}}\neq 0$ . Thus, we have a contradiction, and we cannot have a zero $\beta$ between two non-zero $\beta$ s. $\blacksquare$
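The two properties of the Sun-Jafar cost used in the lemmas of this section, namely that it decreases in the number of databases and is convex in it, can be spot-checked numerically. The sketch below assumes $\tilde{D}_{n}=1+\frac{1}{n}+\cdots+\frac{1}{n^{K-1}}$ ; the function names are illustrative, and a finite check of this kind is a sanity test rather than a proof.

```python
from fractions import Fraction


def d_tilde(n: int, K: int) -> Fraction:
    """Sun-Jafar normalized download cost with n replicated databases and K messages."""
    return sum((Fraction(1, n**j) for j in range(K)), Fraction(0))


def is_decreasing_and_convex(N: int, K: int) -> tuple:
    """Check strict monotonicity and midpoint convexity of d_tilde on n = 1, ..., N."""
    costs = [d_tilde(n, K) for n in range(1, N + 1)]
    decreasing = all(a > b for a, b in zip(costs, costs[1:]))
    convex = all(2 * costs[i] <= costs[i - 1] + costs[i + 1] for i in range(1, N - 1))
    return decreasing, convex


print(is_decreasing_and_convex(N=6, K=4))  # (True, True)
```

Strict decrease requires $K\geq 2$ ; for $K=1$ the cost is constantly 1 and the first flag is `False`.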

The third lemma states that having $m_{s}$ an integer leads to activating a single $\beta$ only.

Lemma 3

$\beta_{j}=1$ and $\beta_{n}=0$ for all $n\neq j$ if and only if $m_{s}=j<N$ , where $j\in\mathbb{N}$ .

**Proof:** From the optimality conditions, we have,

[TABLE]

Substituting $\gamma$ from (55) into (56) leads to,

[TABLE]

Since $j<N$ , we can choose an $n>j$ . Then, (57) implies,

[TABLE]

Since $\tilde{D}_{n}$ is monotonically decreasing in $n$ , we have $\lambda\geq c>0$ for some positive constant $c=\frac{\tilde{D}_{j}-\tilde{D}_{n}}{n-j}$ . Since $\lambda>0$ , the inequality $\sum_{n=1}^{N}n\beta_{n}\leq m_{s}$ must be satisfied with equality. To have a feasible solution for the two equations $\sum_{n=1}^{N}\beta_{n}=1$ and $\sum_{n=1}^{N}n\beta_{n}=m_{s}$ , we must have $m_{s}=j$ and $\beta_{j}=1$ . $\blacksquare$

The fourth lemma gives the solution of the relaxed problem for non-integer $m_{s}$ .

Lemma 4

For the relaxed problem (6), if $j-1<m_{s}<j$ , then $\beta_{j-1}^{*}=j-m_{s}$ and $\beta_{j}^{*}=m_{s}-(j-1)$ .

**Proof:** From Lemma 1, at most two $\beta$ s should be positive. From Lemma 3, exactly two $\beta$ s should be positive, as $m_{s}$ is not an integer here. From Lemma 2, the positive $\beta$ s should be consecutive, and because of continuity, we must have $\beta_{j-1}>0$ and $\beta_{j}>0$ . Thus, on the boundary, we have,

[TABLE]

Solving these equations simultaneously results in $\beta_{j-1}^{*}=j-m_{s}$ and $\beta_{j}^{*}=m_{s}-(j-1)$ . $\blacksquare$

Thus, Lemmas 1-4 establish the structure of the relaxed problem: First, since $0\leq m_{n}\leq 1$ for all $n$ , we have $0\leq m_{s}\leq N$ . If $0\leq m_{s}<1$ , then there is no PIR possible. If $m_{s}$ is an integer between 1 and $N$ , then only one $\beta$ is positive and it is equal to 1. For instance, if $m_{s}=j$ , then $\beta_{j}=1$ . In this case, only one type of $\alpha$ with $j$ subscripts is positive. If $m_{s}$ is a non-integer between 1 and $N$ , then two $\beta$ s are positive. For instance, if $j-1<m_{s}<j$ , then $\beta_{j-1}$ and $\beta_{j}$ are positive and equal to $j-m_{s}$ and $m_{s}+1-j$ , respectively. In this case, two types of $\alpha$ s with $j-1$ and $j$ subscripts are positive.
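The structure summarized above translates directly into a small routine. The following sketch (with hypothetical names `d_tilde` and `relaxed_solution`, and assuming the Sun-Jafar cost $\tilde{D}_{n}=1+\frac{1}{n}+\cdots+\frac{1}{n^{K-1}}$ ) returns the optimal $\beta$ vector of the relaxed problem for a given sum storage $m_{s}$ together with the resulting cost.

```python
from fractions import Fraction
from math import ceil


def d_tilde(n: int, K: int) -> Fraction:
    """Sun-Jafar normalized download cost with n replicated databases and K messages."""
    return sum((Fraction(1, n**j) for j in range(K)), Fraction(0))


def relaxed_solution(N: int, K: int, m_s) -> tuple:
    """Return ([beta_1, ..., beta_N], cost) following the structure of Lemmas 1-4."""
    m_s = Fraction(m_s)
    if not 1 <= m_s <= N:
        raise ValueError("PIR requires 1 <= m_s <= N")
    beta = [Fraction(0)] * (N + 1)  # index 0 is unused so that beta[n] matches beta_n
    j = ceil(m_s)
    if m_s == j:  # integer sum storage: a single active beta
        beta[j] = Fraction(1)
    else:  # j - 1 < m_s < j: two consecutive active betas
        beta[j - 1] = j - m_s
        beta[j] = m_s - (j - 1)
    cost = sum(beta[n] * d_tilde(n, K) for n in range(1, N + 1))
    return beta[1:], cost


print(relaxed_solution(N=3, K=3, m_s=Fraction(3, 2)))
```

For $N=K=3$ and $m_{s}=\frac{3}{2}$ this gives $\beta_{1}=\beta_{2}=\frac{1}{2}$ and a cost of $\frac{19}{8}$ , and in every case the returned vector satisfies $\sum_{n}\beta_{n}=1$ and $\sum_{n}n\beta_{n}=m_{s}$ .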

Finally, to show the equivalence of the original linear program in (1) and the relaxed linear program in (6), we need to show that a feasible (non-negative) solution of (1) exists for every optimal solution of (6). That is, the optimal $\beta$s found in solving (6) can be mapped to a set of feasible $\alpha$s in (1). We have shown this by finding an explicit solution for the case of $N=3$ in Section 4.3. We give an alternative proof for the case of $N=4$ using Farkas’ lemma [57] in Appendix A. In the following lemma, we give the proof for general $N$ by using the theory of positive linear dependence in [56].

Lemma 5

There exists a feasible (non-negative) solution of (1) corresponding to the optimal solution of the relaxed problem in (6).

**Proof:** Since the inequality in the constraint set of the relaxed problem (6) is satisfied with equality, the $N$ inequalities in the constraint set of the original problem (1) should be satisfied with equality as well. We know from Lemmas 1-4 that only two $\beta$s will be positive; therefore, their expressions in terms of the corresponding $\alpha$s give two more equations. Assuming that $i<m_{s}<i+1$, we have $\beta_{i}^{*}=i+1-m_{s}$ and $\beta_{i+1}^{*}=m_{s}-i$; $\beta_{i}$ is a sum of ${N\choose i}$ $\alpha$s and $\beta_{i+1}$ is a sum of ${N\choose i+1}$ $\alpha$s. Thus, we have $(N+2)$ equations in ${N\choose i}+{N\choose i+1}$ variables, and we need to show that a feasible (non-negative) solution to these linear equations exists.

We denote this linear system of equations as $\bm{A}\bm{\alpha}=\bm{b}$, where $\bm{\alpha}$ is the vector of $\alpha_{{\mathcal{S}}}$ (the content assignments) and $\bm{b}$ is the vector of $m_{i}$ and $\beta_{i}$ (the storage constraints and the relaxed problem coefficients), i.e.,

[TABLE]

where

[TABLE]

and

[TABLE]

Now, $\bm{A}$ , an $(N+2)\times\left({N\choose i}+{N\choose i+1}\right)$ matrix of zeros and ones, has the following properties:

1. Every column of the matrix is unique.

2. The first ${N\choose i}$ columns have $i$ 1s and $N-i$ 0s in their first $N$ rows. In all of these columns, the entry in row $N+1$ is 1 and the entry in row $N+2$ is 0.

3. The remaining ${N\choose i+1}$ columns have $i+1$ 1s and $N-i-1$ 0s in their first $N$ rows. In all of these columns, the entry in row $N+1$ is 0 and the entry in row $N+2$ is 1.

4. The first three properties imply that, in the first $N$ rows of the matrix, every permutation of $i$ 1s and $N-i$ 0s exists in the first ${N\choose i}$ columns, and every permutation of $i+1$ 1s and $N-i-1$ 0s exists in the next ${N\choose i+1}$ columns.

To clarify the setting with an example, consider $N=4$ and $1<m_{s}<2$. In this case, we have $\beta_{1}^{*}=2-m_{s}$ and $\beta_{2}^{*}=m_{s}-1$. Corresponding to $\beta_{1}$, we have ${4\choose 1}=4$ $\alpha$s, namely $\alpha_{1},\alpha_{2},\alpha_{3},\alpha_{4}$, which sum to $\beta_{1}=2-m_{s}$. Corresponding to $\beta_{2}$, we have ${4\choose 2}=6$ $\alpha$s, namely $\alpha_{12},\alpha_{13},\alpha_{14},\alpha_{23},\alpha_{24},\alpha_{34}$, which sum to $\beta_{2}=m_{s}-1$. Thus, we have the $\bm{\alpha}$ vector:

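$$\bm{\alpha}=\begin{bmatrix}\alpha_{1}&\alpha_{2}&\alpha_{3}&\alpha_{4}&\alpha_{12}&\alpha_{13}&\alpha_{14}&\alpha_{23}&\alpha_{24}&\alpha_{34}\end{bmatrix}^{T}$$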

the $\bm{b}$ vector:

$$\bm{b}=\begin{bmatrix}m_{1}&m_{2}&m_{3}&m_{4}&2-m_{s}&m_{s}-1\end{bmatrix}^{T}$$

and the $\bm{A}$ matrix:

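$$\bm{A}=\begin{bmatrix}
1&0&0&0&1&1&1&0&0&0\\
0&1&0&0&1&0&0&1&1&0\\
0&0&1&0&0&1&0&1&0&1\\
0&0&0&1&0&0&1&0&1&1\\
1&1&1&1&0&0&0&0&0&0\\
0&0&0&0&1&1&1&1&1&1
\end{bmatrix}$$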

Note that, in the first 4 rows of $\bm{A}$, the first 4 columns contain all possible vectors with exactly one 1, and the remaining 6 columns contain all possible vectors with exactly two 1s.
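As an illustrative sketch of this construction, the following Python code builds $\bm{A}$ and $\bm{b}$ from the listed properties and checks one hand-picked non-negative assignment. The helper name `build_system`, the storage vector $m=(0.6,0.4,0.3,0.2)$, and the particular $\alpha$ values are assumptions chosen for this example; they are not taken from the paper.

```python
from itertools import combinations

def build_system(m, i):
    """Build (subsets, A, b) for the case i < m_s < i + 1, with m_s = sum(m).

    Hypothetical helper: columns are indexed by the subsets S of size i,
    followed by the subsets of size i + 1, in lexicographic order.
    """
    N = len(m)
    m_s = sum(m)
    databases = range(1, N + 1)
    subsets = list(combinations(databases, i)) + list(combinations(databases, i + 1))
    # First N rows: entry is 1 if database n belongs to subset S.
    A = [[1 if n in S else 0 for S in subsets] for n in databases]
    # Row N+1 marks the size-i subsets, row N+2 the size-(i+1) subsets.
    A.append([1 if len(S) == i else 0 for S in subsets])
    A.append([1 if len(S) == i + 1 else 0 for S in subsets])
    b = list(m) + [i + 1 - m_s, m_s - i]
    return subsets, A, b

m = (0.6, 0.4, 0.3, 0.2)  # assumed example storage vector, m_s = 1.5
subsets, A, b = build_system(m, 1)

# Hand-picked non-negative assignment (assumed for illustration); unlisted subsets get 0.
picked = {(1,): 0.1, (2,): 0.1, (3,): 0.2, (4,): 0.1,
          (1, 2): 0.3, (1, 3): 0.1, (1, 4): 0.1}
alpha = [picked.get(S, 0.0) for S in subsets]

# Largest absolute violation of A alpha = b over the N + 2 equations.
residual = max(abs(sum(a * x for a, x in zip(row, alpha)) - rhs)
               for row, rhs in zip(A, b))
print(len(A), len(A[0]), residual)
```

For this particular choice of $m$, the hand-picked $\alpha$ satisfies all six equations up to floating-point error, so a non-negative solution exists in this instance. The general existence argument is the one given in the proof, not this check.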

To prove the existence of a feasible solution for $\bm{A}\bm{\alpha}=\bm{b}$, we show that $\bm{b}$ is always a positive linear combination of columns of $\bm{A}$. From the first statement of [56, Theorem 3.3], we note that if we can find a column of $\bm{A}$, for instance $\bm{u}$, such that for all $\bm{v}$ that satisfy $\bm{b}^{T}\bm{v}>0$, we have $\bm{u}^{T}\bm{v}>0$, then $\bm{b}$ is a positive linear combination of the columns of $\bm{A}$. Note that, from the last property of $\bm{A}$, if we can find such a column, then we can find an ${\mathcal{S}}\subseteq\{1,\cdots,N\}$ that satisfies one of the following inequalities, and vice versa:

[TABLE]

where

[TABLE]

First, we order the variables $v_{i}$ and $m_{i}$, $i\in\{1,\cdots,N\}$, among themselves in decreasing order and define $m_{i}^{\prime}$ and $v_{i}^{\prime}$, $i\in\{1,2,\ldots,N\}$, such that,

[TABLE]

Then, we have the following series of inequalities for all $\bm{v}$ that satisfy $\bm{b}^{T}\bm{v}>0$ :

[TABLE]

where in (73), we use Lemma 4 and insert the values of $\beta_{i}$ and $\beta_{i+1}$, and in (74), we use the rearrangement inequality [58]. We have (75) by using the fact that $m_{s}=\sum_{j=1}^{N}m_{j}$ is between $i$ and $i+1$, where each $m_{j}$ is a real number between 0 and 1, and by redistributing the $m_{j}^{\prime}$ values so as to maximize the ones that are the coefficients of the largest $v_{j}^{\prime}$ values. Next, we observe that $(m_{s}-i)v_{i+1}^{\prime}+(i+1-m_{s})v_{N+1}+(m_{s}-i)v_{N+2}$ is a convex combination of $v_{i+1}^{\prime}+v_{N+2}$ and $v_{N+1}$, with weights $m_{s}-i$ and $i+1-m_{s}$, which results in (76). Hence, we have,

[TABLE]

for all $\bm{v}$ that satisfy $\bm{b}^{T}\bm{v}>0$ . Finally, (77) shows that we can always find ${\mathcal{S}}\subseteq\{1,\cdots,N\}$ that satisfies either (68) or (69), concluding the proof. $\blacksquare$
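The condition used in this proof can be probed numerically. The sketch below samples random vectors $\bm{v}$ and, whenever $\bm{b}^{T}\bm{v}>0$, checks that at least one column $\bm{u}$ of $\bm{A}$ has $\bm{u}^{T}\bm{v}>0$. It is a sampling check under stated assumptions, not a proof: the function names, the storage vector, the sampling range $[-1,1]$, and the threshold `1e-9` (used to avoid floating-point ties) are all choices made for this illustration.

```python
import random
from itertools import combinations

def dot(x, y):
    return sum(a * c for a, c in zip(x, y))

def columns(N, i):
    """Columns of A: indicator of S in the first N rows, then (1, 0) or (0, 1)."""
    cols = []
    for size, tail in ((i, [1, 0]), (i + 1, [0, 1])):
        for S in combinations(range(N), size):
            cols.append([1 if n in S else 0 for n in range(N)] + tail)
    return cols

def sample_condition(m, trials=2000, seed=0):
    """Return (ok, tested): ok is False if some sampled v violates the condition."""
    N = len(m)
    m_s = sum(m)
    i = int(m_s)  # assumes m_s is not an integer, so i < m_s < i + 1
    b = list(m) + [i + 1 - m_s, m_s - i]
    cols = columns(N, i)
    rng = random.Random(seed)
    tested = 0
    for _ in range(trials):
        v = [rng.uniform(-1.0, 1.0) for _ in range(N + 2)]
        if dot(b, v) <= 1e-9:
            continue  # only vectors with b^T v > 0 are relevant
        tested += 1
        if max(dot(u, v) for u in cols) <= 0:
            return False, tested
    return True, tested

ok, tested = sample_condition((0.6, 0.4, 0.3, 0.2))
print(ok, tested)
```

Because $\beta_{i}+\beta_{i+1}=1$, any non-negative solution of $\bm{A}\bm{\alpha}=\bm{b}$ writes $\bm{b}$ as a convex combination of the columns, so $\max_{\bm{u}}\bm{u}^{T}\bm{v}\geq\bm{b}^{T}\bm{v}$ and no violation should be found when such a solution exists.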

7 Conclusions

We considered a PIR system where a data center places available content into $N$ databases of heterogeneous sizes, from which a user retrieves a file privately. We determined the exact PIR capacity (i.e., the minimum download cost) under arbitrary storage constraints. We considered a relaxed problem in which all available storage space is pooled into a sum storage space, and showed that its solution is achievable in the original problem with individual storage constraints. This establishes the equivalence of the heterogeneous PIR capacity to the corresponding homogeneous PIR capacity. Therefore, there is no loss in PIR capacity due to database storage size heterogeneity, so long as the placement phase is optimized.

Appendix A Alternative Proof for Lemma 5 for $N=4$

Here, we give an alternative proof of Lemma 5 for $N=4$ using Farkas’ lemma. We illustrate the general idea using the example case $1<m_{s}<2$ . Using Lemma 4, we have $\beta_{1}^{*}=2-m_{s}$ and $\beta_{2}^{*}=m_{s}-1$ . We want to show the existence of $\alpha_{i}\geq 0$ and $\alpha_{ij}\geq 0$ for all $i,j$ such that,

[TABLE]

This is a linear system with 10 unknowns and 6 equations of the form $\tilde{\bm{A}}\bm{\alpha}=\tilde{\bm{b}}$, where $\tilde{\bm{A}}$ is the coefficient matrix. To show the existence of a non-negative solution, we use Farkas’ lemma, which states that there exists a non-negative solution $\bm{\alpha}\geq\bm{0}$ that satisfies $\tilde{\bm{A}}\bm{\alpha}=\tilde{\bm{b}}$ if and only if for all $\bm{y}$ for which $\tilde{\bm{A}}^{T}\bm{y}\geq\bm{0}$, we have $\tilde{\bm{b}}^{T}\bm{y}\geq 0$. We transform the system of equations into reduced row echelon form with:

[TABLE]

with

[TABLE]

and

[TABLE]

Hence, for any $\bm{y}$ , $\tilde{\bm{A}}^{T}\bm{y}\geq\bm{0}$ implies,

[TABLE]

Now, we need to show $\tilde{\bm{b}}^{T}\bm{y}\geq 0$ . We have the following for $\tilde{\bm{b}}\leq\bm{0}$ (the worst case):

[TABLE]

where (100) follows from (87)-(96) taking into consideration that $1-m_{s}+m_{3}\leq 0$ and $1-m_{s}+m_{4}\leq 0$ .

Bibliography


[1] B. Chor, E. Kushilevitz, O. Goldreich, and M. Sudan. Private information retrieval. Journal of the ACM, 45(6):965–981, November 1998.
[2] N. B. Shah, K. V. Rashmi, and K. Ramchandran. One extra bit of download ensures perfectly private information retrieval. In IEEE ISIT, June 2014.
[3] T. Chan, S. Ho, and H. Yamamoto. Private information retrieval for coded storage. In IEEE ISIT, June 2015.
[4] A. Fazeli, A. Vardy, and E. Yaakobi. Codes for distributed PIR with low storage overhead. In IEEE ISIT, June 2015.
[5] R. Tajeddine and S. El Rouayheb. Private information retrieval from MDS coded data in distributed storage systems. In IEEE ISIT, July 2016.
[6] H. Sun and S. A. Jafar. Blind interference alignment for private information retrieval. In IEEE ISIT, July 2016.
[7] H. Sun and S. A. Jafar. The capacity of private information retrieval. IEEE Trans. on Info. Theory, 63(7):4075–4088, July 2017.
[8] H. Sun and S. A. Jafar. The capacity of robust private information retrieval with colluding databases. IEEE Trans. on Info. Theory, 64(4):2361–2370, April 2018.

This work was supported by NSF Grants CNS 15-26608, CCF 17-13977 and ECCS 18-07348. A shorter version is submitted to IEEE ISIT 2019.
