ON-OFF Privacy with Correlated Requests

Carolina Naim; Fangwei Ye; Salim El Rouayheb

arXiv:1905.00146·cs.IT·May 2, 2019

TL;DR

This paper introduces the ON-OFF privacy problem, addressing how to maximize download rates while maintaining privacy when user requests are correlated, modeled as a Markov chain with two sources.

Contribution

The paper formulates the ON-OFF privacy problem, models correlated requests as a Markov chain, and proposes an optimal privacy scheme for two sources.

Findings

01

Proposed an ON-OFF privacy scheme for two sources.

02

Proved the scheme's optimality under Markov request modeling.

03

Addresses privacy leakage due to request correlation.

Abstract

We introduce the ON-OFF privacy problem. At each time, the user is interested in the latest message of one of N online sources chosen at random, and his privacy status can be ON or OFF for each request. The user wants to hide the source he is interested in only when privacy is ON. The problem is to design ON-OFF privacy schemes with maximum download rate that allow the user to privately obtain his requested messages. In many realistic scenarios, the user's requests are correlated since they depend on his personal attributes such as age, gender, political views, or geographical location. Hence, even when privacy is OFF, he cannot simply reveal his request since this will leak information about his requests when privacy was ON. We study the case where the user's requests can be modeled by a Markov chain and there are N=2 sources. In this case, we propose an ON-OFF privacy scheme and prove its…

Tables

Table 1. TABLE I: An example of our ON-OFF privacy scheme for $\alpha=\beta=0.2$. The query $Q_{1}$ at $t=1$ is a probabilistic function of $X_{0}$ and $X_{1}$, the requests at $t=0$ and $t=1$ respectively. The entries of the table represent the probabilities $p(Q_{1}\mid X_{0},X_{1})$. $Q_{1}=AB$ means that the user downloads the videos from both sources $A$ and $B$.

$X_{0}$	$X_{1}$	$Q_{1} = A$	$Q_{1} = B$	$Q_{1} = A B$
$A$	$A$	$0.25$	$0$	$0.75$
$A$	$B$	$0$	$1$	$0$
$B$	$A$	$1$	$0$	$0$
$B$	$B$	$0$	$0.25$	$0.75$

Table 2. TABLE II: Nomenclature and Notation

Symbol	Definition
$N$	Number of sources
$W_{x, t}$	Message generated by source of index $x$ at time $t$
$X_{t}$	User’s request at time $t$ ( $X_{t} \in 𝒩$ )
$F_{t}$	Privacy mode at time $t$ (ON or OFF)
$Q_{t}$	Query sent by the user to the server at time $t$
$Y_{t}$	Answer sent by the server to the user at time $t$
$S_{t}$	Local randomness generated by the user at time $t$
$ℓ_{t}$	Average length of the answer $Y_{t}$
$R_{t}$	Rate at time $t$
$[a : b]$	$= \{a, \dots, b\}$ for any integers $a$ and $b$ such that $a \leq b$
$(a)$	$= \{i : i = a, a - 1, a - 2, \dots\}$ for any integer $a$

Table 3. TABLE III: The proposed ON-OFF privacy scheme achieving capacity. The query $Q_{t}$ is probabilistic and depends on the current request $X_{t}$, the previous query $Q_{t-1}$ and the last private request $X_{0}$. If $Q_{t-1}\neq AB$ then $Q_{t}=X_{t}$. Otherwise, $Q_{t}$ is chosen based on the probabilities $p(Q_{t}\mid X_{0},X_{t},Q_{t-1}=AB)$ given in this table: (a) is for $\alpha+\beta<1$, while (b) and (c) are for $\alpha+\beta>1$ with $t$ even and odd respectively.

	(a) $α + β < 1$			(b) $α + β > 1$ and $t$ is even			(c) $α + β > 1$ and $t$ is odd
$(X_{0}, X_{t})$	$Q_{t} = A$	$Q_{t} = B$	$Q_{t} = A B$	$Q_{t} = A$	$Q_{t} = B$	$Q_{t} = A B$	$Q_{t} = A$	$Q_{t} = B$	$Q_{t} = A B$
$A, A$	$\frac{β}{1 - α}$	$0$	$\frac{1 - α - β}{1 - α}$	$\frac{1 - α}{β}$	$0$	$\frac{α + β - 1}{β}$	$1$	$0$	$0$
$A, B$	$0$	$1$	$0$	$0$	$1$	$0$	$0$	$\frac{1 - β}{α}$	$\frac{α + β - 1}{α}$
$B, A$	$1$	$0$	$0$	$1$	$0$	$0$	$\frac{1 - α}{β}$	$0$	$\frac{α + β - 1}{β}$
$B, B$	$0$	$\frac{α}{1 - β}$	$\frac{1 - α - β}{1 - β}$	$0$	$\frac{1 - β}{α}$	$\frac{α + β - 1}{α}$	$0$	$1$	$0$

Table 4. TABLE IV: The joint distribution $p(Q_{t},X_{0},X_{t})$

$(X_{0}, X_{t})$	$Q_{t} \in 𝒬_{a}$	$Q_{t} \in 𝒬_{b}$	$Q_{t} \in 𝒬_{a b}$
$(A, A)$	$p_{1}$	0	$P (A, A) - p_{1}$
$(A, B)$	0	$p_{2}$	$P (A, B) - p_{2}$
$(B, A)$	$\frac{1 - δ}{δ} p_{1}$	0	$P (B, A) - \frac{1 - δ}{δ} p_{1}$
$(B, B)$	0	$\frac{1 - δ}{δ} p_{2}$	$P (B, B) - \frac{1 - δ}{δ} p_{2}$

Equations

$$\Pr(Q_{1}=q)=\Pr(Q_{1}=q\mid X_{0}=x_{0}),$$

$$H\left(W_{x,t}:x\in\mathcal{N},t\in\mathbb{Z}\right)=\sum_{x,t}H\left(W_{x,t}\right),$$

$$H\left(W_{x,t}\right)=L.$$

$$F_{t}=\begin{cases}\text{ON},&t\leq 0,\\ \text{OFF},&t\geq 1.\end{cases}$$

$$\ell_{t}=\mathbb{E}_{Q_{t}}\left[\ell\left(Q_{t}\right)\right].$$

$$Q_{t}=\phi_{t}\left(X_{(t)},S_{(t)}\right).$$

$$Y_{t}=\rho_{t}\left(Q_{t},W_{1,t},\ldots,W_{N,t}\right).$$

$$H\left(W_{X_{t},t}\mid Y_{t}\right)=0,\quad\forall t\in\mathbb{Z}.$$

$$I\left(X_{\mathcal{B}_{t}};Q_{t}\mid Q_{(t-1)}\right)=0,\quad\forall t\in\mathbb{Z},$$

$$M=\begin{bmatrix}1-\alpha&\beta\\ \alpha&1-\beta\end{bmatrix},$$

$$R_{t}\leq\begin{cases}\frac{1}{2},&t\leq 0,\\ \frac{1}{1+|1-\alpha-\beta|^{t}},&t\geq 1.\end{cases}$$

$$Y_{t}=\rho_{t}\left(Q_{t},W_{A,t},W_{B,t}\right)=\begin{cases}W_{A,t},&Q_{t}=A,\\ W_{B,t},&Q_{t}=B,\\ \{W_{A,t},W_{B,t}\},&Q_{t}=AB.\end{cases}$$

$$\ell\left(Q_{t}\right)=\begin{cases}L,&Q_{t}=A\text{ or }B,\\ 2L,&Q_{t}=AB.\end{cases}$$

$$\frac{\ell_{t}}{L}=1+\Pr\left(Q_{t}=AB\right).$$

$$Q_{t}=\phi_{t}\left(X_{0},X_{t},Q_{t-1},S_{t}\right).$$

$$\begin{aligned}I\left(X_{\mathcal{B}_{t}};Q_{t}\mid Q_{(t-1)}\right)&=I\left(X_{0};Q_{t}\mid Q_{(t-1)}\right)+I\left(X_{\mathcal{B}_{t}\backslash\{0\}};Q_{t}\mid X_{0},Q_{(t-1)}\right)\\ &=0,\end{aligned}$$

$$\begin{aligned}I\left(X_{\mathcal{B}_{t}\backslash\{0\}};Q_{t}\mid X_{0},Q_{(t-1)}\right)&=H\left(X_{\mathcal{B}_{t}\backslash\{0\}}\mid X_{0},Q_{(t-1)}\right)-H\left(X_{\mathcal{B}_{t}\backslash\{0\}}\mid X_{0},Q_{(t)}\right)\\ &\leq H\left(X_{\mathcal{B}_{t}\backslash\{0\}}\mid X_{0},Q_{\mathcal{B}_{t}}\right)-H\left(X_{\mathcal{B}_{t}\backslash\{0\}}\mid X_{0},Q_{(t)}\right)\\ &=H\left(X_{\mathcal{B}_{t}\backslash\{0\}}\mid X_{0},Q_{\mathcal{B}_{t}}\right)-H\left(X_{\mathcal{B}_{t}\backslash\{0\}}\mid X_{0},Q_{\mathcal{B}_{t}},Q_{[1:t]}\right)\\ &\overset{(a)}{\leq}H\left(X_{\mathcal{B}_{t}\backslash\{0\}}\mid X_{0},Q_{\mathcal{B}_{t}}\right)-H\left(X_{\mathcal{B}_{t}\backslash\{0\}}\mid X_{0},Q_{\mathcal{B}_{t}},X_{[1:t]},S_{[1:t]}\right)\\ &\overset{(b)}{=}H\left(X_{\mathcal{B}_{t}\backslash\{0\}}\mid X_{0},Q_{\mathcal{B}_{t}}\right)-H\left(X_{\mathcal{B}_{t}\backslash\{0\}}\mid X_{0},Q_{\mathcal{B}_{t}}\right)\\ &=0,\end{aligned}$$

$$\frac{1}{R_{t}}=1+\Pr\left(Q_{t}=AB\right)$$

$$P=\begin{bmatrix}1-\alpha&\beta&\beta\\ \alpha&1-\beta&\alpha\\ 0&0&1-\alpha-\beta\end{bmatrix},\quad\text{if }\alpha+\beta\leq 1,$$

$$P=\begin{bmatrix}1-\alpha&\beta&1-\alpha\\ \alpha&1-\beta&1-\beta\\ 0&0&\alpha+\beta-1\end{bmatrix},\quad\text{if }\alpha+\beta>1.$$

$$\begin{aligned}\Pr\left(Q_{t}=AB\right)&\overset{(a)}{=}\Pr\left(Q_{0}=AB\right)\prod_{i=1}^{t}\Pr\left(Q_{i}=AB\mid Q_{i-1}=AB\right)\\ &\overset{(b)}{=}\prod_{i=1}^{t}\Pr\left(Q_{i}=AB\mid Q_{i-1}=AB\right),\end{aligned}$$

$$\Pr\left(Q_{i}=AB\mid Q_{i-1}=A\right)=\Pr\left(Q_{i}=AB\mid Q_{i-1}=B\right)=0$$

$$\Pr\left(Q_{t}=AB\right)=|1-\alpha-\beta|^{t}.$$

$$R_{t}\leq\frac{1}{1+|1-\alpha-\beta|^{t}}.$$

$$R_{t}\leq\frac{1}{1+|1-\alpha-\beta|^{t}},$$

Full text

ON-OFF Privacy with Correlated Requests

Carolina Naim, Fangwei Ye, Salim El Rouayheb

Department of Electrical and Computer Engineering, Rutgers University

Emails: {carolina.naim, fangwei.ye, salim.elrouayheb}@rutgers.edu

Abstract

We introduce the ON-OFF privacy problem. At each time, the user is interested in the latest message of one of $N$ online sources chosen at random, and his privacy status can be ON or OFF for each request. The user wants to hide the source he is interested in only when privacy is ON. The problem is to design ON-OFF privacy schemes with maximum download rate that allow the user to privately obtain his requested messages. In many realistic scenarios, the user’s requests are correlated since they depend on his personal attributes such as age, gender, political views, or geographical location. Hence, even when privacy is OFF, he cannot simply reveal his request since this will leak information about his requests when privacy was ON. We study the case where the user’s requests can be modeled by a Markov chain and there are $N=2$ sources. In this case, we propose an ON-OFF privacy scheme and prove its optimality.

I Introduction

I-A Motivation

Privacy is a major concern for online users, who can unknowingly reveal critical personal information (age, sex, diseases, political proclivity, etc.) through daily online activities such as watching online videos, following people and liking posts on social media, reading news and searching websites. This is now a well-acknowledged concern and has led to many interesting theoretical problems such as anonymity [1], differential privacy [2], private information retrieval [3], and other privacy-preserving algorithms.

In all these formulations the user is assumed to always want to maintain a certain level of privacy, which we refer to as privacy being always ON. However, in many scenarios, the user may wish to switch between privacy being ON and OFF. This switch depends on several criteria such as location, network/connection or phone/machine being used, to name a few. The reason the user may want to flip between these two modes, instead of keeping privacy always ON, is that typically privacy-preserving solutions incur a degradation in the quality of service, mostly felt by the user through large delays. Service providers may also be interested in incentivizing the user to require privacy only when it is needed since private solutions also incur higher communication and computation costs on their side.

One may be tempted to propose the simple solution in which the user has available to him two schemes, one private and one non-private. Over time, the user simply switches between these two schemes depending on whether privacy is turned ON or OFF. The problem with this solution is that it guarantees privacy only if the user’s online activities are statistically independent over time. However, a user’s online activities are typically personal, making them correlated over time. For example, a bilingual English/Spanish user, who is checking the news in Spanish now, is more likely to keep reading the news in Spanish for a while before switching to English. At that point English becomes more probable. Another example is when the user is watching online videos. The user chooses the video to watch next from a list of videos recommended to him and this list depends on previously watched videos. Thus, due to correlation, simply ignoring the privacy requirement when privacy is OFF may reveal information about the activities when privacy was ON.

I-B Example

To be more concrete and to gently introduce our setup for ON-OFF privacy, we give the following example. Suppose a user is watching political or news videos online. At each time $t$, the user has a choice between two new videos, one produced by each of two news sources, $A$ and $B$. Source $A$ is politically left-leaning and source $B$ is right-leaning.

Let $X_{t}\in\{A,B\}$ be the source whose video the user wants to watch at time $t\in\mathbb{Z}$ . We model the correlation among the user’s requests by assuming that $X_{t}$ is the two-state Markov chain depicted in Figure 1, where $\alpha=\Pr(X_{t+1}=B\mid X_{t}=A)$ and $\beta=\Pr(X_{t+1}=A\mid X_{t}=B)$ . For example, we choose $\alpha=\beta=0.2$ . This means that if the current video being watched is left-leaning, there is an $80\%$ chance that the next video is also left-leaning, and vice versa.

For the sake of brevity, we focus on the two time instants $t=0$ and $t=1$ , and assume that privacy is ON at $t=0$ and is switched to OFF at $t=1$ . This means that the user would like to hide whether he was watching a left-leaning or a right-leaning video at time $t=0$ , but does not care about revealing the source of the video he watched at $t=1$ .

The goal is to devise an ON-OFF privacy scheme that always gives the user the video he wants, but never reveals the choice of sources when privacy is ON, i.e., $t=0$ in this case. We are interested in schemes that minimize the download cost, or equivalently maximize the download rate (the inverse of the normalized download cost).

At $t=0$ , the problem is simple. The user achieves privacy by downloading both videos. We say that the user’s query at $t=0$ is $Q_{0}=AB$ . Therefore, the download rate at $t=0$ is $R_{0}=1/2$ .

At $t=1$ , the privacy is OFF. Now, the user must be careful not to directly declare his request, because this may reveal information about his request at $t=0$ which is to remain private. The user can again download both videos, i.e., $Q_{1}=AB$ , and achieve privacy with a rate $R_{1}=1/2$ .
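To make the leakage concrete, the following minimal sketch computes the mutual information $I(X_{0};X_{1})$ that a server would learn if the user simply sent $Q_{1}=X_{1}$. The function name `naive_leakage_bits` and the prior `p0 = Pr(X_0 = A)` are assumptions made for illustration; the example in the text does not fix a prior.

```python
from math import log2

def naive_leakage_bits(alpha, beta, p0):
    """I(X_0; X_1) in bits when the user reveals Q_1 = X_1 directly.

    alpha = Pr(X_1 = B | X_0 = A), beta = Pr(X_1 = A | X_0 = B),
    p0 = Pr(X_0 = A) (an assumed prior, not given in the text).
    """
    prior = {"A": p0, "B": 1.0 - p0}
    trans = {"A": {"A": 1.0 - alpha, "B": alpha},
             "B": {"A": beta, "B": 1.0 - beta}}
    # Joint distribution of (X_0, X_1) and the marginal of X_1.
    joint = {(x0, x1): prior[x0] * trans[x0][x1]
             for x0 in "AB" for x1 in "AB"}
    marg1 = {x1: joint[("A", x1)] + joint[("B", x1)] for x1 in "AB"}
    # Mutual information, skipping zero-probability cells.
    return sum(p * log2(p / (prior[x0] * marg1[x1]))
               for (x0, x1), p in joint.items() if p > 0)

leak_correlated = naive_leakage_bits(0.2, 0.2, 0.5)   # the example's chain
leak_independent = naive_leakage_bits(0.5, 0.5, 0.5)  # alpha + beta = 1
print(leak_correlated, leak_independent)
```

With a uniform prior, the correlated chain leaks a strictly positive amount of information about $X_{0}$, while the independent case ($\alpha+\beta=1$) leaks nothing.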

Our key result is that the user can achieve a better rate at $t=1$ , without compromising privacy, by

•

choosing randomly between downloading $A$ ( $Q_{1}=A$ ) or both $A$ and $B$ ( $Q_{1}=AB$ ) if he wants $X_{1}=A$ ,

•

choosing randomly between downloading $B$ ( $Q_{1}=B$ ) or both $A$ and $B$ ( $Q_{1}=AB$ ) if he wants $X_{1}=B$ .

This random choice must also depend on the request $X_{0}$ at $t=0$ . The different probabilities defining the scheme are given in Table I and will be justified later when we explain the general scheme. For now, one can check that these probabilities lead to

$$\Pr(Q_{1}=q)=\Pr(Q_{1}=q\mid X_{0}=x_{0}),$$

for any $q\in\{A,B,AB\}$ and any $x_{0}\in\{A,B\}$ . Thus, $X_{0}$ and $Q_{1}$ are independent and the proposed scheme in Table I achieves perfect privacy for the request at $t=0$ . Moreover, the scheme ensures that the user always obtains the video he is requesting.

For $t=1$ , the rate $R_{1}=1/(2-\alpha-\beta)=0.625$ , which is strictly greater than $0.5$ , the rate of querying both files. We later show that this rate is actually optimal. In fact, the values in Table I were carefully chosen to achieve the privacy at the highest download rate. Any other choice of the probabilities $p(Q_{1}\mid X_{0},X_{1})$ would either violate privacy or lose the optimality of the rate.
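The independence claim and the rate can be checked mechanically. The sketch below hard-codes Table I and the transition probabilities for $\alpha=\beta=0.2$ using exact fractions; the names `TABLE_I`, `TRANS` and `query_dist_given_x0` are illustrative and not part of the paper.

```python
from fractions import Fraction as Fr

ALPHA = BETA = Fr(1, 5)  # the example's alpha = beta = 0.2

# Table I: p(Q_1 | X_0, X_1), keyed by (x0, x1).
TABLE_I = {
    ("A", "A"): {"A": Fr(1, 4), "B": Fr(0), "AB": Fr(3, 4)},
    ("A", "B"): {"A": Fr(0), "B": Fr(1), "AB": Fr(0)},
    ("B", "A"): {"A": Fr(1), "B": Fr(0), "AB": Fr(0)},
    ("B", "B"): {"A": Fr(0), "B": Fr(1, 4), "AB": Fr(3, 4)},
}
# Request transitions p(X_1 | X_0).
TRANS = {"A": {"A": 1 - ALPHA, "B": ALPHA},
         "B": {"A": BETA, "B": 1 - BETA}}

def query_dist_given_x0(x0):
    """p(Q_1 = q | X_0 = x0), marginalizing over X_1."""
    return {q: sum(TRANS[x0][x1] * TABLE_I[(x0, x1)][q] for x1 in "AB")
            for q in ("A", "B", "AB")}

dist_a = query_dist_given_x0("A")
dist_b = query_dist_given_x0("B")
# One message is downloaded for A or B and two for AB, so 1/R_1 = 1 + Pr(Q_1 = AB).
rate_1 = 1 / (1 + dist_a["AB"])
print(dist_a == dist_b, rate_1)
```

Because the two conditional distributions coincide, the distribution of $Q_{1}$ does not depend on the prior of $X_{0}$, and the rate evaluates to $5/8=0.625$.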

I-C Setup & Contributions

We introduce a mathematical model to capture the ON-OFF privacy problem when the user is downloading data from online sources.

We consider the setup in which there are $N$ information sources each producing a new message at each time $t\in\mathbb{Z}$ . At each time $t$ , the user randomly chooses one of the sources and requests its latest produced message.

The privacy constraint is the following: the user wants to leak zero information about the identity of the sources in which he is interested at each time $t$ when the privacy is ON. The main challenge stems from the fact that the user’s requests are not independent. As in the previous example, we model the dependence between these requests by an $N$-state Markov chain. The goal is to design an ON-OFF privacy scheme with maximum download rate that satisfies the user’s request and guarantees the privacy of the requests made when privacy is ON.

Our technical results can be summarized as follows. We study the case of $N=2$ sources for the special but important case where privacy is ON for $t\leq 0$ and switched OFF for $t\geq 1$ . We prove an upper bound on the instantaneous download rate at each time $t$ , and give an ON-OFF privacy scheme that achieves it.

I-D Related Work

The special case of the ON-OFF privacy problem in which privacy is always ON and the user’s requests are independent reduces to the information-theoretic private information retrieval (PIR) problem on a single server. In this case, the user cannot do anything smarter than downloading everything [3] (except the recently studied problem when the user has side information [4] which is not the case here). Recently, there has been significant research activity on determining the maximum download rate of PIR with multiple servers (e.g. [5, 8, 6, 7, 9]). However, the model there requires multiple servers and, in the parlance of this paper, privacy is assumed to be always ON.

II Problem Formulation and Notation

The ON-OFF privacy model can be described as follows. A single server stores $N$ sources indexed by $\mathcal{N}:=\{1,\ldots,N\}$. Each source generates a message $W_{x,t}$ at time $t$, where $x\in\mathcal{N}$. We consider only discrete time throughout this paper, i.e., $t\in\mathbb{Z}$. For any integers $a$ and $b$ such that $a\leq b$, denote $\{a,\ldots,b\}$ by $[a:b]$, and $\{i:i=a,a-1,a-2,\dots\}$ by $(a)$.

A user retrieves messages consecutively from the server. He is interested in one of the sources at each time, and wishes to retrieve the latest message generated by the desired source.

In particular, let $X_{t}$ be the index of the desired source at time $t$ , which takes values in $\mathcal{N}$ , and in the sequel we call $X_{t}$ the user’s request. By slightly abusing the notation, we denote the latest message generated by the desired source $X_{t}$ as $W_{X_{t},t}$ , and the user wishes to retrieve the message $W_{X_{t},t}$ . We assume that the messages $\{W_{x,t}:x\in\mathcal{N},t\in\mathbb{Z}\}$ are mutually independent, each of which consists of $L$ symbols. Without loss of generality, we assume that each of the messages is uniformly distributed on $\{0,1\}^{L}$ , i.e.,

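$$H\left(W_{x,t}:x\in\mathcal{N},t\in\mathbb{Z}\right)=\sum_{x,t}H\left(W_{x,t}\right),$$

and

$$H\left(W_{x,t}\right)=L.$$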

As discussed in Section I, we are particularly interested in the case where the requests $X_{t}$ , for $t\in\mathbb{Z}$ , form a Markov chain. The transition matrix of the Markov chain is known to both the server and the user.

Meanwhile, the user may or may not wish to keep the identity of the source he is interested in at time $t$ hidden from the server. Specifically, the privacy mode $F_{t}$ at time $t$ can be either ON or OFF, where $F_{t}$ is ON when the user wishes to keep $X_{t}$ private, while $F_{t}$ is OFF when the user is not concerned with privacy.

In this paper, we focus on the case in which the privacy mode is the step function given by

$$F_{t}=\begin{cases}\text{ON},&t\leq 0,\\ \text{OFF},&t\geq 1.\end{cases}\tag{3}$$

Solving the problem for this step function is an essential building block for tackling the general case where $\{F_{t}:t\in\mathbb{Z}\}$ is a random process. A discussion about the general case can be found in Appendix D.

The user is allowed to generate unlimited local randomness and we are not interested in the amount of randomness used. Therefore, we assume without loss of generality that the random variables $\{S_{t}:t\in\mathbb{Z}\}$ , representing the local randomness, are mutually independent. Moreover, we assume that the user’s requests $\{X_{t}:t\in\mathbb{Z}\}$ , the messages $\{W_{x,t}:x\in\mathcal{N},t\in\mathbb{Z}\}$ and the local randomness $\{S_{t}:t\in\mathbb{Z}\}$ are mutually independent.

As discussed in Section I, if the user carelessly downloads the desired message at time $t$ when the privacy is OFF, the privacy in the past may be compromised. To ensure privacy, the user may utilize the requests $\{X_{i}:i\leq t\}$ and the local randomness $\{S_{i}:i\leq t\}$ to construct a query $Q_{t}$ and send it to the server. Upon receiving the query, the server responds to the request by producing the answer $Y_{t}$ consisting of $\ell\left(Q_{t}\right)$ symbols, where the length of $Y_{t}$ is a function of the query $Q_{t}$ . Thus, the average length of the answer $Y_{t}$ is given by

$$\ell_{t}=\mathbb{E}_{Q_{t}}\left[\ell\left(Q_{t}\right)\right].$$

The query $Q_{t}$ at time $t$ is assumed to be a function of all the requests $\{X_{i}:i\leq t\}$ and all the local randomness $\{S_{i}:i\leq t\}$ up to and including time $t$ , i.e.,

$$Q_{t}=\phi_{t}\left(X_{(t)},S_{(t)}\right).$$

Note that the previous answers $\{Y_{i}:i<t\}$ are functions of the previous messages, which are independent of the current message. The previous answers therefore do not help in retrieving the current message, so without loss of generality, $Q_{t}$ is not encoded from $\{Y_{i}:i<t\}$.

Correspondingly, the answer $Y_{t}$ of the server is a function of the query $Q_{t}$ and the messages $\{W_{x,t}:x\in\mathcal{N}\}$ , i.e.,

$$Y_{t}=\rho_{t}\left(Q_{t},W_{1,t},\ldots,W_{N,t}\right).$$

These functions need to satisfy the decodability and the privacy constraints, i.e.,

1. Decodability: For any time $t$, the user should be able to recover the desired message from the answer with zero-error probability, i.e.,

$$H\left(W_{X_{t},t}\mid Y_{t}\right)=0,\quad\forall t\in\mathbb{Z}.$$

2. Privacy: For any time $t$, given all past queries received by the server, the query $Q_{t}$ should not reveal any information about the past or present requests made while privacy is ON, that is,

$$I\left(X_{\mathcal{B}_{t}};Q_{t}\mid Q_{(t-1)}\right)=0,\quad\forall t\in\mathbb{Z},\tag{8}$$

where $\mathcal{B}_{t}=\{i:i\leq t,F_{i}=\text{ON}\}$ .

For any message length $L$ , the tuple $\left(\ell_{t}:t\in\mathbb{Z}\right)$ is said to be achievable if there exists a code satisfying the decodability and the privacy constraint. The efficiency of the code can be measured by the download rate $R_{t}:=\frac{L}{\ell_{t}}$ . Hence, we define the achievable region as follows:

Definition 1.

The rate tuple $\left(R_{t}:t\in\mathbb{Z}\right)$ is achievable if there exists a code with message length $L$ and average download cost $\ell_{t}$ such that $R_{t}\leq\frac{L}{\ell_{t}}$ .

Conventionally, the capacity region $\mathscr{C}\left(\mathcal{P}\right)$ can be defined as the closure of the set of achievable rate tuples $\left(R_{t}:t\in\mathbb{Z}\right)$ , where $\mathcal{P}$ is the set of all possible probability distributions of $p\left(X_{t}:t\in\mathbb{Z}\right)$ . Table II summarizes our notation.

III Main Results

Our main result is a complete characterization of the achievable region for the case of two sources, i.e., $N=2$. We will use $A$ and $B$ to denote these two sources. In this case, the requests $X_{t}$ follow a two-state Markov chain defined by the transition matrix

$$M=\begin{bmatrix}1-\alpha&\beta\\ \alpha&1-\beta\end{bmatrix},$$

where $\alpha$ is the transition probability from $A$ to $B$ , and $\beta$ is the transition probability from $B$ to $A$ .

We first state the main theorem of this paper.

Theorem 1.

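For the privacy mode given in (3), the rate tuple $\left(R_{t}:t\in\mathbb{Z}\right)$ is achievable if and only if

$$R_{t}\leq\begin{cases}\frac{1}{2},&t\leq 0,\\ \frac{1}{1+|1-\alpha-\beta|^{t}},&t\geq 1.\end{cases}$$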

When privacy is ON, for $t\leq 0$ , the user has to request both the most recent messages of $A$ and $B$ . Therefore, the rate $R_{t}=1/2$ .

The more interesting part of Theorem 1 is for $t\geq 1$ . For a fixed time $t\geq 1$ , the rate as a function of $\alpha$ and $\beta$ is symmetric around $\alpha+\beta=1$ . When $\alpha+\beta=1$ , the user’s requests are independent such that $p(X_{t}\mid X_{t-1})=p(X_{t})$ , so the user can directly query for his desired message, i.e., $Q_{t}=X_{t}$ . The rate is then maximized to $R_{t}=1$ .

In terms of asymptotics, when the Markov chain is ergodic, the download rate goes to $1$ as $t$ goes to infinity. Intuitively, as $t$ grows, the information carried by $X_{t}$ about $X_{0}$ decreases, so the user can eventually directly query for what he wants, i.e., $Q_{t}=X_{t}$ . Otherwise, when the Markov chain is not ergodic ( $\alpha=\beta=0$ or $\alpha=\beta=1$ ), not much can be done and the rate is constant at $R_{t}=1/2$ . The user has to query for both messages of $A$ and $B$ at every time $t$ , i.e., $Q_{t}=AB$ for all $t$ .

Figure 2 shows the rate $R_{t}$ as a function of time for different values of $\alpha+\beta$. As $\alpha+\beta$ approaches $1$, the correlation between the requests decreases, leading to an increase in the rate.
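The behavior described above (symmetry around $\alpha+\beta=1$, convergence to $1$ for ergodic chains, and a constant rate of $1/2$ otherwise) can be reproduced with a few lines. This is a small sketch of the bound in Theorem 1 only; the function name `rate_upper_bound` is an illustrative choice.

```python
def rate_upper_bound(alpha, beta, t):
    """Right-hand side of Theorem 1 for the step privacy mode (ON for t <= 0)."""
    if t <= 0:
        return 0.5  # privacy is ON: both messages are downloaded
    return 1.0 / (1.0 + abs(1.0 - alpha - beta) ** t)

# Tabulate the bound for a few values of alpha + beta (split evenly here).
for a_plus_b in (0.4, 0.8, 1.0, 1.6, 2.0):
    alpha = beta = a_plus_b / 2
    rates = [round(rate_upper_bound(alpha, beta, t), 4) for t in range(0, 6)]
    print(a_plus_b, rates)
```

The rows for $\alpha+\beta=0.4$ and $\alpha+\beta=1.6$ coincide because the bound depends on the requests only through $|1-\alpha-\beta|$.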

In the following section, we give the scheme that achieves the rate tuples given in Theorem 1. We prove the converse in Appendix A.

IV Achievability of Theorem 1

IV-A ON-OFF Privacy Scheme

In this section, we will describe an ON-OFF privacy scheme that achieves the rate in Theorem 1, by specifying its encoding functions $\{\phi_{t},\rho_{t}\}$ defined in Section II.

Our coding scheme retrieves the messages in uncoded form. More specifically, the alphabet for the queries is $\mathcal{Q}=\{A,B,AB\}$. The query values $A$, $B$ and $AB$ denote respectively the user requesting the latest message of source $A$, $B$ or both. Upon receiving $Q_{t}\in\mathcal{Q}$, the server responds by sending either one or two messages, such that

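$$Y_{t}=\rho_{t}\left(Q_{t},W_{A,t},W_{B,t}\right)=\begin{cases}W_{A,t},&Q_{t}=A,\\ W_{B,t},&Q_{t}=B,\\ \{W_{A,t},W_{B,t}\},&Q_{t}=AB.\end{cases}$$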

The length of the answer $\ell\left(Q_{t}\right)$ is given by

$$\ell\left(Q_{t}\right)=\begin{cases}L,&Q_{t}=A\text{ or }B,\\ 2L,&Q_{t}=AB.\end{cases}$$

The normalized average length is

$$\frac{\ell_{t}}{L}=1+\Pr\left(Q_{t}=AB\right).\tag{11}$$

It remains to specify the query encoding functions $\{\phi_{t}\}$ . The query encoding function $\phi_{t}$ at time $t$ is described as follows:

•

For $t\leq 0$ , we simply download two messages to guarantee privacy, i.e., $Q_{t}=AB$ . This is an immediate result in information-theoretic single-server private information retrieval [3].

•

For $t\geq 1$ , the query $Q_{t}$ is a function of $Q_{t-1}$ , $X_{0}$ , $X_{t}$ and the local randomness $S_{t}$ , i.e.,

$$Q_{t}=\phi_{t}\left(X_{0},X_{t},Q_{t-1},S_{t}\right).$$

Since we are not interested in the local randomness used, instead of specifying the function $\phi_{t}$ explicitly, we regard $Q_{t}$ as a probabilistic function of $\{X_{0},X_{t},Q_{t-1}\}$ , and the distribution $p\left(Q_{t}|X_{0},X_{t},Q_{t-1}\right)$ is as follows:

Given $X_{0}$, $X_{t}$, and $Q_{t-1}$,

1. if $Q_{t-1}\neq AB$, then $Q_{t}=X_{t}$ with probability $1$;

2. if $Q_{t-1}=AB$, then $p(Q_{t}\mid X_{0},X_{t},Q_{t-1})$ is as given in Table III.
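The two rules above can be turned into a small sampler. The sketch below is one possible rendering, assuming $0<\alpha,\beta<1$ and using the fact that, in every panel of Table III, the probability mass not placed on $AB$ goes to $Q_{t}=X_{t}$. The names `prob_ab`, `next_query` and `run` are hypothetical, and the user's local randomness $S_{t}$ is played by a seeded `random.Random`.

```python
import random

def prob_ab(alpha, beta, t, x0, xt):
    """Pr(Q_t = AB | X_0 = x0, X_t = xt, Q_{t-1} = AB), read off Table III."""
    s = alpha + beta
    if s <= 1:                       # panel (a); s = 1 gives Q_t = X_t
        if x0 != xt:
            return 0.0
        return (1 - s) / (1 - alpha) if x0 == "A" else (1 - s) / (1 - beta)
    if t % 2 == 0:                   # panel (b): alpha + beta > 1, t even
        if x0 != xt:
            return 0.0
        return (s - 1) / beta if x0 == "A" else (s - 1) / alpha
    if x0 == xt:                     # panel (c): alpha + beta > 1, t odd
        return 0.0
    return (s - 1) / alpha if x0 == "A" else (s - 1) / beta

def next_query(alpha, beta, t, x0, xt, q_prev, rng=random):
    """Sample Q_t for t >= 1 from X_0, X_t and the previous query."""
    if q_prev != "AB":
        return xt                    # rule 1: Q_t = X_t with probability 1
    return "AB" if rng.random() < prob_ab(alpha, beta, t, x0, xt) else xt

def run(alpha, beta, horizon, x0, seed=0):
    """Simulate (t, X_t, Q_t) for t = 0..horizon starting from X_0 = x0."""
    rng = random.Random(seed)
    x, q = x0, "AB"                  # privacy is ON at t = 0, so Q_0 = AB
    history = [(0, x, q)]
    for t in range(1, horizon + 1):
        flip = alpha if x == "A" else beta   # probability of switching source
        if rng.random() < flip:
            x = "B" if x == "A" else "A"
        q = next_query(alpha, beta, t, x0, x, q, rng)
        history.append((t, x, q))
    return history

trace = run(0.2, 0.2, horizon=8, x0="A")
print(trace)
```

In any trace the query always covers the current request, and once the query drops from $AB$ to a single source it never returns to $AB$.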

IV-B Privacy

In this subsection, we prove that the given scheme satisfies the privacy constraint for $t\geq 1$ . Recall the privacy constraint (8) that $I\left(X_{\mathcal{B}_{t}};Q_{t}|Q_{(t-1)}\right)=0$ , where $\mathcal{B}_{t}=\{i:i\leq 0,i\in\mathbb{Z}\}$ . We want to show that

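$$\begin{aligned}I\left(X_{\mathcal{B}_{t}};Q_{t}\mid Q_{(t-1)}\right)&=I\left(X_{0};Q_{t}\mid Q_{(t-1)}\right)+I\left(X_{\mathcal{B}_{t}\backslash\{0\}};Q_{t}\mid X_{0},Q_{(t-1)}\right)\\ &=0,\end{aligned}\tag{12}$$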

To do that we will show that each of the terms in the sum in (12) is equal to zero.

Claim 1.

$I\left(X_{\mathcal{B}_{t}\backslash\{0\}};Q_{t}|X_{0},Q_{(t-1)}\right)=0$ .

The claim can be justified as follows:

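$$\begin{aligned}I\left(X_{\mathcal{B}_{t}\backslash\{0\}};Q_{t}\mid X_{0},Q_{(t-1)}\right)&=H\left(X_{\mathcal{B}_{t}\backslash\{0\}}\mid X_{0},Q_{(t-1)}\right)-H\left(X_{\mathcal{B}_{t}\backslash\{0\}}\mid X_{0},Q_{(t)}\right)\\ &\leq H\left(X_{\mathcal{B}_{t}\backslash\{0\}}\mid X_{0},Q_{\mathcal{B}_{t}}\right)-H\left(X_{\mathcal{B}_{t}\backslash\{0\}}\mid X_{0},Q_{(t)}\right)\\ &=H\left(X_{\mathcal{B}_{t}\backslash\{0\}}\mid X_{0},Q_{\mathcal{B}_{t}}\right)-H\left(X_{\mathcal{B}_{t}\backslash\{0\}}\mid X_{0},Q_{\mathcal{B}_{t}},Q_{[1:t]}\right)\\ &\overset{(a)}{\leq}H\left(X_{\mathcal{B}_{t}\backslash\{0\}}\mid X_{0},Q_{\mathcal{B}_{t}}\right)-H\left(X_{\mathcal{B}_{t}\backslash\{0\}}\mid X_{0},Q_{\mathcal{B}_{t}},X_{[1:t]},S_{[1:t]}\right)\\ &\overset{(b)}{=}H\left(X_{\mathcal{B}_{t}\backslash\{0\}}\mid X_{0},Q_{\mathcal{B}_{t}}\right)-H\left(X_{\mathcal{B}_{t}\backslash\{0\}}\mid X_{0},Q_{\mathcal{B}_{t}}\right)\\ &=0,\end{aligned}$$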

where (a) follows because $Q_{[1:t]}$ is a function of $\left\{X_{[0:t]},S_{[1:t]}\right\}$ , and (b) follows from the independence between $\{X_{i}:i\in\mathbb{Z}\}$ and $\{S_{i}:i\in\mathbb{Z}\}$ , and the Markovity of $\{X_{i}:i\in\mathbb{Z}\}$ .

Claim 2.

$I\left(X_{0};Q_{t}|Q_{(t-1)}\right)=0$ for $t\geq 1$.

The proof of Claim 2 can be found in Appendix B.
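As a numerical sanity check that complements the proof (and does not replace it), one can enumerate all query paths exactly for a short horizon and confirm that their distribution does not depend on $X_{0}$. Since $Q_{t}=AB$ deterministically for $t\leq 0$, equality of the path distributions for $X_{0}=A$ and $X_{0}=B$ is equivalent, by the chain rule, to Claim 2 holding at every step up to the horizon. The helper names `prob_ab` and `path_distribution` are hypothetical; exact `Fraction` inputs are assumed so that the comparison is not affected by rounding.

```python
from collections import defaultdict
from fractions import Fraction as Fr

def prob_ab(alpha, beta, t, x0, xt):
    """Pr(Q_t = AB | X_0 = x0, X_t = xt, Q_{t-1} = AB), read off Table III."""
    s = alpha + beta
    same = x0 == xt
    if s <= 1:                                   # panel (a)
        if not same:
            return Fr(0)
        return (1 - s) / (1 - alpha) if x0 == "A" else (1 - s) / (1 - beta)
    if t % 2 == 0:                               # panel (b): t even
        if not same:
            return Fr(0)
        return (s - 1) / beta if x0 == "A" else (s - 1) / alpha
    if same:                                     # panel (c): t odd
        return Fr(0)
    return (s - 1) / alpha if x0 == "A" else (s - 1) / beta

def path_distribution(alpha, beta, horizon, x0):
    """Exact Pr(Q_0, ..., Q_T = path | X_0 = x0) under the scheme."""
    states = {(x0, ("AB",)): Fr(1)}              # (current request, query path)
    for t in range(1, horizon + 1):
        nxt = defaultdict(Fr)
        for (x, path), p in states.items():
            flip = alpha if x == "A" else beta
            for x_new in "AB":
                p_x = 1 - flip if x_new == x else flip
                if path[-1] != "AB":
                    options = {x_new: Fr(1)}     # Q_t = X_t
                else:
                    p_ab = prob_ab(alpha, beta, t, x0, x_new)
                    options = {"AB": p_ab, x_new: 1 - p_ab}
                for q, p_q in options.items():
                    if p * p_x * p_q > 0:        # keep positive-probability paths only
                        nxt[(x_new, path + (q,))] += p * p_x * p_q
        states = nxt
    dist = defaultdict(Fr)
    for (_, path), p in states.items():          # marginalize out the request
        dist[path] += p
    return dict(dist)

for alpha, beta in [(Fr(1, 5), Fr(1, 5)), (Fr(4, 5), Fr(3, 5))]:
    from_a = path_distribution(alpha, beta, 4, "A")
    from_b = path_distribution(alpha, beta, 4, "B")
    print(alpha, beta, from_a == from_b)
```

The two parameter pairs exercise panel (a) and panels (b) and (c) of Table III respectively.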

IV-C Rate

Now, we evaluate the rate achieved by this coding scheme. We know from (11) that

$$\frac{1}{R_{t}}=1+\Pr\left(Q_{t}=AB\right)$$

is achievable. For $t\leq 0$ , since $\Pr\left(Q_{t}=AB\right)=1$ , we know that $R_{t}=\frac{1}{2}$ is achievable. To complete the computation of the rate, for $t\geq 1$ , we need the following result in Lemma 1 whose proof can be found in Appendix C.

Lemma 1.

The random variables $\{Q_{t}:t\geq 0\}$ form a Markov chain with transition matrix $P$ , where

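$$P=\begin{bmatrix}1-\alpha&\beta&\beta\\ \alpha&1-\beta&\alpha\\ 0&0&1-\alpha-\beta\end{bmatrix},\quad\text{if }\alpha+\beta\leq 1,\tag{13}$$

and

$$P=\begin{bmatrix}1-\alpha&\beta&1-\alpha\\ \alpha&1-\beta&1-\beta\\ 0&0&\alpha+\beta-1\end{bmatrix},\quad\text{if }\alpha+\beta>1.\tag{14}$$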

From Lemma 1, we easily obtain that

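$$\begin{aligned}\Pr\left(Q_{t}=AB\right)&\overset{(a)}{=}\Pr\left(Q_{0}=AB\right)\prod_{i=1}^{t}\Pr\left(Q_{i}=AB\mid Q_{i-1}=AB\right)\\ &\overset{(b)}{=}\prod_{i=1}^{t}\Pr\left(Q_{i}=AB\mid Q_{i-1}=AB\right),\end{aligned}$$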

where (a) follows because

$$\Pr\left(Q_{i}=AB\mid Q_{i-1}=A\right)=\Pr\left(Q_{i}=AB\mid Q_{i-1}=B\right)=0$$

for the transition matrices given in both (13) and (14); and (b) follows from $\Pr\left(Q_{0}=AB\right)=1$ , which can be justified because the user is required to download both messages at $t=0$ since $F_{0}=\text{ON}$ .

Using (13) and (14), we have

$$\Pr\left(Q_{t}=AB\right)=|1-\alpha-\beta|^{t}.$$

Therefore, we can conclude that

$$R_{t}\leq\frac{1}{1+|1-\alpha-\beta|^{t}}$$

is achievable for $t\geq 1$ .
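The rate computation can also be checked by iterating the query chain of Lemma 1 directly. The sketch below assumes that $P$ is read column-wise over the states $(A,B,AB)$, i.e., entry $(i,j)$ is $\Pr(Q_{t}=i\mid Q_{t-1}=j)$, which is the reading under which each column sums to one; the function names are illustrative.

```python
from fractions import Fraction as Fr

def transition_matrix(alpha, beta):
    """P over states (A, B, AB) with P[i][j] = Pr(Q_t = i | Q_{t-1} = j)."""
    if alpha + beta <= 1:
        from_ab = [beta, alpha, 1 - alpha - beta]
    else:
        from_ab = [1 - alpha, 1 - beta, alpha + beta - 1]
    return [[1 - alpha, beta, from_ab[0]],
            [alpha, 1 - beta, from_ab[1]],
            [Fr(0), Fr(0), from_ab[2]]]

def query_distribution(alpha, beta, t):
    """Distribution of Q_t over (A, B, AB) for t >= 0, starting from Q_0 = AB."""
    P = transition_matrix(alpha, beta)
    dist = [Fr(0), Fr(0), Fr(1)]
    for _ in range(t):
        dist = [sum(P[i][j] * dist[j] for j in range(3)) for i in range(3)]
    return dist

# Pr(Q_t = AB) should equal |1 - alpha - beta|^t in both regimes.
p_ab_low = query_distribution(Fr(1, 5), Fr(1, 5), 3)[2]
p_ab_high = query_distribution(Fr(4, 5), Fr(3, 5), 3)[2]
rate_1 = 1 / (1 + query_distribution(Fr(1, 5), Fr(1, 5), 1)[2])
print(p_ab_low, p_ab_high, rate_1)
```

Since the $AB$ row has zeros in its first two entries, the chain can never re-enter $AB$ once it has left, which is why $\Pr(Q_{t}=AB)$ reduces to a product of self-transition probabilities.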

Acknowledgment

This work was supported by NSF Grant CCF 1817635.

Appendix A Converse of Theorem 1

In this section, we will prove the converse. For $t\leq 0$, we know from [3] that it is necessary to download two messages to achieve perfect privacy. For $t\geq 1$, we will show that for any given $\{\phi_{t},\rho_{t}\}$ satisfying the decodability condition and the privacy constraint, the rate is upper bounded by

$$R_{t}\leq\frac{1}{1+|1-\alpha-\beta|^{t}},$$

or equivalently

[TABLE]

Since

[TABLE]

we consider partitioning the alphabet $\mathcal{Q}$ into three disjoint subsets $\mathcal{Q}_{a}$ , $\mathcal{Q}_{b}$ and $\mathcal{Q}_{ab}$ based on the decodability of $W_{A,t}$ , $W_{B,t}$ or $\{W_{A,t},W_{B,t}\}$ . Roughly speaking, $\rho_{t}\left(q\in\mathcal{Q}_{a},W_{A,t},W_{B,t}\right)$ can decode $W_{A,t}$ correctly but cannot decode $W_{B,t}$ . Similarly, $\rho_{t}\left(q\in\mathcal{Q}_{b},W_{A,t},W_{B,t}\right)$ can decode $W_{B,t}$ correctly but cannot decode $W_{A,t}$ , and $\rho_{t}\left(q\in\mathcal{Q}_{ab},W_{A,t},W_{B,t}\right)$ can decode both $W_{A,t}$ and $W_{B,t}$ correctly. Clearly, $\ell(q\in\mathcal{Q}_{a})\geq L$ , $\ell(q\in\mathcal{Q}_{b})\geq L$ and $\ell(q\in\mathcal{Q}_{ab})\geq 2L$ . Hence, we have

[TABLE]

Recall the privacy constraint for $i\geq 1$ ,

[TABLE]

where $\mathcal{A}_{0}=\{\ldots,-2,-1,0\}$ . Since (16) holds for any $i\geq 1$ , for a fixed $t$ , we have

[TABLE]

From $I\left(X_{\mathcal{A}_{0}};Q_{(t)}\right)=0$ , we can easily have

[TABLE]

and (17) can be written as

[TABLE]

Now, we focus on the marginal distribution $p\left(Q_{t},X_{0},X_{t}\right)$ . For notational simplicity, let $P(A,A)=\Pr\left(X_{0}=A,X_{t}=A\right)$ and $P(A|A)=\Pr\left(X_{t}=A|X_{0}=A\right)$ . Here, $P(A,B)$ , $P(B,A)$ , $P(B,B)$ and $P(A|B)$ , $P(B|A)$ , $P(B|B)$ are defined similarly. Also, let $\delta=\Pr\left(X_{0}=A\right)$ and $1-\delta=\Pr\left(X_{0}=B\right)$ .

By referring to the decodability and (18), we know that any admissible $p\left(Q_{t},X_{0},X_{t}\right)$ can be illustrated by Table IV.

By examining the values in the table, we have

[TABLE]

Hence, we obtain that

[TABLE]

From the Markovity of $\{X_{t}:t\in\mathbb{Z}\}$ , we have

[TABLE]

Therefore, we finally obtain that

[TABLE]

which completes the converse proof.

Appendix B Proof of Claim 2

We first introduce three propositions. They show the dependency relations between random variables induced by the given coding scheme. The propositions are straightforward so the proofs are omitted.

Proposition 1.

For $t\geq 0$ , $X_{t}$ is a deterministic function of $Q_{t}$ and $X_{0}$ , i.e., $X_{t}=g(X_{0},Q_{t})$ .

Proposition 2.

For $t\geq 1$ , $\{X^{(t-1)},Q^{(t-1)}\}\rightarrow X_{t-1}\rightarrow X_{t}$ forms a Markov chain. In particular, any subset of $\{X^{(t-1)},Q^{(t-1)}\}$ is independent of $X_{t}$ given $X_{t-1}$ .

Proposition 3.

For $t\geq 1$ , $\{X^{(t-1)},Q^{(t-1)}\}\rightarrow\{X_{t},X_{0},Q_{t-1}\}\rightarrow Q_{t}$ forms a Markov chain. In particular, any subset of $\{X^{(t-1)},Q^{(t-1)}\}$ is independent of $Q_{t}$ given $\{X_{t},X_{0},Q_{t-1}\}$ .

Claim 2 is equivalent to

[TABLE]

for any $q\in\mathcal{Q}$ and $\mathbf{\bar{q}}\in{\displaystyle\prod_{i\leq t-1}}\mathcal{Q}$ . Therefore consider,

[TABLE]

where (a) follows from Proposition 3, (b) follows from Proposition 1, and (c) follows from Proposition 2 and the Markovity of $\{X_{i}:i\in\mathbb{Z}\}$ .

If $q^{\prime}=A$ or $B$ , we have

[TABLE]

where (a) follows from the fact that if $Q_{t-1}=A$ or $B$ then $X_{t-1}=Q_{t-1}$ , and (b) follows because given $Q_{t-1}=A$ or $B$ , $Q_{t}=X_{t}$ with probability $1$ .

Clearly, R.H.S of (19) is independent of the choice of $x_{0}$ , and thus it remains to show that

[TABLE]

for any $q\in\{A,B,AB\}$. To this end, we treat the following cases separately:

•

When $\alpha+\beta\leq 1$ , we have

[TABLE]

where (a) follows from Table III(a), under which $Q_{t-1}=AB$ implies $X_{t-1}=X_{0}$.

Substituting $x_{0}$ by $A$ and $B$ in (20) on the L.H.S and R.H.S. respectively, we can verify from Table III(a) and transition matrix $M$ that

[TABLE]

for all $q$ .

•

When $\alpha+\beta>1$ and $t$ is odd, $t-1$ is even, and from Table III(b), $Q_{t-1}=AB$ only if $X_{t-1}=X_{0}$. Therefore, (21) still holds, and we can verify (22) from Table III(c) and the transition matrix $M$.

• When $\alpha+\beta>1$ and $t$ is even, $t-1$ is odd, and from Table III(c), $Q_{t-1}=AB$ only if $X_{t-1}\neq X_{0}$ , and we have

[TABLE]

for all $q$ . Similarly, we can verify (23) from Table III(b) and the transition matrix $M$ .

Appendix C Proof of Lemma 1

First, $Q_{t}=A$ or $B$ only if $Q_{t-1}\neq AB$ ; therefore it is easy to see the following,

[TABLE]

Then, we consider

[TABLE]

where (a) follows from Proposition 1, by which $X_{t-1}$ is a function of $X_{0}$ given $Q_{t-1}$ , and (b) follows from the privacy at time $t-1$ . In (c), the second term follows from $Q_{t-1}$ being a function of $X_{0}$ and $X_{t-1}$ together with the Markovity of $\{X_{i}:i\in\mathbb{Z}\}$ , and the third term follows from Proposition 1.

We now substitute $q$ by $A$ , $B$ , and $AB$ , and treat the two cases $\alpha+\beta\leq 1$ and $\alpha+\beta>1$ separately.

• For $\alpha+\beta\leq 1$ , $X_{t-1}=g(x_{0},AB)=x_{0}$ . Then,

[TABLE]

For instance, let $\Pr(X_{0}=A)=p_{0}$ and $\Pr(X_{0}=B)=1-p_{0}$ , and using the values given in Table III(a) and transition matrix $M$ , we can verify that

[TABLE]

Similarly, we can verify the remaining values of the transition matrix $P$ for $\alpha+\beta\leq 1$ .

• For $\alpha+\beta>1$ ,

$\bullet$ if $t$ is odd, then $t-1$ is even, and

[TABLE]

$\bullet$ if $t$ is even, then $t-1$ is odd, and

[TABLE]

We can verify the remaining elements of the transition matrix $P$ , for $\alpha+\beta>1$ , using the values in transition matrix $M$ , and the values in Table III(c) and (b), for $t$ odd and even respectively.

Appendix D General Privacy Mode

So far, we have focused on the privacy mode being the step function described in (3). When the privacy mode is an arbitrary sequence, the result of Theorem 1 generalizes as follows: the rate tuple $\left(R_{t}:t\in\mathbb{Z}\right)$ is achievable if and only if

[TABLE]

where $F^{-}(t)=\max\{i:i\leq t,F_{i}=\text{ON}\}$ , i.e., $F^{-}(t)$ is the latest time the privacy was ON.

The intuition is the following. To protect all the past requests made when privacy was ON, it suffices to protect the most recent such request, i.e., the request at time $F^{-}(t)$ . This follows mainly from the Markovity of the requests.
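As a small illustration of the definition of $F^{-}(t)$, the following sketch computes it for a finite privacy-mode sequence. It assumes the mode is given as a Python list indexed from time $0$ (the paper indexes time over $\mathbb{Z}$), and the function name `latest_on_time` is hypothetical.

```python
from typing import Optional, Sequence

def latest_on_time(mode: Sequence[str], t: int) -> Optional[int]:
    """Return F^-(t) = max{i : i <= t, mode[i] == "ON"}.

    Returns None if privacy was never ON up to time t
    (a case that only arises because this sketch uses a finite window).
    """
    for i in range(t, -1, -1):  # scan backwards from t to 0
        if mode[i] == "ON":
            return i
    return None

# Hypothetical privacy mode over five time steps.
mode = ["ON", "OFF", "OFF", "ON", "OFF"]
reference_times = [latest_on_time(mode, t) for t in range(len(mode))]
# reference_times == [0, 0, 0, 3, 3]
```

At each time $t$ with privacy OFF, the request that must be protected is the one made at time `reference_times[t]`.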

The proof of (24) when $F_{t}=\text{OFF}$ follows steps similar to those in the proof of Theorem 1. In particular, in the converse proof, by applying the chain rule to $\sum_{i=F^{-}(t)+1}^{t}I\left(X_{\mathcal{B}_{i}};Q_{i}|Q_{(i-1)}\right)$ , we can easily obtain that

[TABLE]

Moreover, instead of inspecting the distribution $p\left(Q_{t},X_{t},X_{(F^{-}(t))}\right)$ for the step function, we can inspect the distribution $p\left(Q_{t},X_{t},X_{(F^{-}(t))},Q_{(F^{-}(t))}\right)$ here. Note that for any fixed $q_{(F^{-}(t))}$ , we have exactly the same proof as we did for the step function. Hence, we can obtain the same upper bound on the rate, i.e.,

[TABLE]

For the achievability proof, when $F_{t}=\text{ON}$ , the user downloads the messages from both sources. When $F_{t}=\text{OFF}$ , the coding scheme is similar to the one for the step function and can be obtained by replacing $X_{0}$ by $X_{F^{-}(t)}$ , that is

[TABLE]

Then, one can check that the obtained coding scheme satisfies the privacy constraint for any privacy mode $\{F_{t}:t\in\mathbb{Z}\}$ . Moreover, it achieves the rate in (24). The verification details are similar to those for the step function.

