ON-OFF Privacy with Correlated Requests
Carolina Naim, Fangwei Ye, Salim El Rouayheb

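Table II: Summary of notation.

| Symbol | Definition |
|---|---|
| $N$ | Number of sources |
| $W_{t,i}$ | Message generated by the source of index $i$ at time $t$ |
| $X_t$ | User's request at time $t$ ($X_t \in [N]$) |
| $F_t$ | Privacy mode at time $t$ (ON or OFF) |
| $Q_t$ | Query sent by the user to the server at time $t$ |
| $A_t$ | Answer sent by the server to the user at time $t$ |
| $S_t$ | Local randomness generated by the user at time $t$ |
| $\ell_t$ | Average length of the answer $A_t$ |
| $R_t$ | Rate at time $t$ |
| $[i:j]$ | $\{i, i+1, \dots, j\}$ for any integers $i$ and $j$ such that $i \le j$ |
| $[i]$ | $\{1, \dots, i\}$ for any integer $i \ge 1$ |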
[Table III: query distribution of the proposed scheme, in three panels (a), (b) (even case) and (c) (odd case); entries unavailable.]
Department of Electrical and Computer Engineering, Rutgers University
Emails: {carolina.naim, fangwei.ye, salim.elrouayheb}@rutgers.edu
Abstract
We introduce the ON-OFF privacy problem. At each time, the user is interested in the latest message of one of $N$ online sources chosen at random, and his privacy status can be ON or OFF for each request. Only when privacy is ON does the user want to hide the source he is interested in. The problem is to design ON-OFF privacy schemes with maximum download rate that allow the user to obtain his requested messages privately. In many realistic scenarios, the user's requests are correlated since they depend on his personal attributes such as age, gender, political views, or geographical location. Hence, even when privacy is OFF, he cannot simply reveal his request since this will leak information about his requests when privacy was ON. We study the case when the user's requests can be modeled by a Markov chain and there are $N=2$ sources. In this case, we propose an ON-OFF privacy scheme and prove its optimality.
I Introduction
I-A Motivation
Privacy is a major concern for online users who can unknowingly reveal critical personal information (age, sex, diseases, political proclivity, etc.) through daily online activities such as watching online videos, following people and liking posts on social media, reading news and searching websites. This is now a well-acknowledged concern and has led to many interesting theoretical problems such as anonymity [1], differential privacy [2], private information retrieval [3], and other privacy-preserving algorithms.
In all these formulations the user is assumed to always want to maintain a certain level of privacy, which we refer to as privacy being always ON. However, in many scenarios, the user may wish to switch between privacy being ON and OFF. This switch depends on several criteria such as location, network/connection or phone/machine being used, to name a few. The reason the user may want to flip between these two modes, instead of keeping privacy always ON, is that typically privacy-preserving solutions incur a degradation in the quality of service, mostly felt by the user through large delays. Service providers may also be interested in incentivizing the user to require privacy only when it is needed since private solutions also incur higher communication and computation costs on their side.
One may be tempted to propose the simple solution in which the user has available to him two schemes, one private and one non-private. Over time, the user simply switches between these two schemes depending on whether privacy is turned ON or OFF. The problem with this solution is that it guarantees privacy only if the user’s online activities are statistically independent over time. However, a user’s online activities are typically personal, making them correlated over time. For example, a bilingual English/Spanish user, who is checking the news in Spanish now, is more likely to keep reading the news in Spanish for a while before switching to English. At that point English becomes more probable. Another example is when the user is watching online videos. The user chooses the video to watch next from a list of videos recommended to him and this list depends on previously watched videos. Thus, due to correlation, simply ignoring the privacy requirement when privacy is OFF may reveal information about the activities when privacy was ON.
I-B Example
To be more concrete and to gently introduce our setup for ON-OFF privacy, we give the following example. Suppose a user is watching political or news videos online. At each time $t$, the user has a choice between two new videos, each produced by one of two news sources, $A$ and $B$. Source $A$ is politically left-leaning and source $B$ is right-leaning.
Let $X_t \in \{A, B\}$ be the source whose video the user wants to watch at time $t$. We model the correlation among the user’s requests by assuming that $X_t$ is the two-state Markov chain depicted in Figure 1, where $\alpha$ is the probability of moving from $A$ to $B$ and $\beta$ is the probability of moving from $B$ to $A$. For example, we choose $\alpha = \beta = 0.2$. This means that if the current video being watched is left-leaning, there is an 80% chance that the next video is also left-leaning, and vice versa.
For the sake of brevity, we focus on the two time instants $t = 0$ and $t = 1$, and assume that privacy is ON at $t = 0$ and is switched to OFF at $t = 1$. This means that the user would like to hide whether he was watching a left-leaning or a right-leaning video at time $t = 0$, but does not care about revealing the source of the video he watched at $t = 1$.
The goal is to devise an ON-OFF privacy scheme that always gives the user the video he wants, but never reveals the choice of sources when privacy is ON, i.e., $X_0$ in this case. We are interested in schemes that minimize the download cost, or equivalently maximize the download rate (the inverse of the normalized download cost).
At $t = 0$, the problem is simple. The user achieves privacy by downloading both videos. We say that the user’s query at $t = 0$ is $Q_0 = AB$. Therefore, the download rate at $t = 0$ is $R_0 = 1/2$.
At $t = 1$, privacy is OFF. Now, the user must be careful not to directly declare his request, because this may reveal information about his request at $t = 0$, which is to remain private. The user can again download both videos, i.e., $Q_1 = AB$, and achieve privacy with a rate $R_1 = 1/2$.
Our key result is that the user can achieve a better rate at $t = 1$, without compromising privacy, by
- choosing randomly between downloading $A$ ($Q_1 = A$) or both $A$ and $B$ ($Q_1 = AB$) if he wants $A$,
- choosing randomly between downloading $B$ ($Q_1 = B$) or both $A$ and $B$ ($Q_1 = AB$) if he wants $B$.
This random choice must also depend on the request at $t = 0$. The different probabilities defining the scheme are given in Table I and will be justified later when we explain the general scheme. For now, one can check that these probabilities lead to
$$P(Q_1 = q \mid X_0 = x) = P(Q_1 = q)$$
for any $q \in \{A, B, AB\}$ and any $x \in \{A, B\}$. Thus, $Q_1$ and $X_0$ are independent and the proposed scheme in Table I achieves perfect privacy for the request at $t = 0$. Moreover, the scheme ensures that the user always obtains the video he is requesting.
For $t = 1$, the rate is $R_1 = 5/8$, which is strictly greater than $1/2$, the rate of querying both files. We later show that this rate is actually optimal. In fact, the values in Table I were carefully chosen to achieve privacy at the highest download rate. Any other choice of the probabilities would either violate privacy or lose the optimality of the rate.
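The example can be checked numerically. The sketch below is only an illustration: the values $\alpha = \beta = 1/5$ and the function names are assumptions made here, and the conditional probabilities are one construction that satisfies the stated requirements when $\alpha + \beta \le 1$ (with $\alpha, \beta < 1$), not necessarily the entries of Table I. It verifies that the law of the query at $t = 1$ does not depend on the request at $t = 0$, that the query always contains the requested source, and it computes the resulting rate.

```python
from fractions import Fraction

def one_step_scheme(alpha, beta):
    """P(Q1 = q | X0 = x0, X1 = x1) for a hypothetical scheme (assumes alpha + beta <= 1)."""
    keep_a = beta / (1 - alpha)   # ask for A alone when X0 = A and X1 = A
    keep_b = alpha / (1 - beta)   # ask for B alone when X0 = B and X1 = B
    return {
        ("A", "A"): {"A": keep_a, "AB": 1 - keep_a},
        ("A", "B"): {"B": Fraction(1)},
        ("B", "A"): {"A": Fraction(1)},
        ("B", "B"): {"B": keep_b, "AB": 1 - keep_b},
    }

def query_law_given_x0(alpha, beta):
    """P(Q1 = q | X0 = x0), averaging the scheme over the Markov transition."""
    trans = {"A": {"A": 1 - alpha, "B": alpha}, "B": {"A": beta, "B": 1 - beta}}
    scheme = one_step_scheme(alpha, beta)
    law = {}
    for x0 in ("A", "B"):
        law[x0] = {"A": Fraction(0), "B": Fraction(0), "AB": Fraction(0)}
        for x1 in ("A", "B"):
            for q, p in scheme[(x0, x1)].items():
                law[x0][q] += trans[x0][x1] * p
    return law

alpha = beta = Fraction(1, 5)        # illustrative values (assumption)
law = query_law_given_x0(alpha, beta)
rate = 1 / (1 + law["A"]["AB"])      # inverse of the expected number of videos downloaded
print(law["A"] == law["B"], rate)    # prints: True 5/8
```

Exact fractions are used so that the independence check is an equality test rather than a floating-point comparison.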
I-C Setup & Contributions
We introduce a mathematical model to capture the ON-OFF privacy problem when the user is downloading data from online sources.
We consider the setup in which there are $N$ information sources, each producing a new message at each time $t$. At each time $t$, the user randomly chooses one of the sources and requests its latest produced message.
The privacy constraint is the following: the user wants to leak zero information about the identity of the sources in which he is interested at each time when the privacy is ON. The main challenge stems from the fact that the user’s requests are not independent. As in the previous example, we model the dependence between these requests by an $N$-state Markov chain. The goal is to design an ON-OFF privacy scheme with maximum download rate that satisfies the user’s request and guarantees the privacy of the requests made when privacy is ON.
Our technical results can be summarized as follows. We study the case of $N = 2$ sources for the special but important case where privacy is ON for $t \le 0$ and switched OFF for $t \ge 1$. We prove an upper bound on the instantaneous download rate at each time $t$, and give an ON-OFF privacy scheme that achieves it.
I-D Related Work
The special case of the ON-OFF privacy problem in which privacy is always ON and the user’s requests are independent reduces to the information-theoretic private information retrieval (PIR) problem on a single server. In this case, the user cannot do anything smarter than downloading everything [3] (except the recently studied problem when the user has side information [4] which is not the case here). Recently, there has been significant research activity on determining the maximum download rate of PIR with multiple servers (e.g. [5, 8, 6, 7, 9]). However, the model there requires multiple servers and, in the parlance of this paper, privacy is assumed to be always ON.
II Problem Formulation and Notation
The ON-OFF privacy model can be described as follows. A single server stores sources indexed by . Each source generates a message at time , where . We only consider a discrete time throughout this paper, i.e., . For any integers and such that , denote by , and by .
A user retrieves messages consecutively from the server. He is interested in one of the sources at each time, and wishes to retrieve the latest message generated by the desired source.
In particular, let be the index of the desired source at time , which takes values in , and in the sequel we call the user’s request. By slightly abusing the notation, we denote the latest message generated by the desired source as , and the user wishes to retrieve the message . We assume that the messages are mutually independent, each of which consists of symbols. Without loss of generality, we assume that each of the messages is uniformly distributed on , i.e.,
[TABLE]
and
[TABLE]
As discussed in Section I, we are particularly interested in the case where the requests , for , form a Markov chain. The transition matrix of the Markov chain is known to both the server and the user.
Meanwhile, the user may or may not wish to keep the identity of the source he is interested in at time $t$ hidden from the server. Specifically, the privacy mode $F_t$ at time $t$ can be either ON or OFF, where $F_t$ is ON when the user wishes to keep his request $X_t$ private, while $F_t$ is OFF when the user is not concerned with privacy.
In this paper, we focus on the case in which the privacy mode is the step function given by
$$F_t = \begin{cases} \text{ON}, & t \le 0, \\ \text{OFF}, & t \ge 1. \end{cases} \tag{3}$$
Solving the problem for this step function is an essential building block for tackling the general case where $F_t$ is a random process. A discussion about the general case can be found in Appendix D.
The user is allowed to generate unlimited local randomness and we are not interested in the amount of randomness used. Therefore, we assume without loss of generality that the random variables , representing the local randomness, are mutually independent. Moreover, we assume that the user’s requests , the messages and the local randomness are mutually independent.
As discussed in Section I, if the user carelessly downloads the desired message at time when the privacy is OFF, the privacy in the past may be compromised. To ensure privacy, the user may utilize the requests and the local randomness to construct a query and send it to the server. Upon receiving the query, the server responds to the request by producing the answer consisting of symbols, where the length of is a function of the query . Thus, the average length of the answer is given by
[TABLE]
The query at time is assumed to be a function of all the requests and all the local randomness up to and including time , i.e.,
[TABLE]
Note that since the previous answers are functions of the previous messages, which are independent of the current message, the previous answers do not help in retrieving the current message. Hence, without loss of generality, the query is not encoded from the previous answers.
Correspondingly, the answer of the server is a function of the query and the messages , i.e.,
[TABLE]
These functions need to satisfy the decodability and the privacy constraints:
1. Decodability: For any time $t$, the user should be able to recover the desired message from the answer with zero-error probability, i.e.,
[TABLE]
2. Privacy: For any time $t$, given all past queries received by the server, the query $Q_t$ should not reveal any information about the past or present requests for which privacy is ON, that is,
$$I\left(X_{\mathcal{B}_t}; Q_t \mid Q_{t-1}, Q_{t-2}, \dots\right) = 0, \tag{8}$$
where $\mathcal{B}_t = \{i : i \le t,\ F_i = \text{ON}\}$ and $X_{\mathcal{B}_t} = \{X_i : i \in \mathcal{B}_t\}$.
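To see why the privacy constraint rules out the naive strategy of sending the request in the clear once privacy is OFF, the following sketch computes the mutual information between the request at time 0 and a query at time 1 that simply equals the request at time 1. It is an illustration only: the function names, the parameter values, and the choice of the stationary distribution for the request at time 0 are assumptions made here. Since the query at time 0 is constant when both messages are downloaded, conditioning on it changes nothing in this two-step setting.

```python
from math import log2

def mutual_information(joint):
    """I(X; Y) in bits from a dict {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log2(p / (px[x] * py[y])) for (x, y), p in joint.items() if p > 0)

def naive_joint(alpha, beta):
    """Joint law of (X0, Q1) when Q1 = X1 and X0 is stationary (assumption)."""
    pi = {"A": beta / (alpha + beta), "B": alpha / (alpha + beta)}
    trans = {"A": {"A": 1 - alpha, "B": alpha}, "B": {"A": beta, "B": 1 - beta}}
    return {(x0, x1): pi[x0] * trans[x0][x1] for x0 in "AB" for x1 in "AB"}

leak_correlated = mutual_information(naive_joint(0.2, 0.2))   # strictly positive leakage
leak_independent = mutual_information(naive_joint(0.5, 0.5))  # no leakage when alpha + beta = 1
print(round(leak_correlated, 3), leak_independent)
```

With correlated requests the leakage is strictly positive, so the naive query violates the constraint; with independent requests it vanishes.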
For any message length , the tuple is said to be achievable if there exists a code satisfying the decodability and the privacy constraint. The efficiency of the code can be measured by the download rate . Hence, we define the achievable region as follows:
Definition 1.
The rate tuple is achievable if there exists a code with message length and average download cost such that .
Conventionally, the capacity region can be defined as the closure of the set of achievable rate tuples , where is the set of all possible probability distributions of . Table II summarizes our notation.
III Main results
Our main result is a complete characterization of the achievable region for the case of two sources, i.e., $N = 2$. We will use $A$ and $B$ to denote these two sources. In this case, the requests follow a two-state Markov chain defined by the transition matrix
$$P = \begin{bmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{bmatrix},$$
where $\alpha$ is the transition probability from $A$ to $B$, and $\beta$ is the transition probability from $B$ to $A$.
We first state the main theorem of this paper.
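Theorem 1.
For the privacy mode $F_t$ given in (3), the rate tuple $(R_t : t \in \mathbb{Z})$ is achievable if and only if
$$R_t \le \begin{cases} \dfrac{1}{2}, & t \le 0, \\[2mm] \dfrac{1}{1 + |1-\alpha-\beta|^{t}}, & t \ge 1. \end{cases}$$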
When privacy is ON, for $t \le 0$, the user has to request the most recent messages of both $A$ and $B$. Therefore, the rate is $R_t = 1/2$.
The more interesting part of Theorem 1 is for $t \ge 1$. For a fixed time $t$, the rate as a function of $\alpha$ and $\beta$ is symmetric around $\alpha + \beta = 1$. When $\alpha + \beta = 1$, the two rows of the transition matrix coincide and the user’s requests are independent, so the user can directly query for his desired message, i.e., $Q_t = X_t$. The rate is then maximized to $1$.
In terms of asymptotics, when the Markov chain is ergodic, the download rate goes to $1$ as $t$ goes to infinity. Intuitively, as $t$ grows, the information carried by $X_t$ about $X_0$ decreases, so the user can eventually directly query for what he wants, i.e., $Q_t = X_t$. Otherwise, when the Markov chain is not ergodic ($\alpha + \beta = 0$ or $\alpha + \beta = 2$), not much can be done and the rate is constant at $1/2$. The user has to query for both messages of $A$ and $B$ at every time $t$, i.e., $Q_t = AB$ for all $t$.
Figure 2 shows the rate as a function of time for different values of $\alpha$ and $\beta$. As $\alpha + \beta$ approaches $1$, the correlation between the requests decreases, leading to an increase in the rate.
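The qualitative behavior described above can be tabulated with a few lines of code. This is a sketch that assumes the bound of Theorem 1 has the closed form $1/(1 + |1-\alpha-\beta|^{t})$ for $t \ge 1$, which is consistent with every property listed in this section (symmetry around $\alpha + \beta = 1$, rate $1$ for independent requests, convergence to $1$ for ergodic chains, constant rate $1/2$ otherwise). The function name and the parameter values are illustrative choices.

```python
def rate_upper_bound(alpha, beta, t):
    """Assumed form of the optimal rate: 1/2 while privacy is ON (t <= 0),
    and 1 / (1 + |1 - alpha - beta|**t) once it is OFF (t >= 1)."""
    if t <= 0:
        return 0.5
    return 1.0 / (1.0 + abs(1.0 - alpha - beta) ** t)

# Rate over time for a slowly mixing chain (illustrative parameters).
rates = [rate_upper_bound(0.2, 0.2, t) for t in range(0, 8)]
for t, r in enumerate(rates):
    print(t, round(r, 4))

# Special cases discussed in the text.
independent = rate_upper_bound(0.5, 0.5, 1)    # alpha + beta = 1
non_ergodic = rate_upper_bound(1.0, 1.0, 7)    # alpha + beta = 2
```

The printed sequence starts at 1/2 and increases toward 1, matching the trend described for Figure 2.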
In the following section, we give the scheme that achieves the rate tuples given in Theorem 1. We prove the converse in Appendix A.
IV Achievability of Theorem 1
IV-A ON-OFF Privacy Scheme
In this section, we will describe an ON-OFF privacy scheme that achieves the rate in Theorem 1, by specifying its encoding functions defined in Section II.
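Our coding scheme retrieves messages in uncoded form. More specifically, the alphabet for the queries is $\{A, B, AB\}$. The query values $A$, $B$ and $AB$ denote respectively the user requesting the latest message of source $A$, of source $B$, or of both. Upon receiving $Q_t$, the server responds by sending either one or two messages, such that
$$A_t = \begin{cases} W_{t,A}, & Q_t = A, \\ W_{t,B}, & Q_t = B, \\ (W_{t,A}, W_{t,B}), & Q_t = AB. \end{cases}$$
The length of the answer is given by
$$\ell(Q_t) = \begin{cases} L, & Q_t \in \{A, B\}, \\ 2L, & Q_t = AB, \end{cases}$$
where $L$ is the message length. The normalized average length is
$$\frac{\ell_t}{L} = 1 + P(Q_t = AB). \tag{11}$$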
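The server side of this uncoded scheme is simple enough to sketch directly. The snippet below is an illustration under assumptions made here: queries are represented by the strings "A", "B" and "AB", the message contents and the query distribution are made up, and the helper names are hypothetical. It shows that the normalized average answer length equals one plus the probability of asking for both messages.

```python
def answer(query, latest):
    """Server reply: the latest message(s) named by the query, sent uncoded."""
    return tuple(latest[source] for source in query)   # "AB" yields both messages

def normalized_average_length(query_law):
    """Expected number of messages sent, i.e. 1 + P(query == 'AB')."""
    return sum(prob * len(query) for query, prob in query_law.items())

latest = {"A": b"left-video", "B": b"right-video"}      # placeholder messages
reply = answer("AB", latest)

query_law = {"A": 0.2, "B": 0.2, "AB": 0.6}             # illustrative distribution
load = normalized_average_length(query_law)
rate = 1.0 / load                                        # download rate at this time
print(len(reply), round(load, 3), round(rate, 3))
```

Representing a query by the string of source labels keeps the length computation to a single `len` call, since the number of messages returned equals the number of labels.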
It remains to specify the query encoding functions . The query encoding function at time is described as follows:
- For $t \le 0$, we simply download two messages to guarantee privacy, i.e., $Q_t = AB$. This is an immediate result in information-theoretic single-server private information retrieval [3].
- For $t \ge 1$, the query is a function of , , and the local randomness , i.e.,
[TABLE]
Since we are not interested in the local randomness used, instead of specifying the function explicitly, we regard as a probabilistic function of , and the distribution is as follows:
Given , , and ,
1. if , then with probability .
2. if , then is as given in Table III.
IV-B Privacy
In this subsection, we prove that the given scheme satisfies the privacy constraint for . Recall the privacy constraint (8) that , where . We want to show that
[TABLE]
To do that we will show that each of the terms in the sum in (12) is equal to zero.
Claim 1.
.
The claim can be justified as follows:
[TABLE]
where (a) follows because is a function of , and (b) follows from the independence between and , and the Markovity of .
Claim 2.
* for .*
The proof of Claim 2 can be found in Appendix B.
IV-C Rate
Now, we evaluate the rate achieved by this coding scheme. We know from (11) that
$$R_t = \frac{1}{1 + P(Q_t = AB)}$$
is achievable. For $t \le 0$, since $Q_t = AB$, we know that $R_t = 1/2$ is achievable. To complete the computation of the rate, for $t \ge 1$, we need the following result in Lemma 1 whose proof can be found in Appendix C.
Lemma 1.
The random variables form a Markov chain with transition matrix , where
[TABLE]
and
[TABLE]
From Lemma 1, we easily obtain that
[TABLE]
where (a) follows because
[TABLE]
for the transition matrices given in both (13) and (14); and (b) follows from , which can be justified because the user is required to download both messages at since .
[TABLE]
Therefore, we can conclude that
$$R_t = \frac{1}{1 + |1-\alpha-\beta|^{t}}$$
is achievable for $t \ge 1$.
Acknowledgment
This work was supported by NSF Grant CCF 1817635.
Appendix A Converse of Theorem 1
In this section, we will prove the converse. For $t \le 0$, we know from [3] that it is necessary to download two messages to achieve perfect privacy. For $t \ge 1$, we will show that for any given query $Q_t$ satisfying the decodability condition and the privacy constraint, the rate is upper bounded by
$$R_t \le \frac{1}{1 + |1-\alpha-\beta|^{t}},$$
or equivalently
$$\frac{\ell_t}{L} \ge 1 + |1-\alpha-\beta|^{t}.$$
Since
[TABLE]
we consider partitioning the alphabet into three disjoint subsets , and based on the decodability of , or . Roughly speaking, can decode correctly but cannot decode . Similarly, can decode correctly but cannot decode , and can decode both and correctly. Clearly, , and . Hence, we have
[TABLE]
Recall the privacy constraint for ,
[TABLE]
where . Since (16) holds for any , for a fixed , we have
[TABLE]
From , we can easily have
[TABLE]
and (17) can be written as
[TABLE]
Now, we focus on the marginal distribution . For notational simplicity, let and . Here, , , and , , are defined similarly. Also, let and .
By referring to the decodability and (18), we know that any admissible can be illustrated by Table IV.
By examining the values in the table, we have
[TABLE]
Hence, we obtain that
[TABLE]
From the Markovity of , we have
[TABLE]
Therefore, we finally obtain that
[TABLE]
which completes the converse proof.
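The counting argument behind the converse can be sanity-checked numerically. The sketch below rests on a simplified reading of that argument, stated here as an assumption: a query that allows decoding only $A$ must be independent of the request at time 0 and can occur only when the current request is $A$, so its probability is at most the smallest $t$-step probability of reaching $A$ over both starting states, and similarly for $B$. The remaining probability mass must go to queries that decode both messages. The function names are hypothetical.

```python
def t_step_matrix(alpha, beta, t):
    """t-step transition matrix of the two-state chain (index 0 = A, 1 = B)."""
    p = [[1 - alpha, alpha], [beta, 1 - beta]]
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(t):
        result = [[sum(result[i][k] * p[k][j] for k in range(2)) for j in range(2)]
                  for i in range(2)]
    return result

def min_prob_query_both(alpha, beta, t):
    """Smallest probability of asking for both messages under the argument above."""
    pt = t_step_matrix(alpha, beta, t)
    max_only_a = min(pt[0][0], pt[1][0])   # cap on P(query decodes A only)
    max_only_b = min(pt[0][1], pt[1][1])   # cap on P(query decodes B only)
    return 1.0 - max_only_a - max_only_b

# The lower bound coincides with |1 - alpha - beta|**t for these test points.
for alpha, beta, t in [(0.2, 0.2, 1), (0.3, 0.9, 3), (0.3, 0.9, 4), (0.6, 0.1, 5)]:
    print(alpha, beta, t, min_prob_query_both(alpha, beta, t), abs(1 - alpha - beta) ** t)
```

The agreement follows from the fact that the second eigenvalue of the transition matrix is $1 - \alpha - \beta$, so the gap between the two rows of the $t$-step matrix shrinks by that factor at every step.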
Appendix B Proof of Claim 2
We first introduce three propositions. They show the dependency relations between random variables induced by the given coding scheme. The propositions are straightforward so the proofs are omitted.
Proposition 1.
For , is a deterministic function of and , i.e., .
Proposition 2.
For , forms a Markov chain. In particular, any subset of is independent of given .
Proposition 3.
For , forms a Markov chain. In particular, any subset of is independent of given .
Claim 2 is equivalent to
[TABLE]
for any and . Therefore consider,
[TABLE]
[TABLE]
where (a) follows from Proposition 3, (b) follows from Proposition 1, and (c) follows from Proposition 2 and the Markovity of .
If or , we have
[TABLE]
where (a) follows from the fact that if or then , and (b) follows because given or , with probability .
Clearly, R.H.S of (19) is independent of the choice of , and thus it remains to show that
[TABLE]
for any . Towards this end, let us discuss separately as follows:
- When , we have
[TABLE]
where (a) follows Table III(a) where given .
Substituting by and in (20) on the L.H.S and R.H.S. respectively, we can verify from Table III(a) and transition matrix that
[TABLE]
for all .
- When and is odd, is even, and from Table III(b), only if . Therefore, (21) still holds, and we can verify (22) from Table III(c) and the transition matrix .
- When and is even, is odd, and from Table III(c), only if , and we have
[TABLE]
for all . Similarly, we can verify (23) from Table III(b) and the transition matrix .
Appendix C Proof of Lemma 1
First, or only if ; therefore it is easy to see the following,
[TABLE]
Then, we consider
[TABLE]
where (a) follows from Proposition 1 where is a function of given , and (b) follows from the privacy at time . The second term in (c) follows from being a function of and and the Markovity of , and the third term follows from Proposition 1.
Now we substitute by , and and discuss two cases and .
- For , . Then,
[TABLE]
For instance, let and , and using the values given in Table III(a) and transition matrix , we can verify that
[TABLE]
Similarly, we can verify the rest of values given in transition matrix for .
- For ,
if is odd, then is even, and
[TABLE]
if is even, then is odd, and
[TABLE]
We can verify the remaining elements of the transition matrix , for , using the values in transition matrix , and the values in Table III(c) and (b), for odd and even respectively.
Appendix D General Privacy Mode
So far, we have focused on the privacy mode being the step function described in (3). When the privacy mode is an arbitrary sequence, the result of Theorem 1 generalizes as follows: the rate tuple is achievable if and only if
$$R_t \le \frac{1}{1 + |1-\alpha-\beta|^{\,t-\tau(t)}}, \tag{24}$$
where $\tau(t) = \max\{i : i \le t,\ F_i = \text{ON}\}$, i.e., $\tau(t)$ is the latest time the privacy was ON.
The intuition is the following. To protect all the past requests when privacy was ON, it suffices to protect the last request when privacy was ON, which is $X_{\tau(t)}$. This follows mainly from the Markovity of the requests.
The proof of (24) when follows similar steps as the proof of Theorem 1. In particular, in the converse proof, by applying the chain rule to , we can easily obtain that
[TABLE]
Moreover, instead of inspecting the distribution for the step function, we can inspect the distribution here. Note that for any fixed , we have exactly the same proof as we did for the step function. Hence, we can obtain the same upper bound on the rate, i.e.,
[TABLE]
For the achievability proof when , the user downloads the messages from both sources. When , the coding scheme is similar to before and can be obtained by replacing by , that is
[TABLE]
Then, one can check that the obtained coding scheme satisfies the privacy constraint for any privacy mode . Moreover, it achieves the rate in (24). The verification details are similar to those for the step function.
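For an arbitrary ON/OFF sequence, the rate at each time depends only on how long ago privacy was last ON. The sketch below illustrates this under two assumptions made here: the bound in (24) is taken to be $1/(1 + |1-\alpha-\beta|^{t-\tau(t)})$, and times before privacy was ever ON (a case not covered by the definition of $\tau(t)$) are given rate 1 because there is nothing to protect yet. The function name and the input sequence are hypothetical.

```python
def general_mode_rates(alpha, beta, modes):
    """Rate bound per time step for a list of privacy modes (True = ON).

    Assumes the bound 1 / (1 + |1 - alpha - beta|**(t - tau)), where tau is
    the latest time at or before t with privacy ON.
    """
    rates = []
    last_on = None
    for t, is_on in enumerate(modes):
        if is_on:
            last_on = t
        if last_on is None:
            rates.append(1.0)   # assumption: privacy has never been ON so far
        else:
            rates.append(1.0 / (1.0 + abs(1.0 - alpha - beta) ** (t - last_on)))
    return rates

modes = [True, False, False, True, False]    # ON, OFF, OFF, ON, OFF
rates = general_mode_rates(0.2, 0.2, modes)
print([round(r, 4) for r in rates])
```

Each time privacy is switched ON the rate drops back to 1/2, then recovers geometrically while privacy stays OFF.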
References
- [1] L. Sweeney, “K-anonymity: A model for protecting privacy,” International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, vol. 10, no. 5, pp. 557–570, Oct. 2002.
- [2] C. Dwork, “Differential privacy,” in 33rd International Colloquium on Automata, Languages and Programming (ICALP), 2006.
- [3] B. Chor, O. Goldreich, E. Kushilevitz, and M. Sudan, “Private information retrieval,” in IEEE Symposium on Foundations of Computer Science, 1995.
- [4] S. Kadhe, B. Garcia, A. Heidarzadeh, S. El Rouayheb, and A. Sprintson, “Private information retrieval with side information: The single server case,” in 55th Annual Allerton Conference on Communication, Control, and Computing, 2017.
- [5] N. Shah, K. Rashmi, and K. Ramchandran, “One extra bit of download ensures perfectly private information retrieval,” in IEEE International Symposium on Information Theory (ISIT), 2014.
- [6] R. Tajeddine and S. El Rouayheb, “Private information retrieval from MDS coded data in distributed storage systems,” in IEEE International Symposium on Information Theory (ISIT), 2016.
- [7] R. Freij-Hollanti, O. W. Gnilke, C. Hollanti, and D. A. Karpuk, “Private information retrieval from coded databases with colluding servers,” SIAM Journal on Applied Algebra and Geometry, vol. 1, no. 1, pp. 647–664, 2017.
- [8] H. Sun and S. Jafar, “The capacity of private information retrieval,” IEEE Transactions on Information Theory, vol. 63, no. 7, pp. 4075–4088, 2017.
