Exploring Differential Obliviousness
Amos Beimel, Kobbi Nissim, Mohammad Zaheri

TL;DR
This paper investigates the concept of differential obliviousness, a relaxed privacy notion for algorithms, demonstrating its potential benefits in property testing and tasks with input-dependent exploration, beyond traditional oblivious algorithms.
Contribution
It extends the understanding of differential obliviousness by analyzing its advantages in property testing and input-dependent tasks, highlighting scenarios where it outperforms full obliviousness.
Findings
Differential obliviousness offers nearly linear overhead improvements in dense graph property testing.
Quadratic overhead improvements are possible in bounded degree graph models.
Differential obliviousness can maintain input-dependent exploration behaviors, unlike full obliviousness.
Exploring Differential Obliviousness (Work supported by NSF grant No. 1565387, TWC: Large: Collaborative: Computing Over Distributed Sensitive Data.)
Amos Beimel Dept. of Computer Science, Ben-Gurion University, Israel. [email protected]. Work done while A.B. was visiting Georgetown University.
Kobbi Nissim Dept. of Computer Science, Georgetown University. [email protected].
Mohammad Zaheri Dept. of Computer Science, Georgetown University. [email protected].
In a recent paper, Chan et al. [SODA ’19] proposed a relaxation of the notion of (full) memory obliviousness, which was introduced by Goldreich and Ostrovsky [J. ACM ’96] and extensively researched by cryptographers. The new notion, differential obliviousness, requires that any two neighboring inputs exhibit similar memory access patterns, where the similarity requirement is that of differential privacy. Chan et al. demonstrated that differential obliviousness allows achieving improved efficiency for several algorithmic tasks, including sorting, merging of sorted lists, and range query data structures.
In this work, we continue the exploration of differential obliviousness, focusing on algorithms that do not necessarily examine all their input. This choice is motivated by the fact that the existence of logarithmic overhead ORAM protocols implies that differential obliviousness can yield at most a logarithmic improvement in efficiency for computations that need to examine all their input. In particular, we explore property testing, where we show that differential obliviousness yields an almost linear improvement in overhead in the dense graph model, and at most quadratic improvement in the bounded degree model. We also explore tasks where a non-oblivious algorithm would need to explore different portions of the input, where the portions explored depend on the input itself, and we show that such behavior can be maintained under differential obliviousness, but not under full obliviousness. Our examples suggest that there would be benefits in further exploring which classes of computational tasks are amenable to differential obliviousness.
1 Introduction
A program’s memory access pattern can leak significant information about the private information used by the program even if the memory content is encrypted. Such leakage can turn into a data protection problem in various settings. In particular, where data is outsourced to be stored on an external server, it has been shown that access pattern leakage can be exploited in practical attacks and lead to the compromise of the underlying data [20, 4, 29, 21, 23]. Such leakages can also be exploited when a program is executed in a secure enclave environment but needs to access memory that is external to the enclave.
Memory access pattern leakage can be avoided by employing a strategy that makes the sequence of memory accesses (computationally or statistically) independent of the content being processed. Beginning with the seminal work of Goldreich and Ostrovsky, it is well known how to transform any program running on a random access memory (RAM) machine to one with an oblivious memory access pattern while retaining efficiency by using an Oblivious RAM protocol (ORAM) [10, 30, 13]. Current state-of-the-art ORAM protocols achieve logarithmic overhead [2], matching a recent lower bound by Larsen and Nielsen [24], and protocols with $O(1)$ overhead exist when the server is allowed to perform computation and large blocks are retrieved [6, 28]. To further reduce the overhead, oblivious memory access pattern protocols have been devised for specific tasks, including graph algorithms [3, 17], geometric algorithms [8] and sorting [16, 25]. The latter is motivated by sorting being a fundamental and well researched computational task as well as its ubiquity in data processing.
1.1 Differential Obliviousness
Full obliviousness is a rather strong requirement: any two possible inputs (of the same size) should exhibit identical or indistinguishable sequences of memory accesses. Achieving full obliviousness via a generic use of ORAM protocols requires a setup phase with running time (at least) linear in the memory size and then a logarithmic overhead per memory access.
A recent work by Chan, Chung, Maggs, and Shi [5] put forward a relaxation of the obliviousness requirement where indistinguishability is replaced with differential privacy. Intuitively, this means that any two possible neighboring inputs should exhibit memory access patterns that are similar enough to satisfy differential privacy, but may still be too dissimilar to be “cryptographically” indistinguishable. It is not a priori clear whether differential obliviousness can be achieved without resorting to full obliviousness. However, the recent work of Chan et al. showed that differential obliviousness does allow achieving improved efficiency for several algorithmic tasks, including sorting (over very small domains), merging of sorted lists, and range query data structures.
Also of relevance are the works by He et al. [19] and Mazloom and Gordon [27], which study protocols for secure multiparty computation in which the parties are allowed to learn information from the computation as long as this information preserves the differential privacy of the input. He et al. and Mazloom and Gordon demonstrate that this leakage is useful: He et al. construct protocols for the private record linkage problem for two databases; Mazloom and Gordon present protocols for histograms, PageRank, and matrix factorization.
Furthermore, even the use of ORAM protocols may be insufficient for preventing leakage in cases where the number of memory probes is input dependent. In fact, Kellaris et al. [21] show that such leakage can result in a complete reconstruction in the case of retrieving elements specified by range queries, as the number of records returned depends on the contents of the data structure. Full obliviousness would require that the sequence of memory accesses be padded to a maximal one to avoid such leakage, a solution that would have a dire effect on the efficiency of many algorithms. Differential obliviousness may in some cases allow achieving meaningful privacy while maintaining efficiency. Examples of such protocols include the combination of ORAM with differentially private sanitization by Kellaris et al. [22] and the recent work of Chan et al. [5] on range query data structures, which avoids using ORAM.
1.2 This Work: Exploring Differential Obliviousness
Noting that the existence of logarithmic overhead ORAM protocols implies that differential obliviousness can yield at most a logarithmic improvement in efficiency for computations that need to examine all their input, we explore tasks where this is not the case. In particular, we focus on property testing and on tasks where the number of memory accesses can depend on the input.
Property testing.
As evidence that differential obliviousness can provide a significant improvement over full obliviousness, we show in Section 3 that property testers in the dense graph model, where the input is in the adjacency matrix representation [12], can be made differentially oblivious. This result captures a large set of testable graph properties [12, 1] including, e.g., graph bipartiteness and having a large clique. Testers in this class probe a uniformly random subgraph and hence are fully oblivious without any modification, as their access pattern does not depend on the input graph. However, this is not the case if the tester reveals its output to the adversary, as this allows learning information about the specific probed subgraph. A fully oblivious tester would need to access a linear-sized subgraph, whereas we show that a differentially oblivious tester only needs to apply the original tester a constant number of times. [Footnote 1: We omit dependencies on privacy and accuracy parameters from this introductory description.]
We also consider property testing in the bounded degree model, where the input is in the incidence lists model [14]. In this model we provide negative results, demonstrating that adaptive testers cannot, generally, be made differentially oblivious without a significant loss in efficiency. In particular, in Section 4 we consider differentially oblivious property testers for connectivity in graphs of degree at most two. For non-oblivious testers, it is known that a constant number of probes suffices when the tester is adaptive [14]. [Footnote 2: In an adaptive tester at least one choice of a node to probe should depend on information gathered from incidence lists of previously probed nodes.] It is also known that any non-adaptive tester for this task requires probing $\Omega(\sqrt{n})$ nodes of an $n$-node graph [32]. We show that this lower bound extends to differentially oblivious testers, i.e., any differentially oblivious tester for connectivity in graphs of maximal degree 2 requires $\Omega(\sqrt{n})$ probes. While this still improves over full obliviousness, the gap between full and differential obliviousness is in this case diminished.
Locating an Object Satisfying a Property.
Here, our goal is to check whether a given data set of objects includes an object that satisfies a specified property. Without obliviousness requirements, a natural approach is to probe elements in a random order until an element satisfying the property is found or all elements have been probed. If a $p$ fraction of the elements satisfy the property, then the expected number of probes is about $1/p$. This algorithm is in fact instance optimal when the data set is randomly permuted. [Footnote 3: Our treatment of instance optimality is rather informal. The concept was originally presented in [9].]
A fully oblivious algorithm would require $\Omega(n)$ probes on any dataset of $n$ elements, even when $p = 1$. In contrast, we demonstrate in Section 5 that with differential obliviousness instance optimality can, to a large extent, be preserved. Our differentially oblivious algorithm always returns a correct answer and makes at most probes with probability at least .
Prefix Sum.
Our last example considers a sorted dataset (possibly, the result of an earlier phase in the computation). Our goal is to compute the sum of all records in the (sorted) dataset that are less than or equal to a given value (see Section 6 for the definition of privacy).
Without obliviousness requirements, one can find the greatest record less than or equal to value , say, using binary search, and then compute the prefix sum by a quick scan through all records appearing before this record. This algorithm is in fact nearly instance optimal, as it can be shown that any algorithm which returns the correct exact answer with non-negligible probability must probe all entries greater than . However, fully oblivious algorithms would have to probe the entire dataset.
In Section 6, we give our nearly instance optimal differentially oblivious prefix sum algorithm. As the probes of a binary search would leak information about the memory content, we introduce a differentially oblivious “simulation” of the binary search. Our differentially oblivious binary search runs in time .
We also address the scenario where there are multiple prefix sum queries to the same database. If the number of queries is bounded by some integer , then each differentially oblivious binary search will run in time (as we need to run the search algorithm with a smaller privacy parameter ). Using ORAM, one can answer such queries with preprocessing time and time per query. Combining our algorithm and ORAM, we can amortize the preprocessing time over queries, that is, without any preprocessing, the running time of answering the -th query is for the first queries and for any further query.
1.3 Background Work
The papers by Chan, Chung, Maggs, and Shi [5], He, Machanavajjhala, Flynn, and Srivastava [19], and by Mazloom and Gordon [27] mentioned above are most relevant for this article. As mentioned above, Kellaris et al. [22] examined a similar concept with the goal of preventing reconstruction attacks in secure remote databases. Goldreich, Goldwasser, and Ron [12] initiated the research on graph property testing. Persiano and Yeo [31] showed that the lower bound for ORAM of [24] also holds when the security requirement is relaxed to differential privacy. Goldreich’s book on property testing [11] gives sufficient background for our discussion. Dwork, McSherry, Nissim, and Smith [7] defined differential privacy. For more details on ORAM and a list of relevant papers, the reader can consult [2].
2 Definitions
2.1 Model of Computation
We consider the standard Random Access Memory (RAM) model of computation that consists of a CPU and a memory. The CPU executes a program and is allowed to perform two types of memory operations: read a value from a specified physical address, and write a value to a specified physical address. We assume that the CPU has a private cache where it can store $O(1)$ values (and/or a polylogarithmic number of bits). As an example, in the setting of a client storing its data on the cloud, the client plays the role of the CPU and the cloud server plays the role of the memory.
We assume that a program’s sequence of read and write operations may be visible to an adversary. We will call this sequence the program’s access pattern. We will further assume that the memory content is encrypted so that no other information is leaked about the content read from and stored in memory locations. The program’s access pattern may depend on the program’s input, and may hence leak information about it.
2.2 Oblivious Algorithms
There are various works focused on oblivious algorithms [8, 15, 26] and Oblivious RAM (ORAM) constructions [13]. These works adopt “full obliviousness” as a privacy notion. Suppose that is an algorithm that takes in two inputs, a security parameter and an input dataset denoted . We denote by , the ordered sequence of memory accesses the algorithm makes on the input and .
Definition 2.1 (Fully Oblivious Algorithms).
Let be a function in a security parameter . We say that algorithm is -statistically oblivious, iff for all inputs and of equal length, and for all , it holds that where denotes that the two distributions have at most statistical distance. We say that is perfectly oblivious when .
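For reference, Definition 2.1 can be written out symbolically as below. The symbols are assumptions chosen for this sketch, since the notation is not legible in the text above: $M$ is the algorithm, $\lambda$ the security parameter, $x, y$ are inputs, $\mathsf{Access}^{M}(\lambda, x)$ is the random sequence of memory accesses, and $\mathrm{SD}$ is statistical distance.

```latex
\[
\forall \lambda,\ \forall x, y \text{ with } |x| = |y|:\qquad
\mathrm{SD}\bigl(\mathsf{Access}^{M}(\lambda, x),\ \mathsf{Access}^{M}(\lambda, y)\bigr) \le \delta(\lambda),
\]
\[
\text{where}\quad
\mathrm{SD}(X, Y) = \max_{S}\ \bigl|\Pr[X \in S] - \Pr[Y \in S]\bigr| .
\]
```

Perfect obliviousness corresponds to the case $\delta = 0$, that is, the two access-pattern distributions are identical.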
2.3 Differentially Oblivious Algorithms
Suppose that is a (stateful) algorithm that takes in three inputs, a security parameter , an input dataset denoted by and a value . We slightly change the definition of differentially oblivious algorithms given in [5]:
Definition 2.2 (Neighbor-respecting).
We say that two input datasets and are neighboring iff they are of the same length and differ in exactly one entry. We say that is a neighbor-respecting adversary iff for every and every , outputs neighboring datasets , with probability 1.
Definition 2.3.
Let be privacy parameters. Let be a (possibly stateful) algorithm described as above. To an adversary we associate the experiment in Figure 1, for every . We say that is -adaptively differentially oblivious if for every (computationally unbounded) stateful neighbor-respecting adversary we have
[TABLE]
In Figure 1, denotes the ordered sequence of memory accesses the algorithm makes on the inputs and .
Remark 2.4.
The notion of adaptivity here is different from the one defined in [5]. We require that the dataset remain the same through the experiment whereas in [5] the adaptive adversary can add or remove entries from the dataset.
As with differential privacy, we usually think about as a small constant and require that where [7]. Observe that if is -statistically oblivious then it is also -differentially oblivious.
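As a point of reference, the non-adaptive core of differential obliviousness is the usual differential privacy inequality applied to access patterns. The formula below is a sketch in assumed notation ($M$, $\lambda$, $\mathsf{Access}^{M}$, $\epsilon$, $\delta$ are symbols chosen here, not recovered from the text), and it does not reproduce the adaptive experiment of Figure 1.

```latex
\[
\text{For all neighboring } x, x' \text{ and every set } S \text{ of access patterns:}\qquad
\Pr\bigl[\mathsf{Access}^{M}(\lambda, x) \in S\bigr]
\;\le\; e^{\epsilon}\,\Pr\bigl[\mathsf{Access}^{M}(\lambda, x') \in S\bigr] + \delta .
\]
```

Setting $\epsilon = 0$ recovers the statement that the two distributions are within statistical distance $\delta$, which is why $\delta$-statistical obliviousness implies $(0, \delta)$-differential obliviousness.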
The following simple lemma will be useful to analyze our algorithms. The proof of the lemma appears in Appendix A.
Lemma 2.5.
Let be an -differentially oblivious algorithm and be an algorithm such that for every dataset the statistical distance between and is at most (that is, for every ). Then, is an -differentially oblivious algorithm.
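The calculation behind a lemma of this shape is short. The derivation below is a sketch for the non-adaptive inequality, in assumed notation: $A$ is $(\epsilon, \delta)$-differentially oblivious, $B$ is within statistical distance $\gamma$ of $A$ on every dataset, and $x, x'$ are neighbors. The resulting parameter $\delta + (e^{\epsilon} + 1)\gamma$ is what this standard argument yields; the exact parameters stated in the lemma are not legible above and may be expressed differently.

```latex
\begin{align*}
\Pr[\mathsf{Access}^{B}(x) \in S]
  &\le \Pr[\mathsf{Access}^{A}(x) \in S] + \gamma \\
  &\le e^{\epsilon}\,\Pr[\mathsf{Access}^{A}(x') \in S] + \delta + \gamma \\
  &\le e^{\epsilon}\bigl(\Pr[\mathsf{Access}^{B}(x') \in S] + \gamma\bigr) + \delta + \gamma \\
  &= e^{\epsilon}\,\Pr[\mathsf{Access}^{B}(x') \in S] + \delta + (e^{\epsilon} + 1)\gamma .
\end{align*}
```

The first and third steps use the statistical distance bound on $x$ and on $x'$ respectively, and the second step uses the differential obliviousness of $A$.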
3 Differentially Oblivious Property Testing of Dense Graph Properties
In this section, we present a differentially oblivious property tester for dense graph properties in the adjacency matrix representation model. A property tester is an algorithm that decides whether a given object has a predetermined property or is far from any object having this property by examining a small random sample of its input. The correctness requirement of property testers ignores objects that neither have the property nor are far from having the property. However, the privacy requirement is “worst case” and should hold for any two neighboring graphs. For the definition of privacy we say that two graphs of size are neighbors if one can get by changing the neighbors of exactly one node of .
Property testing of graph properties in the adjacency matrix representation was introduced in [12]. A graph is represented by the predicate such that if and only if and are adjacent in . The distance between graphs is defined to be the number of different matrix entries over . This model is most suitable for dense graphs where the number of edges is . We define a property of graphs to be a subset of the graphs. We write to show that graph has the property . For example, we can define the bipartiteness property, where is the set of all bipartite graphs. [Footnote 4: Recall that an undirected graph is bipartite (or 2-colorable) if its vertices can be partitioned into two parts, and , such that each part is an independent set (i.e., ).]
We say that an -vertex is -far from if for every -vertex graph it holds that the symmetric difference between and is greater than . We define the property testing in this model as follows:
Definition 3.1 ([12]).
A -tester for a graph property is a probabilistic algorithm that, on inputs , and an adjacency matrix of an -vertex graph :
1. Outputs 1 with probability at least , if .
2. Outputs 0 with probability at least , if is -far from .
We say a tester has one-sided error, if it accepts every graph in with probability 1. We say a tester is non-adaptive if it determines all its queries to adjacency matrix only based on , and its randomness; otherwise, we say it is adaptive.
Example 3.2 ([12]).
Consider the following -tester for bipartiteness: Choose a random subset of size with uniform distribution and output 1 iff the graph induced by is bipartite. Clearly, if is bipartite, then the tester will always return 1. Goldreich et al. [12] proved that if is -far from a bipartite graph, then the probability that the algorithm returns 1 is at most .
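The tester of Example 3.2 can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `bipartite_tester` and the `sample_size` parameter are assumptions (the sample size from the example is not legible above, so it is left to the caller). The sketch first reads the whole induced submatrix in a fixed order and only then runs a 2-coloring on the local copy, so the probed positions depend only on the random sample and not on the graph.

```python
import random
from collections import deque


def bipartite_tester(adj, sample_size, rng=random):
    """Return 1 iff the subgraph induced by a uniform random sample is bipartite.

    adj is an n x n 0/1 adjacency matrix; sample_size is an assumed parameter.
    """
    n = len(adj)
    sample = rng.sample(range(n), min(sample_size, n))

    # Probe phase: read every entry of the induced submatrix in a fixed order,
    # so the memory accesses are determined by the sample alone.
    local = {(u, v): adj[u][v] for u in sample for v in sample}

    # Local phase: BFS 2-coloring on the cached copy (no further probes).
    color = {}
    for start in sample:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in sample:
                if v == u or not local[(u, v)]:
                    continue
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return 0  # odd cycle inside the sample
    return 1
```

On a bipartite graph every induced subgraph is bipartite, so the sketch always returns 1, matching the one-sided error noted in the example.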
Recall that in graph property testing, the tester chooses a random subset of the graph with uniform distribution to test the property . Given the access pattern of the tester , an adversary will learn nothing since it is uniformly random. Thus, the access pattern by itself does not reveal any information about the input graph. However, we assume that the adversary also learns the tester’s output and can hence learn some information about the input graph based on the output of the tester. To protect this information, we run tester for a constant number of times and output iff the number of times outputs exceeds a (randomly chosen) threshold.
Let be a -tester for a graph property where . We write for the number of nodes that samples. Note that is constant in the graph size and a function of and . For simplicity, we only consider property testers with one-sided error. In Figure 2, we describe a -tester that outputs with probability at least if and outputs 0 with probability at least , if is -far from , where is defined below.
Theorem 3.3.
Let and . Algorithm is an -differentially oblivious algorithm that outputs 1 with probability 1 if , and outputs 0 with probability at least if is -far from .
The proof of Theorem 3.3 appears in Section A.2.
4 Lower Bounds on Testing Connectivity in the Incidence Lists Model
We now consider differentially oblivious testing of connectivity in the incidence lists model [14]. In this model a graph has a bounded degree and is represented as a function , where is the -th neighbor of (if no such neighbor exists, then ). In this model, the relative distance between graphs is normalized by – the maximal number of edges in the graph. Formally, for two graphs with vertices,
[TABLE]
A -tester in the incidence lists model is defined as in Definition 3.1, where a property is a set of graphs whose maximal degree is and the distance to a property is defined with respect to .
Goldreich and Ron [14] showed how to test if a graph is connected in the incidence list model in time . Raskhodnikova and Smith [32] showed that a tester for connectivity (or any non-trivial property) with run-time has to be adaptive, that is, the nodes that the algorithm probes should depend on the neighbors of nodes the algorithm has already probed (e.g., the algorithm probes some node , discovers that is a neighbor and , and probes ). We strengthen their results by showing that any tester for connectivity in graphs of maximal degree and run-time cannot be a differentially oblivious algorithm. We stress that adaptivity alone is not a reason for inefficiency with differential obliviousness. In fact, there exist differentially oblivious algorithms that are adaptive (e.g., our algorithm in Section 6).
Theorem 4.1.
Let such that . Every -differentially private -tester for connectivity in graphs with maximal degree 2 runs in time .
Proof.
Let Tester be a -tester for connectivity in graphs of degree at most . We somewhat relax the definition of probes and assume that once the tester probes a node, it sees all edges adjacent to this node. We prove that if Tester probes less than nodes (for some constant ), then it is not -oblivious.
Assume that . Let be a cycle of length and consist of disjoint triangles. Clearly, is connected and is -far from a connected graph. For a permutation , define , where , and let be a random graph isomorphic to , that is, for a permutation chosen with uniform distribution. [Footnote 5: When we permute a graph, we also permute its incidence list representation, i.e., if , then with probability half will be the first neighbor of and with probability half it will be the second.]
On the random graph Tester has to say “yes” with probability at least 3/4 and on the random graph Tester has to say “no” with probability at least 3/4.
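The two hard instances used in this proof are easy to build explicitly. The sketch below assumes the cycle has `n` nodes and that the second graph consists of `n / 3` disjoint triangles (the sizes are not legible above, so `n` divisible by 3 is an assumption), and it omits the random relabeling. It only illustrates that both graphs are 2-regular, so a single probe reveals a path of length two in either graph, while only the cycle is connected.

```python
from collections import deque


def cycle_graph(n):
    """Incidence lists of one cycle on nodes 0..n-1 (every node has degree 2)."""
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}


def disjoint_triangles(n):
    """Incidence lists of n/3 disjoint triangles; n must be divisible by 3."""
    if n % 3 != 0:
        raise ValueError("n must be divisible by 3")
    graph = {}
    for base in range(0, n, 3):
        triangle = [base, base + 1, base + 2]
        for v in triangle:
            graph[v] = [u for u in triangle if u != v]
    return graph


def is_connected(graph):
    """Breadth-first search from an arbitrary node; True iff all nodes are reached."""
    if not graph:
        return True
    start = next(iter(graph))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(graph)
```

A tester that probes only nodes that are pairwise far apart sees the same kind of local view (a node with two neighbors) in both graphs, which is the intuition behind Observation 4.2.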
Observation 4.2.
If Tester does not probe two distinct nodes whose distance is at most two, then Tester sees a collection of paths of length two and cannot know if the graph is or .
Claim 4.3.
Given the random graph , the tester has to probe two distinct nodes whose distance is at most 2 with probability at least .
Proof.
Consider Tester’s answer when it sees a collection of paths of length . Assume first that the tester returns “No” with probability at least half in this case and let be the probability that Tester probes two distinct nodes whose distance is at most two on the random graph . The probability that Tester returns “Yes” on is at most . Thus, , i.e., .
If the tester returns “Yes” with probability at least half, then, by symmetric arguments, with probability at least Tester has to probe two nodes whose distance is at most two on the random graph . For a permutation , if the distance between two nodes in is at most 2, then the distance between these two nodes in is at most 2. Thus, by Observation 4.2,
[TABLE]
∎
Denote the nodes of by and define a distribution on pairs of graphs , obtained by the following process:
- •
Choose a permutation with uniform distribution and let .
- •
Denote and for .
- •
Choose with uniform distribution two indices such that (where the addition is done modulo ).
- •
Let , where
The graphs are described in Figure 3. Note that is also a random graph isomorphic to , thus, given one cannot know which pair of non-adjacent nodes was used to create .
Observe that and differ on nodes. Since Tester is -differentially oblivious, for every algorithm ,
[TABLE]
Consider the following algorithm :
If and at least one of is probed by prior to seeing any other pair of nodes of distance at most in or , then return otherwise return [math].
Claim 4.4.
Let . Suppose that Tester probes at most nodes. Pick at random with uniform distribution two nodes in with distance at least in . The probability that probes both and prior to seeing any two nodes of distance at most in is (where the probability is over the random choice of and the randomness of Tester).
Proof.
The node is a uniformly distributed node in and is any node of distance at least from , thus there are options for . Given a collection of paths of length at most in all options are equally likely.
Let be the nodes probed in some execution of Tester. Fix some pair of indices . The probability that is at most . Thus, the probability that and are probed is at most ∎
Claim 4.5.
Assume that Tester probes at most nodes. The probability that is at least .
Proof.
By Claim 4.3, the probability that Tester probes at least one pair of nodes with distance at most is at least . Given that this event occurs, the probability that the random (chosen with uniform distribution) has the smallest index in the first such pair in (i.e., the first pair is either or ) is at least .
Clearly, given these events no two nodes with distance at most in were probed prior to probing the pair containing . Furthermore, there are pairs of nodes that are of distance at most in and are of distance greater than in . By Claim 4.4, the probability that such pair is probed prior to Tester probing a pair of distance at most in is . ∎
Claim 4.6.
Suppose that Tester probes at most nodes. The probability that is .
Proof.
The node is a uniformly distributed node in . Furthermore, the node is a uniformly distributed node of distance at least from in , thus by Claim 4.4, the probability that Tester probes both and prior to seeing any pair of distance at least in is . This probability can only decrease if we require that Tester probes both and prior to seeing any pair of distance at least in and in .
By the same arguments, the probability that Tester probes both and prior to seeing any pair of distance at least in and in is . ∎
To conclude the proof of Theorem 4.1, we note that by (1) and Claims 4.5 and 4.6
[TABLE]
Since , it follows that . ∎
5 Differentially Oblivious Algorithm for Locating an Object
Given a dataset of objects our goal is to locate an object that satisfies a property , if one exists. E.g., given a dataset consisting of employee records, find an employee with income in the range \$35,000 to \$70,000 if such an employee exists in the dataset.
Absent privacy requirements, a simple approach is to probe elements of the dataset in a random order until an element satisfying the property is found or all elements were probed. If a fraction of the dataset entries satisfy then the expected number of elements sampled by the non-private algorithm is . However, a perfectly oblivious algorithm would require probes on any dataset, in particular on a dataset where all elements satisfy , where non-privately one probe would suffice. To see why, let if and otherwise and let include exactly one 1-entry in a uniformly random location. Observe that in expectation it requires memory probes to locate the 1-entry in . Perfect obliviousness would hence imply an probes on any input.
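The non-private baseline described above can be sketched as follows. The function name `locate_nonprivate` and the returned probe count are illustrative choices, not part of the paper.

```python
import random


def locate_nonprivate(data, satisfies, rng=random):
    """Probe elements in a uniformly random order until one satisfies the property.

    Returns (found, probes). The number of probes depends on the data, which is
    exactly the leakage that the differentially oblivious variant has to hide.
    """
    order = list(range(len(data)))
    rng.shuffle(order)
    probes = 0
    for idx in order:
        probes += 1
        if satisfies(data[idx]):
            return True, probes
    return False, probes
```

When every element satisfies the property the loop stops after one probe, and when none does it probes the whole dataset, which are the two extremes contrasted in the paragraph above.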
We give a nearly instance optimal differentially oblivious algorithm that always returns a correct answer. Except for probability the algorithm halts after steps.
Our Algorithm.
Given the access pattern of the non-private algorithm, an adversary can learn that the last probed element satisfies . To hide this information, we change the stopping condition to having probed at least a (randomly chosen) threshold of elements satisfying . If after probes the number of elements satisfying is below the threshold the entire dataset is scanned. Our algorithm is described in Figure 4. On a given array , algorithm outputs 1 iff there exists an element in satisfying the property .
We remark that Algorithm uses a mechanism similar to the sparse vector mechanism of [18]. However, in our case instead of using a single noisy threshold across all steps, Algorithm generates in each step a noisy threshold . The value of ensures that with high probability . The proof of Theorem 5.1 is given in Section A.3.
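The control flow described above (a fresh noisy threshold in every step, with a full scan as fallback) can be illustrated as follows. This is only a sketch under stated assumptions and not the algorithm of Figure 4: the names `max_probes`, `base_threshold`, and `noise_scale`, the use of Laplace noise, and the clamping of the threshold to at least 1 are all hypothetical choices made here so that the sketch always answers correctly. No privacy guarantee is claimed for these parameters.

```python
import random


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def locate_noisy_threshold(data, satisfies, max_probes, base_threshold,
                           noise_scale, rng=None):
    """Return (answer, probes), where answer is 1 iff some element satisfies."""
    rng = rng or random.Random()
    order = list(range(len(data)))
    rng.shuffle(order)

    count = 0   # satisfying elements seen so far
    probes = 0
    for idx in order[:max_probes]:
        probes += 1
        if satisfies(data[idx]):
            count += 1
        # Fresh noisy threshold in every step; clamping to >= 1 (an assumption)
        # guarantees that stopping early implies a satisfying element was seen.
        threshold = max(1.0, base_threshold + laplace(noise_scale, rng))
        if count >= threshold:
            return 1, probes

    # Fallback: scan the entire dataset without early exit.
    found = 0
    for item in data:
        probes += 1
        if satisfies(item):
            found = 1
    return found, probes
```

Stopping only once the count clears a random threshold hides which probe was the first hit, at the cost of a few extra probes on inputs where many elements satisfy the property.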
Theorem 5.1.
Algorithm is an -differentially oblivious algorithm that outputs 1 iff there exists an element in the array that satisfies property . For , with probability it halts in time at most , where .
6 Differentially Oblivious Prefix Sum
Suppose that there is a dataset consisting of sorted sensitive user records, and one would like to compute the sum of all records in the (sorted) dataset that are less than or equal to a value in a way that respects individual users’ privacy. We call this task differentially oblivious prefix sum. For the definition of privacy we say that two datasets of size are neighbors if they agree on elements (although, as sorted arrays they can disagree on many indices). For example, and are neighbors and should have similar access patterns.
Without privacy one can find the greatest record less than or equal to value , and then compute the prefix sum by a quick scan through all records appearing before such record. Any perfectly secure algorithm must read the entire dataset (since it is possible that all elements are smaller than ). Here, we give a differentially oblivious prefix sum algorithm that for many instances is much faster than any perfectly oblivious algorithm.
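The non-private baseline is a binary search followed by a scan. A minimal sketch, in which the function name and the threshold parameter `c` are illustrative names:

```python
from bisect import bisect_right


def prefix_sum_nonprivate(sorted_data, c):
    """Sum of all records <= c in an ascending sorted list."""
    end = bisect_right(sorted_data, c)  # number of records that are <= c
    return sum(sorted_data[:end])       # scan only the qualifying prefix
```

Both the positions touched by the binary search and the length of the scanned prefix depend on the data, which is the leakage the differentially oblivious version needs to control.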
Intuition.
Absent privacy requirements, using binary search, one can find the greatest element less than or equal to , and then compute the prefix sum by a quick scan through all records that appear before such record. However, the binary search access pattern allows the adversary to gain sensitive information about the input. Our main idea is to approximately simulate the binary search and obfuscate the memory accesses to obtain differential obliviousness. In order to do that, we first divide the input array into chunks (where is polynomial in , and ). Then, we find the chunk that contains the greatest element less than or equal to by comparing the first element (hence, the smallest element) of each chunk to . Let be the index of such chunk. Next, we compute a noisy interval that contains using the Laplacian distribution. We iteratively repeat this process on the noisy interval, where in each step we eliminate more than a quarter of the elements of the interval. We continue until the size of the array is less than or equal to . Next, we scan all elements in the remaining array and find the index of the greatest element smaller than or equal to . Let be the index of such element; we compute the prefix sum by scanning the array until index .
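The narrowing procedure described in this paragraph can be sketched as below. This is an illustration of the control flow only, not the Search algorithm of Figure 5: the parameters `k`, `noise_scale`, and `min_size`, and the use of clamped, one-sided padding (so that the noisy interval always contains the target, in the spirit of the truncation mentioned in Remark 6.1) are assumptions of this sketch, and no privacy guarantee is claimed for these values.

```python
import random


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_chunk_search(data, c, k=16, noise_scale=1.0, min_size=64, rng=None):
    """Return the largest index i with data[i] <= c, or -1 if there is none."""
    if k < 4 or min_size < 4 * k:
        raise ValueError("need k >= 4 and min_size >= 4 * k so each round shrinks")
    rng = rng or random.Random()
    lo, hi = 0, len(data)  # invariant: data[:lo] <= c and data[hi:] > c

    while hi - lo > min_size:
        chunk = -(-(hi - lo) // k)        # ceil((hi - lo) / k)
        starts = range(lo, hi, chunk)
        # Probe the first element of every chunk; positions depend only on (lo, hi).
        j = sum(1 for s in starts if data[s] <= c)
        centre = max(j - 1, 0)            # chunk that contains the boundary
        # Pad the chunk on both sides by a random number of chunks (clamped).
        pad_left = min(k // 4, int(abs(laplace(noise_scale, rng))))
        pad_right = min(k // 4, int(abs(laplace(noise_scale, rng))))
        new_lo = lo + max(0, centre - pad_left) * chunk
        new_hi = min(hi, lo + (centre + 1 + pad_right) * chunk)
        lo, hi = new_lo, new_hi

    # Scan the whole remaining interval, with no early exit.
    count = sum(1 for i in range(lo, hi) if data[i] <= c)
    return lo + count - 1
```

The prefix sum then follows by scanning the array up to the returned index. With the default values each round keeps at most 9 of 16 chunks, so the interval shrinks geometrically until the final scan.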
The Search Algorithm.
We present a search algorithm in Figure 5; on input and this algorithm finds the largest index such that . To compute the prefix sum, we compute and scan the first elements of the dataset, summing only the first . We show in Theorem 6.2 that our search algorithm is -differentially oblivious.
Remark 6.1.
We prove that algorithm Search is an -differentially private algorithm that returns a correct index with probability at least . We could change it to an -differentially private algorithm that never errs. This is done by truncating the noise to .
Theorem 6.2.
Let and . Algorithm Search is an -differentially oblivious algorithm that, for any input array with size and , returns a correct index with probability at least . The running time of Algorithm Search is .
Theorem 6.2 is proved in Section A.4.
6.1 Dealing with Multiple Queries
We extend our prefix sum algorithm to answer multiple queries. We can answer a bounded number of queries by running the differentially oblivious prefix sum algorithm multiple times. That is, when we want an -oblivious algorithm correctly answering queries with probability at least , we execute algorithm Search times with privacy parameter and error probability (each time also computing the appropriate prefix sum). Thus, the running time of the algorithm is (excluding the scan time for computing the sum).
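The budget split behind the repeated-execution approach can be sketched as follows. This assumes an even split justified by basic composition (privacy parameters add up across executions) and the union bound (error probabilities add up); the function name and the even split are assumptions of this sketch, and the paper's exact parameters are not reproduced here.

```python
def per_query_parameters(epsilon, beta, num_queries):
    """Split a total privacy budget and error budget evenly over queries.

    Assumption of this sketch: each of the `num_queries` executions gets
    epsilon / num_queries and beta / num_queries, so that the sums over
    all executions equal the requested totals.
    """
    if num_queries < 1:
        raise ValueError("num_queries must be at least 1")
    return epsilon / num_queries, beta / num_queries
```

The smaller per-query privacy parameter means more noise per execution, which is why this approach only pays off for a bounded number of queries.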
On the other hand, we can use an ORAM to answer an unbounded number of queries. That is, in a pre-processing stage we store the records, and for each record we store the sum of all records up to this record. Thereafter, answering each query requires one binary search. Using the ORAM of [2], the pre-processing will take time and answering each query will take time . Thus, the ORAM algorithm is more efficient when .
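The pre-processing idea can be sketched as follows, leaving out the ORAM layer entirely. In the approach described above, every read of `records` and `sums` would go through an ORAM; here they are plain Python lists, so this sketch hides nothing about the access pattern. The function names are illustrative.

```python
import bisect
from itertools import accumulate


def preprocess(records):
    """Store the sorted records together with their running sums."""
    records = list(records)
    sums = list(accumulate(records))  # sums[i] = records[0] + ... + records[i]
    return records, sums


def query(store, q):
    """Sum of all stored records <= q, using one binary search."""
    records, sums = store
    end = bisect.bisect_right(records, q)  # number of records <= q
    return sums[end - 1] if end > 0 else 0
```

After the one-time pre-processing, each query costs a single binary search and one lookup of a stored sum, with no scan of the dataset.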
We use ORAM along with our differentially oblivious prefix sum algorithm to answer an unbounded number of queries while preserving privacy, combining the advantages of both of the previous algorithms.
Theorem 6.3.
Algorithm MultiSearch, described in Figure 6, is an -oblivious algorithm, which executes Algorithm Search at most times, where the run time of the -th execution is , scans the original database at most once, and in addition each query run time is at most .
Proof.
First note that we only pay for privacy in the executions of algorithm Search (reading and writing to the ORAM is perfectly private). In the -th execution of algorithm Search, we insert at least elements to the ORAM, thus after executions we inserted at least elements to the ORAM.
By simple composition, algorithm MultiSearch is -differentially private, where
[TABLE]
where the last inequality is implied by the sum of the harmonic series. ∎
Appendix A Missing Proofs
A.1 Proof of Lemma 2.5
Proof.
Let and be two neighboring datasets and be a set of outputs. Then,
[TABLE]
∎
A.2 Proof of the Correctness and Privacy of Algorithm
Theorem 3.3 is implied by the following lemmas.
Lemma A.1.
Algorithm is -differentially oblivious.
Proof.
We first analyze a variant of , denoted by , in which Step 4 is replaced by “If then output ” (that is, the algorithm does not check if before deciding in the positive).
Let and be two neighboring graphs that differ on node . Fix the random choices of subsets in Step 2b and observe that after the execution of the for loop, the count can differ by at most between the executions on and . Let be the smallest integer greater than . Since algorithm uses the Laplace mechanism for every . Thus,
[TABLE]
Similarly, . Hence, is -differentially oblivious.
We next prove that is -differentially oblivious using Lemma 2.5; that is, we prove that for every graph , the statistical distance between and is at most . Let be the event that and observe that the probability that occurs is at most . (Footnote 6: for every .) Thus, . We have that $\Big|\Pr[\textsc{Tester}_{\mathcal{T}}(G)=1]-\Pr[\textsc{Tester}'_{\mathcal{T}}(G)=1]\Big| \leq \Big|\Pr[\textsc{Tester}_{\mathcal{T}}(G)=1 \mid E]-\Pr[\textsc{Tester}'_{\mathcal{T}}(G)=1 \mid E]\Big| \cdot \Pr[E] \leq \Pr[E] \leq \delta$. Thus, by Lemma 2.5, algorithm is -differentially oblivious. ∎
Observe that Algorithm never errs when as in that case after the for loop is executed and hence in Step 4 outputs . The next lemma analyses the error probability when is -far from .
Lemma A.2.
Algorithm is -tester for the graph property .
Proof.
Observe that in Step 2c of the algorithm, we eliminate at most edges. Thus, we eliminate at most edges in total. Hence, when is -far from , it is also -far from after the removal of the observed nodes in each step of the for loop. We next prove that Algorithm fails with probability at most . Observe that if Algorithm fails on then or . We define to be the output of in the -th step of the for loop. Let . Observe that all are independent and . Using the Chernoff bound (footnote 7: for any , where is the expectation of ), we obtain that . We also know . Therefore, Algorithm fails with probability . ∎
A.3 Proof of the Correctness and Privacy of Algorithm
The proof of Theorem 5.1 follows from the following claim and lemmas.
Claim A.3.
Let . The probability that there exists an element such that algorithm samples the element in Step 2a more than times is less than
Proof.
Fix an index . The probability that the element is sampled more than times is less than The claim follows by the union bound. ∎
Lemma A.4.
Let . Algorithm is -differentially oblivious.
Proof.
We first analyze a variant of , denoted by , in which Step 2(c)ii is replaced by “If then output ” (that is, the algorithm does not check if ) and no element is sampled more than times. We analyze the privacy of similarly to the analysis of the sparse vector mechanism in [18].
Let and be two neighboring datasets such that and for some . Denote by the values of the thresholds in an execution of , where each threshold is rounded up to the smallest integer greater than . Furthermore, let be the index such that on input outputs 1 when (if no such exists, then ). Observe that in each execution of Step 2(c)ii the count on input is at least the count on input and can exceed it by at most (since is sampled at most times). Thus, on input with thresholds outputs 1 when . Since algorithm uses the Laplace mechanism with ,
[TABLE]
for every . Thus,
[TABLE]
Similarly,
[TABLE]
We next prove that is -differentially oblivious using Lemma 2.5. That is, we prove that for every dataset , the statistical distance between and is at most . Notice that if all the thresholds are positive and all elements are sampled at most times then and have the same access pattern. By Claim A.3, the probability that there exists a that is sampled more than is at . We next observe that the probability that a threshold is negative is at most . Recall that for every . Thus, . Let be the event that at least one of the thresholds is at most 0 or some is sampled more than times. By the union bound, the probability of is at most . Therefore, for every set of access patterns
[TABLE]
Thus, by Lemma 2.5, algorithm is -differentially oblivious. ∎
We next analyze the running and probe complexity of our algorithm. Let be the probability that a uniformly chosen element in satisfies . The non-private algorithm that samples elements until it finds an element satisfying has expected running time and the probability that it does not stop after steps is . We show that has a similar behavior.
Lemma A.5.
Let be the probability that a uniformly chosen element in satisfies . Then, for every integral power of two the probability that algorithm probes more than memory locations is less than . In particular, for , the probability is less than .
Proof.
Let . The probability that is Assuming that , the probability that the algorithm does not halt after steps is less than
[TABLE]
∎
A.4 Proof of the Correctness and Privacy of Algorithm Search
Theorem 6.2 is proved in the following three claims. We start by analyzing the running time of the algorithm.
Claim A.6.
Let and . The while loop in Algorithm Search is executed at most times. Furthermore, the total running time of the algorithm is .
Proof.
Let and be the values of before and after an execution of a step of the while loop in Algorithm Search. Note that
[TABLE]
Therefore, algorithm Search eliminates more than a quarter of the elements in each step of the while loop and the algorithm will halt after less than steps.
Moreover, observe that Algorithm Search makes memory accesses in each step of the while loop and additional memory accesses after the loop. Thus, its running time is (since ). ∎
Claim A.7.
Algorithm Search returns the correct index with probability at least .
Proof.
Let be the maximal index such that (i.e., is the index that algorithm Search should return). We prove by induction that if all Laplace noises in the algorithm satisfy then in each step of the algorithm , hence the algorithm will return in its last scan of between and .
The basis of the induction is trivial since . For the induction step, let and be the values of before and after an execution of a step of the while loop in Algorithm Search. By the induction hypothesis, . The algorithm finds an index such that . By our assumption on the Laplace noise, , thus, . Similarly, , thus, .
Recall that for every . Thus, by Claim A.6 and the union bound, the probability that one of the Laplace noises is greater than is at most . Hence, the probability that algorithm Search returns the correct index is at least . ∎
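The union-bound step can be checked numerically. The sketch below assumes the standard two-sided tail of the Laplace distribution with scale b, namely Pr[|Lap(b)| > t] = exp(-t/b) for t ≥ 0; the function names and the two-sided form are assumptions of this sketch, and the constants used in the paper's own bound are not reproduced here.

```python
import math


def laplace_tail(scale, t):
    """Pr[|X| > t] for X ~ Lap(scale), valid for t >= 0."""
    return math.exp(-t / scale)


def noise_bound(scale, num_draws, beta):
    """Smallest t such that num_draws * Pr[|Lap(scale)| > t] <= beta.

    By the union bound, with probability at least 1 - beta none of the
    `num_draws` independent Laplace noises exceeds t in absolute value.
    Assumes num_draws >= beta so that the returned t is non-negative.
    """
    return scale * math.log(num_draws / beta)
```

The bound grows only logarithmically in the number of loop iterations and in 1/β, which is why the noisy interval can stay small in every step.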
Next, we show that algorithm Search is -differentially oblivious.
Claim A.8.
Algorithm Search is an -differentially oblivious algorithm.
Proof.
We show below that each step of the while loop in algorithm Search is -differentially oblivious. Applying the basic composition theorem and Claim A.6, we obtain that the Search algorithm is -differentially oblivious.
Fix a step of the loop and view it as an algorithm that returns and (given these values the access pattern of the next step is fixed). Let and be two neighboring datasets such that for some we have and for all . It holds that for every . Let and be the values computed in step 2c of the algorithm on inputs and respectively. Thus, the value is at least the value and can exceed it by one. Intuitively, since algorithm Search uses the Laplace mechanism, the probabilities of returning a value on and are at most apart. Formally, if (where we consider two independent noises), then the algorithm returns the same value of on both inputs. The claim follows since for every set :
[TABLE]
∎
References
- [1] Noga Alon, Eldar Fischer, Michael Krivelevich, and Mario Szegedy. Efficient testing of large graphs. Combinatorica, 20(4):451–476, 2000.
- [2] Gilad Asharov, Ilan Komargodski, Wei-Kai Lin, Kartik Nayak, and Elaine Shi. OptORAMa: Optimal oblivious RAM. IACR Cryptology ePrint Archive, 2018:892, 2018.
- [3] Marina Blanton, Aaron Steele, and Mehrdad Aliasgari. Data-oblivious graph algorithms for secure computation and outsourcing. In Kefei Chen, Qi Xie, Weidong Qiu, Ninghui Li, and Wen-Guey Tzeng, editors, 8th ACM Symposium on Information, Computer and Communications Security, ASIA CCS ’13, pages 207–218. ACM, 2013.
- [4] David Cash, Paul Grubbs, Jason Perry, and Thomas Ristenpart. Leakage-abuse attacks against searchable encryption. In Indrajit Ray, Ninghui Li, and Christopher Kruegel, editors, Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, 2015, pages 668–679. ACM, 2015.
- [5] T.-H. Hubert Chan, Kai-Min Chung, Bruce M. Maggs, and Elaine Shi. Foundations of differentially oblivious algorithms. In Timothy M. Chan, editor, Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, pages 2448–2467. SIAM, 2019.
- [6] Srinivas Devadas, Marten van Dijk, Christopher W. Fletcher, Ling Ren, Elaine Shi, and Daniel Wichs. Onion ORAM: A constant bandwidth blowup oblivious RAM. In Eyal Kushilevitz and Tal Malkin, editors, Theory of Cryptography - 13th International Conference, TCC 2016-A, volume 9563 of Lecture Notes in Computer Science, pages 145–174. Springer, 2016.
- [7] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith. Calibrating noise to sensitivity in private data analysis. In Shai Halevi and Tal Rabin, editors, Theory of Cryptography, Third Theory of Cryptography Conference, TCC 2006, volume 3876 of Lecture Notes in Computer Science, pages 265–284. Springer, 2006.
- [8] David Eppstein, Michael T. Goodrich, and Roberto Tamassia. Privacy-preserving data-oblivious geometric algorithms for geographic data. In Divyakant Agrawal, Pusheng Zhang, Amr El Abbadi, and Mohamed F. Mokbel, editors, 18th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, ACM-GIS 2010, pages 13–22. ACM, 2010.
