This paper investigates how to privately compute a specific ordered composition of linear basic functions across multiple servers, optimizing the number of queries while ensuring the servers learn nothing about the function order.
Contribution
It introduces the problem of private sequential function computation, derives capacity bounds, and proposes a method to achieve the desired privacy with optimal query efficiency.
Achievability: proper inquiry order retrieves desired composition while maintaining privacy
Upper bound: information-theoretic converse on capacity
Private Sequential Function Computation
Behrooz Tahmasebi and Mohammad Ali Maddah-Ali
Behrooz Tahmasebi is with the Department of Electrical Engineering and Computer Science (EECS), Massachusetts Institute of Technology (MIT), Cambridge, MA, USA (email: [email protected]). Mohammad Ali Maddah-Ali is with Nokia Bell Labs, Holmdel, NJ, USA (email: [email protected]).
This paper has been presented in part at IEEE ISIT 2019 [1].
Abstract
Consider a system, including a user, N servers, and K basic functions which are known at all of the servers. Using the combination of those basic functions, it is possible to construct a wide class of functions. The user wishes to compute a particular combination of the basic functions, by offloading the computation to N servers, while the servers should not obtain any information about which combination of the basic functions is to be computed. The objective is to minimize the total number of queries asked by the user from the servers to achieve the desired result.
As a first step toward this problem, we consider the case where the user is interested in a class of functions that are compositions of the basic functions, with each basic function appearing in the composition exactly once. In this case, to ensure privacy, we only need to hide the order of the basic functions in the user's desired composition. We further assume that the basic functions are linear and can be represented by (possibly large-scale) matrices.
We call this problem private sequential function computation. We study the capacity $C$, defined as the supremum of the number of desired computations, normalized by the number of computations done at the servers, subject to the privacy constraint.
We prove that $\left(1-\frac{1}{N}\right)/\left(1-\frac{1}{\max(K,N)}\right) \le C \le 1$. For the achievability, we show that the user can retrieve the desired order of composition by choosing a proper order of inquiries among different servers, while keeping the order of computations for each server fixed, irrespective of the desired order of composition. Finally, we develop an information-theoretic converse, which yields an upper bound on the capacity.
Keywords: private information retrieval, private computation, private function computation.
1 Introduction
Outsourcing storage and computation to external parties is an inevitable response to the growing size of data and the increasing load of processing.
One of the main challenges in those arrangements is to ensure the privacy of data and algorithms, with minimum overhead in terms of computation, storage, and communication.
Private function retrieval (PFR) [40, 41], also known as private computation, is one recently proposed approach to modeling the problem of privacy in computation. In this problem, a user wants to retrieve a linear combination of a number of files, stored in a number of replicated non-colluding servers, without revealing to the servers any information about the coefficients appearing in the linear combination [40, 41]. In [40, 41], the authors studied the information-theoretic capacity of PFR, defined as the maximum number of bits of information about the desired linear computation that can be privately retrieved per bit of downloaded information. In [40], the authors fully characterized the capacity of the problem. This problem has also been considered for coded databases [42, 43, 44], and it has been extended to private computation of arbitrary polynomials on Lagrange coded data [45] (see also [46]) and to private inner product retrieval [37].
PFR is an extension of the renowned problem of private information retrieval (PIR). In PIR, a user wishes to retrieve a specific file from a database, duplicated across multiple non-colluding servers, while the file identity must be kept private from the servers [2, 3, 4]. In [4], the basic PIR problem was investigated from an information-theoretic viewpoint and its capacity was characterized. Again, the capacity there is defined as the maximum number of desired information bits per bit of download in privacy-preserving algorithms. Interestingly, the capacities of PIR and PFR are the same, meaning that there is no extra cost to retrieve a linear combination of the files rather than a single file. PIR has also been extended to different scenarios, including but not limited to multiround PIR [5], PIR with colluding servers [6], PIR with coded storage [7, 8], PIR with eavesdroppers [9], PIR with adversary [10], PIR from wiretap channel [11], cache-aided PIR [12, 13, 14], PIR with coded and colluding servers [15, 16, 17], connections between PIR and distributed storage systems [18], and many other problems [19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38].
Another formulation for private computation is known as secure multi-party computation (MPC) [47, 48, 49]. In secure multi-party computation, a group of parties tries to perform a computation task on their private inputs without disclosing any information about the inputs to each other [50]. In other words, the objective in MPC is to keep the inputs secure. In this context, an important question is how to perform such a task with the minimum number of servers. In MPC, the servers may also collude, up to a given number, in order to gain information about the private inputs. A new formulation of this problem has also been proposed recently, in which the communication constraint for the computation of high-dimensional inputs is considered as well. For more details, see [51, 52, 53, 54].
In this paper, we consider private computation from a different perspective. We introduce the problem of private function computation as follows.
Assume that there are a number of basic, public functions $\{f_1, f_2, \ldots, f_K\}$. By combining the basic functions, we can construct a wide class of functions of interest. A user wishes to compute a particular combination of the basic functions for an arbitrary input. The user wants to offload the computation to $N$ non-colluding servers, each of which can compute all of the basic functions. However, the user wants to keep the desired function private. Note that keeping the function private is equivalent to hiding the way the basic functions are combined.
To achieve the desired result, the user sends a sequence of inquiries to the servers in a recursive manner, and the servers answer the user's inquiries. In this paper, it is assumed that the servers are not synchronized. In the end, the user must be able to obtain its desired result, while the servers should not be able to obtain any information about the user's desired function, or equivalently about the desired combination of the basic functions. The objective is to minimize the number of inquiries sent by the user to the servers under this criterion.
As a first step toward studying the problem of private function computation, in this paper we study a basic version of it, called private sequential function computation, in which the class of functions that the user is interested in contains only compositions of the basic functions. (Footnote 1: We note that there are applications where the computation of a complicated function is the main goal of the problem, and this function is identical to the composition of a number of basic functions; cf. [55].) In addition, we assume that each basic function appears in the composition exactly once. The functions are also assumed to be linear, and they can be represented by (possibly large-scale) full-rank matrices. For example, if the set of basic functions is $\{f_1, f_2, f_3\}$, the function desired by the user is one of the six options $f_1 \circ f_2 \circ f_3$, $f_1 \circ f_3 \circ f_2$, $f_2 \circ f_1 \circ f_3$, $f_2 \circ f_3 \circ f_1$, $f_3 \circ f_1 \circ f_2$, or $f_3 \circ f_2 \circ f_1$. At the end of the computation, the servers should not be able to infer anything about which of these six options the user desires. Under this assumption, the only information that must be kept private from the servers is the order of the composition.
We further assume that, each time, the user sends an inquiry, consisting of a vector and the index of one of the basic functions, to one of the servers, and waits for that server to return the result of applying the corresponding basic function to that vector. For example, the user sends the query $(n, W, k)$ to server $n$, and server $n$ returns $f_k(W)$. Then the user forms another inquiry for one of the servers, as a function of the initial input vectors, the results of the previous inquiries, and possibly some extra random vectors. In the end, the user should be able to compute the final result, and the servers should not gain any information about the order of composition.
When the number of servers is at least the number of basic functions, this problem is simple. The user asks each server to compute at most one of the functions. The order of inquiries determines the order of the basic functions in the desired function, but no single server learns anything about that order. The challenges start when the number of servers is less than the number of basic functions, so that some servers receive more than one inquiry from the user. The challenges are twofold. A server may infer information about the order of composition in the desired function from
• the order of functions in the different queries that it receives,
• the dependency of the inputs in the inquiries that it receives.
In this paper, we study the capacity of the private sequential function computation problem, which is defined as the supremum of the number of desired computations per query, subject to the privacy constraint. We derive lower and upper bounds on this capacity, denoted by $C$, as $\left(1-\frac{1}{N}\right)/\left(1-\frac{1}{\max(K,N)}\right) \le C \le 1$. We assume that the user has $M$ independent inputs, for some large integer $M$, and the objective is to calculate the desired function for all of those $M$ inputs. We show that the user can compute the desired function for those inputs by choosing a proper order of inquiries among different servers, while keeping the order of computations for each server fixed, irrespective of the desired order of composition. Therefore, each server observes a fixed order of computations, and thus gains no information about the desired order of composition. This resolves the first challenge raised above. In addition, in each query, the user may add a random vector to the input of the query, to eliminate the dependency between the inputs of the queries sent to one server. These random vectors are reused in queries to other servers in a manner that allows the user to eliminate their contributions from the final results. This resolves the second challenge raised above.
We then provide an information-theoretic converse, which results in an upper bound on the capacity. We note that the inequality $C \le 1$ may seem trivial at first glance, but as we show in the paper, this is not the case.
The rest of this paper is organized as follows. In Section 2, we introduce the mathematical model considered in the paper. Section 3 includes the main result of the paper. To prove the theorem, we first provide a number of illustrative examples in Section 4, and then present the achievable scheme in Section 5. In Section 6, we prove the feasibility, correctness, and privacy of the proposed scheme. The proof of the upper bound on the capacity is presented in Section 7, and finally, Section 8 concludes the paper.
Notation: For any positive integers $a, b$, let $[a:b] := \{a, a+1, \ldots, b\}$. We use $S_K$ to denote the set of all permutations of $[1:K]$. By $\Sigma = (\Sigma_K \Sigma_{K-1} \ldots \Sigma_1) \in S_K$, we denote the permutation $k \mapsto \Sigma_k$.
Throughout the paper, $\mathbb{F}$ is an arbitrary finite field. The summation in $\mathbb{F}$ is denoted by $\oplus$. All logarithms are in base $|\mathbb{F}|$. For two random variables $X, Y$, we write $X \sim Y$ whenever $X$ and $Y$ are identically distributed. The notation $W_{i:j}$ means $(W_i, W_{i+1}, \ldots, W_j)$. For two functions $f(n), g(n)$, we write $f(n) = o(g(n))$ when $f(n)/g(n) \to 0$ as $n \to \infty$.
2 Problem Statement
Consider a system including a user that has access to $M \in \mathbb{N}$ input vectors $W_1^{(L)}, W_2^{(L)}, \ldots, W_M^{(L)}$, chosen independently and uniformly at random from $\mathbb{F}^L$, for some finite field $\mathbb{F}$ and some large integer $L$. Thus,
[TABLE]
Assume that there is a set of basic functions represented by $K$ (possibly large-scale) matrices $F_1^{(L)}, F_2^{(L)}, \ldots, F_K^{(L)} \in \mathbb{F}^{L \times L}$, which are distributed independently and uniformly over the set of invertible matrices in $\mathbb{F}^{L \times L}$ (Footnote 2: Note that the set of invertible matrices includes almost all of the matrices, specifically in high dimensions); thus,
[TABLE]
For simplicity, throughout the paper, we drop the superscript $(\cdot)^{(L)}$ whenever this causes no confusion.
The user wishes to retrieve a specific sequential composition of the K basic functions of input vectors.
The sequence of computations is represented by a permutation $\Sigma = (\Sigma_K \Sigma_{K-1} \cdots \Sigma_1)$ of $[1:K]$, selected by the user uniformly at random from the set of all possible permutations. Thus, the user wants the result of $F_{\Sigma_K} F_{\Sigma_{K-1}} \cdots F_{\Sigma_2} F_{\Sigma_1} W_m$ for all $m \in [1:M]$. This means that the order of composition is fixed for all $M$ input vectors.
The user does not have any information about the matrices $F_k$, $k \in [1:K]$. (Footnote 3: We make this assumption to prevent the possibility of local computation at the user side.) To obtain the desired result, the user relies on $N \in \mathbb{N}$ non-colluding servers, each of which has full knowledge of $F_k$, $k \in [1:K]$. We also refer to the parameter $M$ as the number of requests in the paper.
Each time a server is called by the user, it receives some $W \in \mathbb{F}^L$ along with an index $k \in [1:K]$, and returns $F_k W$ to the user. Assume that the servers are not synchronized.
Also, assume that the user utilizes the servers $D$ times in total in order to achieve its desired result. This means that the user generates a sequence of queries $Q_1, Q_2, \ldots, Q_D$, where query $Q_d$, $d \in [1:D]$, is an ordered triple $Q_d = (Q_d^{(\mathrm{server})}, Q_d^{(\mathrm{input})}, Q_d^{(\mathrm{function})})$, meaning that at the $d$th step the user sends $Q_d^{(\mathrm{input})} \in \mathbb{F}^L$ to the server $Q_d^{(\mathrm{server})} \in [1:N]$ and asks it to run the function $Q_d^{(\mathrm{function})} \in [1:K]$. The server then computes the desired result, denoted by $A_d = F_{Q_d^{(\mathrm{function})}} Q_d^{(\mathrm{input})} \in \mathbb{F}^L$, and sends it to the user.
We refer to D as the number of queries in the paper.
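The query model above can be made concrete with a small simulation. The following is only a minimal sketch under stated assumptions, not part of the paper: the field size `P = 7`, the class name `Server`, the helper `mat_vec`, and the two hand-picked invertible matrices are all hypothetical choices made for illustration.

```python
P = 7  # assumption: a small prime, so integers mod P form the finite field

def mat_vec(F, w, p=P):
    """Multiply matrix F by vector w, with all arithmetic mod p."""
    return [sum(a * b for a, b in zip(row, w)) % p for row in F]

class Server:
    """Hypothetical server: knows every basic function and logs its queries."""

    def __init__(self, functions):
        self.functions = functions  # dict: index k -> L x L matrix F_k
        self.log = []               # marginal list of (input, function index)

    def answer(self, w, k):
        self.log.append((tuple(w), k))
        return mat_vec(self.functions[k], w)

# Two invertible 2x2 matrices over GF(7), chosen by hand (determinants 5 and 6).
F = {1: [[1, 2], [3, 4]], 2: [[2, 0], [1, 3]]}
servers = {n: Server(F) for n in (1, 2)}

w = [5, 6]
a1 = servers[1].answer(w, 1)   # Q_1 = (server 1, W, function 1)
a2 = servers[2].answer(a1, 2)  # Q_2 = (server 2, A_1, function 2)
print(a1, a2)
```

Each server's `log` plays the role of its marginal list of queries: it records inputs and function indices in relative order, but not the global step index $d$.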
A $(K,N,M,D,L)$ scheme of private sequential function computation comprises a sequence of (possibly randomized) encoders $\Phi_d$, $d \in [1:D]$, such that
$Q_d = \Phi_d(W_{1:M}, A_{1:d-1}, \Sigma)$.
In addition, it includes a sequence of functions (decoders) $\Psi_m$, $m \in [1:M]$, such that
$\Psi_m(W_{1:M}, A_{1:D}, \Sigma)$ is an estimate of $F_{\Sigma_K} F_{\Sigma_{K-1}} \cdots F_{\Sigma_1} W_m$.
We need to specify a notation for the sequence of queries received by each server.
Definition 1.
For any $n \in [1:N]$, the list of queries received by server $n$ is denoted by
[TABLE]
We call $Q_n$ the marginal list of queries at server $n$.
Remark 1.
Note that since the servers are not synchronized, server $n$ has no information about the positions of its received queries in the user's query list; that is, the exact values of the index $d$ of its queries are unknown at the server side, although their relative order is known.
We now formally define an achievable scheme.
Definition 2.
For any positive integers $K, N, M, D \in \mathbb{N}$, a sequence of $(K,N,M,D,L)$ computation schemes $\{\Phi_{1:D}^{(L)}, \Psi_{1:M}^{(L)}\}$, $L = 1, 2, \ldots$, is said to be $M$-achievable iff
• [Correctness] for any $m \in [1:M]$,
[TABLE]
as $L \to \infty$.
• [Privacy] for any $n \in [1:N]$ and $L \in \mathbb{N}$, the marginal list of the $n$th server, $Q_n^{(L)}$, is independent of the order of composition, i.e., $I(Q_n^{(L)}, F_{1:K}^{(L)}; \Sigma) = 0$.
Now we present the definitions of an achievable rate and the capacity of the private sequential function computation problem.
Definition 3.
The rate of a $(K,N,M,D,L)$ computation scheme is defined as $R = \frac{KM}{D}$. A rate $R$ is said to be $M$-achievable iff there exists a sequence of $M$-achievable $(K,N,M,D,L)$ computation schemes, $L = 1, 2, \ldots$, whose rate is at least $R$.
Remark 2.
The motivation for this definition of the rate is that, in a $(K,N,M,D,L)$ computation scheme, the user wants to compute a composition of $K$ functions on $M$ input vectors (a total of $KM$ computations) and utilizes the servers $D$ times.
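As a quick numerical illustration of the rate definition, the snippet below evaluates the ratio of desired computations ($KM$) to queries ($D$) exactly. It is a sketch only; the function name `rate` and the sample parameter values are assumptions chosen for illustration.

```python
from fractions import Fraction

def rate(K, M, D):
    """Rate of a scheme: KM desired computations spread over D server queries."""
    if D <= 0:
        raise ValueError("the number of queries D must be positive")
    return Fraction(K * M, D)

# Two functions on one input using two queries: no overhead at all.
print(rate(2, 1, 2))    # 1
# Three functions on ten inputs using 42 queries (illustrative numbers).
print(rate(3, 10, 42))  # 5/7
```

Using exact fractions avoids floating-point noise when comparing a scheme's rate against a bound.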
Definition 4.
A positive rate $R$ is said to be achievable if there is a sequence of $M$-achievable rates $\{R_M\}_{M \in \mathbb{N}}$ such that $\lim_{M \to \infty} R_M = R$.
Definition 5.
The capacity of the private sequential function computation, denoted by $C$, is defined as the supremum of all achievable rates.
3 Main Results
The main result of this paper is summarized in the following theorem.
Theorem 1.
The capacity of the private sequential function computation problem satisfies the following inequality:
$$\frac{1 - \frac{1}{N}}{1 - \frac{1}{\max(K,N)}} \le C \le 1,$$
where K is the number of basic functions to be computed in a sequential and private manner, and N is the number of non-colluding servers.
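To get a feel for the lower bound $\left(1-\frac{1}{N}\right)/\left(1-\frac{1}{\max(K,N)}\right)$ stated in the abstract and in Theorem 1, it can be evaluated for a few values of $K$ and $N$. The helper below is a hedged sketch; the name `capacity_lower_bound` and the restriction to $N \ge 2$ (which keeps the denominator nonzero) are assumptions of this illustration.

```python
from fractions import Fraction

def capacity_lower_bound(K, N):
    """Evaluate (1 - 1/N) / (1 - 1/max(K, N)) exactly.

    Assumes N >= 2 so that the denominator is nonzero.
    """
    if K < 1 or N < 2:
        raise ValueError("this sketch assumes K >= 1 and N >= 2")
    return (1 - Fraction(1, N)) / (1 - Fraction(1, max(K, N)))

for K, N in [(2, 2), (2, 5), (3, 2), (4, 3)]:
    print(K, N, capacity_lower_bound(K, N))
```

For $K \le N$ the bound equals one and therefore meets the upper bound; for $K > N$ it is strictly below one, for instance $3/4$ at $(K,N)=(3,2)$ and $8/9$ at $(K,N)=(4,3)$.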
The general achievable scheme for Theorem 1 can be found in Section 5, the formal proofs of correctness and privacy of the proposed algorithm are presented in Section 6, and the converse proof of Theorem 1 is detailed in Section 7.
Remark 3.
When $K \le N$, the capacity is equal to one. This means that if the number of functions does not exceed the number of servers, then one can achieve private computation without any extra cost. This should not be surprising, because a simple achievable scheme in which the computation of each basic function is assigned to one specific server ensures privacy (see Example 1).
Remark 4.
When $K > N$, the user wishes to compute a composition of more basic functions than there are available servers. Intuitively speaking, in this case at least one server must compute at least two basic functions. This means that, to achieve privacy, the order of computations at that server should not leak information about the desired order of composition. To ensure privacy in this case, we propose a scheme with a surprising feature: the order of computations at each server is fixed, irrespective of the desired order of composition. The user can retrieve any order of composition by selecting an appropriate order for the queries $Q_1, Q_2, \ldots, Q_D$ such that the order of computations at each server remains fixed. In addition, some randomness is added to the inputs so that a server cannot infer any order from dependencies between the inputs of different queries. The proposed scheme satisfies both privacy and correctness with zero probability of error, and achieves the rate $\left(1-\frac{1}{N}\right)/\left(1-\frac{1}{K}\right)$ as $M \to \infty$.
Remark 5.
Let us explain the differences between the problem of private sequential function computation and the other formulations of the problem of private computation.
• In the PFR problem [40, 41], there are $K$ vectors $W_k$, $k \in [1:K]$, available at $N$ non-colluding replicated servers, and a user wishes to retrieve the result of $\sum_{k=1}^{K} \alpha_k W_k$ for some scalars $\alpha_k$, $k \in [1:K]$. The goal there is to keep the coefficients $\alpha_k$, $k \in [1:K]$, of the linear combination private. Let $F := [W_1, W_2, \cdots, W_K] \in \mathbb{F}^{L \times K}$ and $X = [\alpha_1, \alpha_2, \cdots, \alpha_K]^T$. In this notation, the user in the PFR problem wishes to retrieve $FX$ while keeping the vector $X$ private from the servers. In the problem of private sequential function computation, however, the user wants to retrieve, e.g., $F_{\Sigma_K} F_{\Sigma_{K-1}} \cdots F_{\Sigma_2} F_{\Sigma_1} X$ for some permutation $(\Sigma_K \Sigma_{K-1} \cdots \Sigma_1)$ of $[1:K]$, and privacy means that the servers should not obtain any information about the permutation $(\Sigma_K \Sigma_{K-1} \cdots \Sigma_1)$.
• In the secure MPC model, each one of $K$ parties has a private input $F_k$, $k \in [1:K]$, and the final goal is to compute $p(F_1, F_2, \cdots, F_K) \times X$, where $p(\cdot)$ is a polynomial and $X$ is a vector, using $N$ servers. In MPC, the privacy condition requires that, during the computation, the servers and the other parties learn nothing about the private inputs $F_k$, $k \in [1:K]$. In the problem of private sequential function computation, however, all servers are fully aware of $F_1, F_2, \cdots, F_K$, and the objective is to keep the order of composition $(\Sigma_K \Sigma_{K-1} \cdots \Sigma_1)$ private.
4 Motivation and Examples
To motivate the achievable scheme, in this section, we explain the proposed scheme through some examples.
Example 1 (K=2 and N=2).
This example explains why, for $K \le N$, Theorem 1 states that the capacity of the private sequential function computation is equal to one.
In this example, we have $N=2$ servers and $K=2$ basic functions. Also, assume that the user has only $M=1$ input vector $W_1$. Note that in this case there are only two distinct orders of composition: $\sigma = (1\,2)$ and $\sigma = (2\,1)$. One simple achievable scheme is as follows:
• $F_2(F_1 W_1)$. In this case, the user asks the first server to compute the function $F_1$ on $W_1$ and receives $F_1 W_1$. Then, the user asks the second server to compute the function $F_2$ on $F_1 W_1$. We use the following schematic to demonstrate this scheme:
[TABLE]
• $F_1(F_2 W_1)$. In this case, the user first asks the second server to compute $F_2$ on $W_1$ and receives $F_2 W_1$. Then, the user asks the first server to compute $F_1$ on $F_2 W_1$; see the following:
[TABLE]
Trivially, the proposed scheme satisfies the correctness property. It is also private: in both cases, the first server receives a vector and computes $F_1$, and the second server receives a vector and computes $F_2$, so neither server can distinguish the two cases. Recall that $F_1, F_2$ are full-rank matrices; since $W_1$ is distributed uniformly over $\mathbb{F}^L$, $F_1 W_1$ and $F_2 W_1$ are also distributed uniformly over $\mathbb{F}^L$.
Generally, when $K \le N$, one can employ a similar approach to achieve the capacity in a one-shot zero-error setting as follows:
[TABLE]
In the above scheme, the arrows indicate the order in which the user asks the queries. Accordingly, to compute $F_{\sigma_K} F_{\sigma_{K-1}} \cdots F_{\sigma_2} F_{\sigma_1} W_1$, the user first asks server $\sigma_1$ to compute $F_{\sigma_1} W_1$ with $W_1$ as the input, then asks server $\sigma_2$ to compute $F_{\sigma_2} F_{\sigma_1} W_1$ with $F_{\sigma_1} W_1$ as the input, and so on.
Thus, irrespective of the order $(\sigma_K \sigma_{K-1} \cdots \sigma_1)$, the user always asks server $n$, $n \in [1:K]$, to compute function $F_n$. Servers $K+1$ to $N$ remain idle. Thus privacy is preserved.
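The one-shot scheme for $K \le N$ can be checked mechanically for a small $K$. The sketch below is an illustration only: it enumerates every order of composition and records which function indices each server is asked to compute. The function names are hypothetical, and the check covers only the function indices seen by each server; the uniformity of the inputs is argued in the text and is not verified here.

```python
from itertools import permutations

def one_shot_queries(order_first_to_last):
    """(server, function) pairs in asking order; server k always computes F_k."""
    return [(k, k) for k in order_first_to_last]

def functions_seen_by(queries, server):
    """Function indices that a given server is asked to compute, in order."""
    return [func for (srv, func) in queries if srv == server]

K = 3
views = set()
for order in permutations(range(1, K + 1)):
    queries = one_shot_queries(order)
    views.add(tuple(tuple(functions_seen_by(queries, n)) for n in range(1, K + 1)))

# All K! orders collapse to one per-server view: server n sees exactly F_n.
print(views)
```

Since the set `views` contains a single element, the list of function indices at each server carries no information about which of the $K!$ orders was requested.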
Example 2 (K=3 and N=2).
This example demonstrates how to achieve the lower bound on the capacity for cases with $K > N$. Consider the problem of private sequential function computation with $N=2$ servers and $K=3$ functions. Assume that the user has $M$ input vectors $W_{1:M}$, for some integer $M$, and wants to derive $F_{\sigma_3}(F_{\sigma_2}(F_{\sigma_1} W_{1:M}))$, i.e., the desired order is $\sigma = (\sigma_3\,\sigma_2\,\sigma_1)$. To explain the challenges of designing an achievable scheme for this case, let us try some solutions.
• (1st try) Consider the following computation scheme, which is inspired by what we proposed in the previous example:
[TABLE]
From the above table, one can easily infer that the order of queries is as follows: (Server 1, $W_1$, $F_1$), (Server 2, $F_1 W_1$, $F_2$), (Server 1, $F_2(F_1 W_1)$, $F_3$), where the input of each query is the output of the previous query. In other words, first server one computes $F_1 W_1$ with input $W_1$, then server two computes $F_2(F_1 W_1)$ with input $F_1 W_1$, and then server one computes $F_3(F_2(F_1 W_1))$ with input $F_2(F_1 W_1)$.
This computation scheme leaks information about $\sigma$. The reason is that server one computes $F_1 W_1$ in the first query that it receives and, in the second query, receives $F_2(F_1 W_1)$ as input. With a little effort, it realizes that the second input is equal to $F_2$ times the first output. Thus server one realizes that the desired order of composition is $F_1$, then $F_2$, and finally $F_3$.
• (2nd try) In the above scheme, a server can apply reverse computation and gain information about the order of computations. One possible solution is to add randomness to the queries to ensure privacy; see the following:
[TABLE]
In the above scheme, the user chooses $Z$ uniformly at random from $\mathbb{F}^L$. From the above table, one can easily infer the order of queries as (Server 1, $W_1$, $F_1$), (Server 2, $F_1 W_1$, $F_2$), (Server 1, $F_2(F_1 W_1) \oplus Z$, $F_3$), and (Server 2, $Z$, $F_3$). Note that these queries can be formed by the user from the results of previous queries, without any knowledge of $F_k$, $k = 1, 2, 3$. In addition, since $F_3$ is linear, the user can calculate $F_3(F_2(F_1 W_1))$ by subtracting the result of the last query, $F_3 Z$, from the result of the third query, $F_3(F_2(F_1 W_1) \oplus Z)$.
Is the above scheme private? The positive aspect of the above scheme is that no information can be inferred by the servers through reverse computation. More precisely, the inputs of the two queries of server one, i.e., $W_1$ and $F_2(F_1 W_1) \oplus Z$, are independent, given the knowledge of $F_k$, $k = 1, 2, 3$. Similarly, the inputs of the two queries of server two, i.e., $F_1 W_1$ and $Z$, are also independent, given the knowledge of $F_k$, $k = 1, 2, 3$. However, this is not enough to guarantee privacy. The reason is that, in the above scheme, other orders of composition $\sigma$ require different orders of inquiries at each server. For example, if the desired order is $\sigma = (2\,1\,3)$, then the order of computation at server one is $F_3 \to F_2$, i.e., server one first computes $F_3$ and then $F_2$, and the order of computation at server two is $F_1 \to F_2$, i.e., server two first computes $F_1$ and then $F_2$. In other words, the order of computation at each server depends on $\sigma$, which is a violation of the privacy constraint.
• (Complete solution) From the first two tries, we observe that there are two challenges in designing the achievable scheme: (1) dependency between the inputs of the different queries to each server, which allows information leakage by reverse computation, and (2) dependency of the order of computation at each server on the desired order of composition $\sigma$. To solve these issues, we introduce the following computation scheme. The proposed scheme eliminates the possibility of reverse computation by adding randomness. It also uses a fixed order of computation at each server, irrespective of $\sigma$. The proposed solutions for all six possible choices of $\sigma$ are illustrated in the following tables. Note that the arrows show the order in which the queries are asked, and the random vectors $Z_{1:M+2}$ are distributed independently and uniformly over $\mathbb{F}^L$.
[TABLE]
[TABLE]
[TABLE]
Let us describe the solution for the case $\sigma = (3\,2\,1)$ as an example. From the table, one can easily infer that the order of queries is (Server 1, $W_1$, $F_1$) → (Server 2, $F_1 W_1$, $F_2$) → (Server 1, $Z_1$, $F_3$) → (Server 2, $F_2(F_1 W_1) \oplus Z_1$, $F_3$) → (Server 1, $W_2$, $F_1$) → (Server 2, $F_1 W_2$, $F_2$) → (Server 1, $Z_2$, $F_3$) → (Server 2, $F_2(F_1 W_2) \oplus Z_2$, $F_3$) → …. We note that the input of each query is a function of the outputs of the previous queries. In addition, the user can form these queries without any knowledge about the basic functions and without any matrix multiplication.
Moreover, in the end, the user can calculate $F_3(F_2(F_1 W_m))$ by subtracting $F_3 Z_m$ from $F_3(F_2(F_1 W_m) \oplus Z_m)$, for $m = 1, \ldots, M$; thus correctness is verified. Similarly, correctness can be verified for all six orders of desired computation.
It remains to establish privacy for the proposed solution. First let us have two more observations:
• Given the full knowledge of $F_1, F_2, F_3$, the inputs to each server in different queries are independent. This means that there is no possibility of reverse computing and inferring the order of computation.
• In all six options for $\sigma$, the order of computation at server one is $\underbrace{F_1 \to F_3 \to F_1 \to \cdots \to F_1}_{2M+1}$, and the order of computation at server two is $\underbrace{F_2 \to F_3 \to F_2 \to \cdots \to F_2}_{2M+1}$. Since the sequence of functions computed by each server is predetermined and fixed, and is not a function of $\sigma$, it cannot reveal any information about the desired order of computations.
We summarize the above six solutions for the six different options for σ as follows:
[TABLE]
Having shown that the proposed scheme has the privacy and correctness properties, we compute its rate. In total, there are $4M+2$ queries, while the user wants to compute three functions on $M$ input vectors. Hence, the rate of computation is $\frac{3M}{4M+2}$. As $M \to \infty$, this rate approaches the claimed lower bound on the capacity of the problem, which is equal to $\frac{3}{4}$.
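The masking step used in this example relies only on linearity: $F_3(X \oplus Z)$ minus $F_3 Z$ equals $F_3 X$. The sketch below checks this identity numerically over a prime field. It is an illustration under stated assumptions and does not reproduce the paper's full scheme: the prime `P = 101`, the dimension `L = 3`, the seeded random generator, and all helper names are hypothetical, and the matrix is drawn at random without checking invertibility, which the identity does not require.

```python
import random

P = 101  # assumption: a small prime, so integers mod P form a finite field
L = 3    # assumption: tiny dimension, for illustration only

def mat_vec(F, w):
    """Matrix-vector product with arithmetic mod P."""
    return [sum(a * b for a, b in zip(row, w)) % P for row in F]

def add(u, v):
    return [(a + b) % P for a, b in zip(u, v)]

def sub(u, v):
    return [(a - b) % P for a, b in zip(u, v)]

rng = random.Random(0)
F3 = [[rng.randrange(P) for _ in range(L)] for _ in range(L)]
x = [rng.randrange(P) for _ in range(L)]  # stands in for F2(F1 W_m)
z = [rng.randrange(P) for _ in range(L)]  # fresh uniform mask Z_m

masked_answer = mat_vec(F3, add(x, z))  # one server sees only x + z
mask_answer = mat_vec(F3, z)            # the other server sees only z
recovered = sub(masked_answer, mask_answer)

print(recovered == mat_vec(F3, x))
```

Because `z` is uniform and independent of `x`, the masked input `x + z` is itself uniform, which is why the server that receives it cannot relate it to its earlier outputs.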
Example 3 (K=4 and N=3).
Example 2 shows that one may use a predetermined sequence of functions to be computed at each server, together with some random vectors, in order to achieve the lower bound of Theorem 1 on the capacity in the asymptotic regime of large $M$. However, we require a systematic approach to generalize that idea to arbitrary $K, N$. We explain the proposed approach in this example. Note that this method is slightly different from the one in the previous example.
Consider the problem of private sequential function computation with $N=3$ servers and $K=4$ functions. (Footnote 4: In this example, we apply minor changes to the notation described in Section 2. These changes are explained through the example.) Assume that the user has $M = 2M'$ input vectors, denoted by $W_{m,j}$, $m \in [1:M']$, $j \in \{1,2\}$. We use two indices $(m,j)$ for the inputs to simplify the explanation. In addition, we assume that the user wants to compute $F_{\sigma_4}(F_{\sigma_3}(F_{\sigma_2}(F_{\sigma_1} W_{1:M',1:2})))$, i.e., $\sigma = (\sigma_4\,\sigma_3\,\sigma_2\,\sigma_1)$.
We first assign a fixed order of computations to each server. In particular, we assign the following order of computation to server $n$, $n \in \{1,2,3\}$,
[TABLE]
no matter what the desired order of composition $\sigma = (\sigma_4\,\sigma_3\,\sigma_2\,\sigma_1)$ is.
This means that, to compute $M = 2M'$ requests, the user asks $3(M'+3)$ queries from each server, and $9M'+27$ queries in total. We call this step function assignment. We illustrate the proposed function assignment, universally for all possible $\sigma$, as follows:
[TABLE]
For convenience in describing the achievable scheme, we divide the computations of the $3(M'+3)$ functions at each server into $M'+3$ blocks of computations, each comprising three computations per server.
It can be observed that the function assignment is the same for all the blocks. Since the servers are not synchronized, we need to also assign an order for asking the queries based on the proposed function assignment.
In the proposed scheme, we assume that the blocks of computations are requested by the user sequentially. This means that the user first asks all of the queries in a block from the servers, then begins the next block.
Also assume that the queries in each block are asked by the user in an arbitrary order.
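The query counts above determine the rate of this example through the definition $R = KM/D$. The short sketch below carries out that arithmetic; the function name `example3_rate` is hypothetical, and the closed form $8M'/(9M'+27)$ is derived here from the stated counts rather than quoted from the text.

```python
from fractions import Fraction

def example3_rate(m_prime):
    """Rate KM/D for K=4, N=3 with M = 2M' requests and 3(M'+3) queries per server."""
    K, N = 4, 3
    M = 2 * m_prime
    queries_per_server = 3 * (m_prime + 3)
    D = N * queries_per_server  # 9M' + 27 queries in total
    return Fraction(K * M, D)

target = (1 - Fraction(1, 3)) / (1 - Fraction(1, 4))  # lower bound for K=4, N=3
for m_prime in (1, 10, 1000):
    print(m_prime, example3_rate(m_prime), float(target - example3_rate(m_prime)))
```

The gap to $8/9$ equals $24/(9M'+27)$, so the overhead of the three extra blocks vanishes as $M'$ grows.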
We claim that by exploiting this function assignment and order determination for asking queries, one can design an achievable scheme for any desired permutation.
Note that in this case, there are 4!=24 distinct permutations. As examples, we illustrate the achievable scheme on two specific permutations.
4.0.1 σ=(1 3 4 2)
The following figure shows the procedure of computations in the first four blocks. In this figure, we use a new notation, Z∗, for some variables, about which there is an important remark.
Remark 6.
In each block of the scheme, variables Z∗ appear in seven different places. These seven variables named Z∗ in each block are in fact distinct, drawn randomly, uniformly, and independently of each other and of all other random variables in the solution. However, since none of them appears in the scheme again, and to avoid confusing the reader with a long list of random variables, they are not denoted differently. We use these random variables to keep the list of queries to each server identically distributed for different choices of σ.
In the first block, the requests corresponding to W1,{1,2} are considered and the function F2 is applied to them (at server 2).
In the second block, the user has access to F2W1,{1,2}, and asks server one and server two to apply the function F4 to them. To run F4, the user utilizes a randomly drawn vector to ensure the privacy of the computation. Also in this block, the user again asks the second server to perform a task similar to that of the previous block on the new vectors W2,{1,2}. The third block is similar: the function F3 is executed on the requests corresponding to W1,{1,2}, the function F4 is computed for the requests corresponding to W2,{1,2}, and the function F2 is computed for the requests corresponding to W3,{1,2}.
In the fourth block, the user again asks the three servers to compute specific functions, similar to the third block. In addition, for the requests corresponding to W1,{1,2}, the function F1 is computed, and the final result of the computation for them is available at the end of this block.
The rest of the scheme is similar, where M=2M′ requests are computed in the M′+3 blocks of computations.
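The role of the randomly drawn vector in the blocks above can be illustrated with a small sketch. It assumes, for illustration only, a prime field GF(P) with P=7 and a 3×3 matrix standing in for F4; the names `apply` and `masked_compute` are hypothetical. One server receives the masked vector w+z, another receives the mask z alone (as in the second phase described in Section 5.3), and the user recovers F4 w by linearity.

```python
import random

P = 7  # small prime; working over GF(P) is an illustrative assumption

def apply(F, v):
    """Multiply matrix F by vector v over GF(P)."""
    return [sum(a * b for a, b in zip(row, v)) % P for row in F]

def masked_compute(F, w, rng):
    """Recover F w while each server only sees a uniformly random vector."""
    z = [rng.randrange(P) for _ in w]                   # fresh uniform mask
    query_a = [(wi + zi) % P for wi, zi in zip(w, z)]   # sent to one server
    query_b = z                                         # sent to another server
    ans_a, ans_b = apply(F, query_a), apply(F, query_b)
    # Linearity: F(w + z) - F(z) = F w
    return [(a - b) % P for a, b in zip(ans_a, ans_b)]

rng = random.Random(0)
F4 = [[rng.randrange(P) for _ in range(3)] for _ in range(3)]
w = [1, 2, 3]
assert masked_compute(F4, w, rng) == apply(F4, w)
```

Since z is uniform and independent of w, each of the two queries is individually uniform, which is the property the scheme relies on.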
Observe that the scheme is correct and private. The privacy is due to the fact that all the inputs given to each server are uniformly and independently drawn vectors, which are independent of the desired permutation of the user. Also, the order of computations at each server is the same for all desired permutations. This can be confirmed by comparing the schemes for σ=(1 3 4 2) and σ=(4 3 2 1) (see Subsection 4.0.2). To derive the scheme for other choices of σ, refer to the general achievable scheme.
[TABLE]
Now we compute the rate of the proposed scheme. Note that there are 2M′ input vectors and 4 functions (8M′ desired function computations in total), while the user queries the servers 9M′+27 times.
Therefore, the rate of the proposed achievable scheme is (9M′+27)/(8M′), which goes to 9/8 as M→∞. (We note that this rate is not necessarily optimum for the one-shot case, and one may privately compute 2M′ requests with fewer than 9M′+27 computations. However, in the asymptotic regime, the rate achieves the claimed lower bound on the capacity and the gap is vanishing.)
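As a quick numerical check of the expression above, the following sketch evaluates the ratio of queries to desired computations for the example (N=3, K=4) with exact rational arithmetic. The function name `example_rate` is hypothetical.

```python
from fractions import Fraction

def example_rate(m_prime):
    """Ratio (9M'+27)/(8M') for the example with N=3 servers and K=4 functions."""
    queries = 3 * 3 * (m_prime + 3)   # 3 servers x 3 queries per block x (M'+3) blocks
    desired = 4 * 2 * m_prime         # K=4 functions applied to 2M' input vectors
    return Fraction(queries, desired)

for m_prime in (1, 10, 100, 10_000):
    print(m_prime, float(example_rate(m_prime)))
```

The additive overhead of 27 queries comes from the three extra blocks and becomes negligible as M′ grows.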
4.0.2 σ=(4 3 2 1)
To observe that the proposed approach works for any permutation, we consider another permutation here and construct a similar scheme for this case. The proposed scheme has been detailed in the following table.
[TABLE]
We now review the proposed scheme in Example 3, and define some new notation that will help us generalize it in the next section. We note that for any set of vectors Wm,{1,2}, we first need to compute Fσ1, then Fσ2, up to FσK. Let us formally denote the process of computing the kth function Fσk for the requests corresponding to Wm,{1,2} by R_m^k.
As discussed, in Example 3, the scheme utilizes M′+3 blocks of computations, and performs all the tasks R_m^k, for k∈{1,2,3,4} and m∈[1:M′].
In the description above, we assigned the task R_1^1 to the first block. This means that at the end of the first block, Fσ1W1,{1,2} is available at the user side.
For the second block, we assigned the two tasks R_2^1 and R_1^2. This means that at the end of this block, the user has access to Fσ1W2,{1,2} and Fσ2(Fσ1(W1,{1,2})).
Generally, before the mth block, the tasks R_{m′}^{k′}, for k′+m′≤m, have already been performed, and the tasks R_{m′}^{k′}, for k′+m′=m+1, will be performed at the mth block. By this explanation, one can see that if there is a computation scheme that can perform all of the above tasks, then it is 2M′-achievable.
In Example 3, it is shown that such a computation scheme exists.
We will generalize this approach for arbitrary K,N in the next section to construct an achievable scheme. (We used the notion of task here only to make the explanation of our achievable scheme easier to follow. There is no necessity for the user to obtain the outputs of the tasks; the only necessity is to find the target result with high accuracy. For more explanations, see Section 2.)
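The task schedule described above (task k of batch m′ is performed in block m′+k−1) can be sketched as follows. The function names are hypothetical; a task is represented by the pair (m, k), meaning the kth function of the composition applied to the mth pair of inputs.

```python
def tasks_in_block(b, m_prime, K):
    """Tasks (m, k) scheduled in block b: exactly those with m + k = b + 1."""
    return [(m, b + 1 - m) for m in range(1, m_prime + 1) if 1 <= b + 1 - m <= K]

def schedule(m_prime, K):
    """Map each of the M'+K-1 blocks to the list of tasks performed in it."""
    return {b: tasks_in_block(b, m_prime, K) for b in range(1, m_prime + K)}

sched = schedule(m_prime=3, K=4)
for b, tasks in sched.items():
    print(b, tasks)
```

With M′=3 and K=4 this yields six blocks; every task appears exactly once, and task (m, k) is always scheduled one block after task (m, k−1), so its input is available when needed.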
5 The Achievable Scheme
5.1 Preliminaries
In this section, we propose an achievable scheme for arbitrary K,N. We note that as shown in Example 1, one can simply design an achievable scheme when K≤N. Hence, throughout this section, we focus on the cases with K>N.
Also, with a minor change to the assumptions described in Section 2, we assume that there are M=M′(N−1) requests in the system, i.e., the number of requests is divisible by N−1. Later, we will argue that this assumption does not affect the rate in the asymptotic regime.
We further divide the given M=M′(N−1) input vectors to M′ batches of N−1 input vectors. For each m∈[1:M′], we denote the input vectors in the mth batch by Wm,j, j∈[1:N−1].
To ensure the privacy and correctness constraints, we rely on the following two ideas:
• The index of functions to be computed by each server is assigned in a deterministic manner, no matter what permutation is to be computed. We refer to this task as function assignment in the paper. We use the notion of block of computations, similar to Section 4, to propose an appropriate function assignment.
• Whenever required, the user exploits randomly generated vectors to ensure privacy.
In the following subsections, we first propose a deterministic function assignment, and then we design an appropriate task assignment. After these steps, we propose a vector assignment, which is defined as the process of associating appropriate vectors to be transmitted from the user to the servers in queries.
5.2 Function Assignment
First we define the notion of block of computations as follows.
Definition 6.
A block of the computations for a private sequential function computation problem with N servers and K functions is defined as
[TABLE]
As shown above, each block contains two phases. In the first phase, each server computes a specific function N−1 times, and in the second phase, all servers compute the same K−N functions.
Now, we are ready to propose the function assignment of the achievable scheme. In the achievable scheme, we utilize a deterministic function assignment, which is comprised of M′+K−1 replicated blocks of computations as described above. To determine the order of queries asked by the user, we use the following rules:
• The queries are asked block by block, i.e., the user first asks all of the queries of the first block, then asks all of the queries of the second block, and so on.
• At each block, the user asks the queries in an arbitrary order. As we will show, all the vectors sent by the user to the servers at each block are available at the user side before the block begins.
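The block structure can be sketched in a few lines. The sketch assumes, as the vector assignment in Section 5.3 and the proof of Lemma 1 indicate, that server n computes F_n in the first phase and that every server computes F_{N+1},…,F_K in the second phase; the function names are hypothetical.

```python
def block_assignment(N, K):
    """Function indices each server computes in one block (identical for every block).

    Assumption: phase 1 has server n apply F_n to N-1 vectors;
    phase 2 has every server apply F_{N+1}, ..., F_K once each.
    """
    phase2 = list(range(N + 1, K + 1))
    return {n: [n] * (N - 1) + phase2 for n in range(1, N + 1)}

def total_queries(N, K, m_prime):
    """Total number of queries over the M'+K-1 replicated blocks."""
    per_block = sum(len(fs) for fs in block_assignment(N, K).values())
    return (m_prime + K - 1) * per_block

print(block_assignment(3, 4))      # one block for the example of Section 4
print(total_queries(3, 4, 10))     # 9*10 + 27 queries for M' = 10
```

Each block costs N(N−1)+N(K−N)=N(K−1) queries, independent of the desired permutation.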
5.3 Vector Assignment
Let us first introduce some useful notations.
Definition 7.
Define
[TABLE]
and
[TABLE]
for any m∈[1:M′], k∈[1:K], and i∈[1:N−1]. In other words, we denote the procedure of computation of the kth function in the sequential computation (i.e., Fσk) for the requests corresponding to the mth batch by R_m^k, and the inputs and outputs for this task are In_i(R_m^k) and Out_i(R_m^k), respectively.
Corollary 1.
[TABLE]
Clearly, the task Rmk is said to be performed whenever the value of Out1:N−1(Rmk) is available at the user side.
The last step to explain the achievable scheme is to define the vectors transmitted from the user to the servers in each step. Let π denote the unique permutation π∈SK such that σπ=πσ is the identity permutation. In the proposed achievable scheme, for each m∈[1:M′+K−1], at the mth block of computations, the user asks the nth server to perform the computation on the following vectors:
[TABLE]
Above, in the first phase, server n receives the vectors In_i(R_{m−π_n+1}^{π_n}), i∈[1:N−1]. In the second phase, the two cases n<N and n=N are different. If n<N, then the server receives In_n(R_{m−π_{N+i}+1}^{π_{N+i}})⊕Z_{m,i}, i∈[1:K−N], and if n=N, it receives Z_{m,i}, i∈[1:K−N].
Note that the random vectors Z_{m,i}, i∈[1:K−N], are distributed uniformly over 𝔽^L, and they are independent of each other and of all the other random variables in the problem.
We note that the vectors In_i(R_{m−π_k+1}^{π_k}) for each i∈[1:N−1], k∈[1:K], and m∈[1:M′+K−1] are well defined, except for the cases where m−π_k+1≤0. For such cases, the user utilizes a uniformly drawn random vector, independent of all the other variables in the problem, in place of In_i(R_{m−π_k+1}^{π_k}) in the vector assignment. See the vectors Z∗ in Example 3 for more details.
An important question regarding the proposed vector assignment is whether it is feasible. More precisely, we need to prove that the inputs assigned to each block are available at the end of the previous block. In the next section, we will prove that the proposed vector assignment is indeed feasible.
In Figure 1, we illustrate the proposed function assignment and vector assignment.
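The function and vector assignments of this section can be exercised end to end with a small simulation. This is only a sketch under stated assumptions: the field is taken to be GF(P) with P=101, the basic functions are arbitrary random L×L matrices, dummy random vectors are also used when the batch index exceeds M′ (the text only discusses indices ≤0), and all names (`run_scheme`, `direct`, `matvec`, `rand_vec`) are hypothetical. It checks correctness and that the function indices seen by each server do not depend on σ; it does not verify the distributional privacy argument.

```python
import random
from itertools import permutations

P = 101  # prime modulus; the field GF(P) is an illustrative assumption

def matvec(F, v):
    return [sum(a * b for a, b in zip(row, v)) % P for row in F]

def rand_vec(L, rng):
    return [rng.randrange(P) for _ in range(L)]

def run_scheme(sigma, F, W, N, rng):
    """sigma[k-1] is the k-th function applied; F[f] is the matrix of F_f;
    W[b-1][i] is input i of batch b. Returns final vectors and per-server logs."""
    K, m_prime, L = len(sigma), len(W), len(W[0][0])
    pi = {f: k for k, f in enumerate(sigma, start=1)}    # inverse of sigma
    state = [[list(w) for w in batch] for batch in W]
    log = {n: [] for n in range(1, N + 1)}               # function indices per server

    for m in range(1, m_prime + K):                       # blocks 1 .. M'+K-1
        def inputs(f):
            b = m - pi[f] + 1                             # batch handled by F_f in block m
            if 1 <= b <= m_prime:
                return b, state[b - 1]
            return None, [rand_vec(L, rng) for _ in range(N - 1)]  # dummy vectors

        updates = {}
        for n in range(1, N + 1):                         # phase 1: server n applies F_n
            b, vecs = inputs(n)
            outs = [matvec(F[n], v) for v in vecs]
            log[n] += [n] * (N - 1)
            if b is not None:
                updates[b] = outs
        for f in range(N + 1, K + 1):                     # phase 2: masked queries
            b, vecs = inputs(f)
            z = rand_vec(L, rng)
            masked = [matvec(F[f], [(x + y) % P for x, y in zip(v, z)]) for v in vecs]
            f_z = matvec(F[f], z)                         # answer of server N, which sees only z
            for n in range(1, N + 1):
                log[n].append(f)
            if b is not None:                             # unmask by linearity
                updates[b] = [[(x - y) % P for x, y in zip(u, f_z)] for u in masked]
        for b, outs in updates.items():                   # outputs become next inputs
            state[b - 1] = outs
    return state, log

def direct(sigma, F, W):
    """Reference computation F_{sigma_K} ... F_{sigma_1} W without any privacy."""
    out = []
    for batch in W:
        row = []
        for w in batch:
            for f in sigma:
                w = matvec(F[f], w)
            row.append(w)
        out.append(row)
    return out

rng = random.Random(0)
N, K, m_prime, L = 3, 4, 5, 4
F = {f: [rand_vec(L, rng) for _ in range(L)] for f in range(1, K + 1)}
W = [[rand_vec(L, rng) for _ in range(N - 1)] for _ in range(m_prime)]
logs = []
for sigma in permutations(range(1, K + 1)):
    state, log = run_scheme(sigma, F, W, N, rng)
    assert state == direct(sigma, F, W)       # correctness for this sigma
    logs.append(log)
assert all(lg == logs[0] for lg in logs)      # function indices do not depend on sigma
```

All queries of a block are built from the state available before the block starts, which mirrors the requirement that the user may ask the queries within a block in an arbitrary order.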
In this section, we prove the achievability part of Theorem 1. In particular, we focus on the proposed scheme described in the previous section. We first prove that this scheme is feasible, i.e., the proposed vector assignment is feasible, meaning that the user has access to the contents required to be transmitted to the servers at the time of transmission (see Subsection 5.3). Then, we prove the correctness and privacy of the proposed scheme.
Finally, we compute the rate of the achievable scheme, and then we prove the desired result.
6.1 Proof of Feasibility and Correctness
In this part, we first prove that the proposed vector assignment is feasible to be performed by the user, and then, we establish the correctness of the scheme.
Lemma 1.
The proposed vector assignment is feasible, i.e., the vectors to be sent by the user to the servers at each block of computations can be obtained by computing a linear combination of the available outputs of the previous blocks.
Proof.
To prove the feasibility of the vector assignment, we use induction. The vector assignment of block one is trivially feasible. We show that if the vector assignment of block m is feasible, then the vector assignment of block m+1 is also feasible.
Since the vector assignment of block m is feasible, In_{1:N−1}(R_{m−π_k+1}^{π_k}), k∈[1:K], are available before the mth block of computations begins. We need to show that In_{1:N−1}(R_{(m+1)−π_k+1}^{π_k}) for each k∈[1:K] can be obtained from the information provided to the user until the end of the mth block.
From the induction hypothesis, the user can ask the queries based on the structure described in the vector assignment step. After the first phase of the block, the values of F_{n′}In_i(R_{m−π_{n′}+1}^{π_{n′}}), i∈[1:N−1], n′∈[1:N], are available at the user side. In addition, in the second phase of the mth block, the user can cancel the random vectors Z_{m,i}, i∈[1:K−N], by linearity, since F(In⊕Z) minus F(Z) equals F(In). We conclude that all of the values of F_{n′}In_i(R_{m−π_{n′}+1}^{π_{n′}}), i∈[1:N−1], n′∈[N+1:K], are available at the user side. Thus, all the values of F_{n′}In_i(R_{m−π_{n′}+1}^{π_{n′}}), i∈[1:N−1], n′∈[1:K], are available to the user after the completion of the mth block.
Now observe that by changing the variables as n′=σk,
[TABLE]
which shows that after the mth block, the tasks R_{m′}^{k′} such that m′+k′=m+1 are performed. Note that In_i(R_{(m+1)−π_k+1}^{π_k}) = Out_i(R_{(m+1)−π_k+1}^{π_k−1}) = Out_i(R_{m−(π_k−1)+1}^{π_k−1}) for any k∈[1:K] and i∈[1:N−1]. (If (m+1)−π_k+1≤0, then the claim is trivial, because the required vectors are randomly drawn.) Hence, the proof is complete.
∎
From the proof of Lemma 1, we can conclude that at the end of the (M′+K−1)th block, all the tasks R_m^K, m∈[1:M′], are performed, and hence the values of Out_{1:N−1}(R_m^K) are available at the user side. Therefore, the proposed scheme is also correct, meaning that after running the proposed scheme, the user obtains its desired result (see the correctness condition in Section 2 for more explanations).
6.2 Proof of Privacy
To prove the privacy constraint, we need to show that in the proposed achievable scheme, I(Q~n,F1:K;Σ)=0, for each n∈[1:N]. Due to the deterministic function assignment in the proposed scheme, the only requirement is to show that the inputs given to the servers do not leak any information about the desired permutation.
Let us denote the inputs given to the nth server at the mth block of computations by Xn,m:=In1:N−1(Rm−πn+1πn), for the first phase, and Yn,m,[1:K−N]⊕Zm,[1:K−N], for the second phase.
Due to the proposed vector assignment, we have
[TABLE]
where Oi∈FL, i∈[K+1:N], are all zero vectors.
In addition, we define Zm:=Zm,[1:K−N] and we briefly write Yn,m⊕Zm to denote Yn,m,[1:K−N]⊕Zm,[1:K−N].
We need to show that
[TABLE]
for each σ′,σ′′∈SK. Note that
[TABLE]
where μ(Q~n,F1:K) does not depend on σ. Also,
[TABLE]
which completes the proof. Note that (a) holds because the utilized function assignment is deterministic, (b) holds because conditioned on Σ=σ and F1:K, we have the following Markov chain:
[TABLE]
for each m1≠m2. (Note that if m_i−π_n+1≤0, then the claimed independence holds trivially.) Also, (c) holds because
[TABLE]
6.3 Proof of Achievability for the cases where N−1∤M
In the previous section, we provided an achievable scheme for the cases where the number of requests is divisible by N−1.
However, to prove the achievability, we need to propose a sequence of achievable schemes for any number of requests (see Section 2).
Consider a general case of the private sequential function computation problem in which there is an arbitrary number of requests M=M′(N−1)+r, where 0≤r≤N−2 is an arbitrary integer.
We propose an achievable scheme for the problem with these parameters as follows. For the first M′(N−1) requests, assume that the user utilizes the proposed scheme of the previous section. For each of the r remaining requests, the user asks an arbitrary server to compute all of the possible permutations for that request, i.e., the user queries the server K×K! times for each request. (Asking this many queries is not efficient, and the user can ask fewer; however, this effect vanishes in the asymptotic regime.) Due to the previous discussions, it is obvious that this scheme is both private and correct.
Hence, it gives an (M′(N−1)+r)-achievable rate.
Let us compute the rate of the proposed scheme for an arbitrary number of requests M′(N−1)+r. For the first M′(N−1) requests, the scheme includes M′+K−1 blocks, each requiring N(K−1) function computations.
For the remaining r requests, the user queries the servers r×K×K! times. All in all, there are (M′+K−1)×N(K−1)+r×K×K! queries.
Also, there are M′(N−1)+r requests for the sequential computation of K functions at the user side. Hence, the rate of the proposed scheme is
[(M′+K−1)×N(K−1) + r×K×K!] / [K(M′(N−1)+r)].
One can see that as the number of requests tends to infinity (i.e., M′→∞), this rate approaches N(K−1)/(K(N−1)) = (1−1/K)/(1−1/N), which matches the capacity of the problem. Therefore, the achievability proof is complete.
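The query count and its limit can be checked numerically. The sketch below takes the rate to be the ratio of the total number of queries to the K(M′(N−1)+r) desired function computations, following the counts given in the text; the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def scheme_rate(N, K, m_prime, r=0):
    """Queries per desired computation for M = M'(N-1) + r requests."""
    queries = (m_prime + K - 1) * N * (K - 1) + r * K * factorial(K)
    desired = K * (m_prime * (N - 1) + r)
    return Fraction(queries, desired)

def limiting_rate(N, K):
    """Limit of scheme_rate as M' grows: N(K-1) / (K(N-1))."""
    return Fraction(N * (K - 1), K * (N - 1))

for m_prime in (10, 1_000, 100_000):
    print(m_prime, float(scheme_rate(3, 4, m_prime, r=1)), float(limiting_rate(3, 4)))
```

The contribution of the r leftover requests, although as large as r×K×K! queries, is a constant and does not affect the limit.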
We prove the converse of Theorem 1 in the following.
7.1 Preliminaries
Lemma 2.
Consider a matrix
F∈{F′∈𝔽^{L×L}: det(F′)≠0}, which is chosen randomly and uniformly, and M randomly and uniformly generated vectors W_m∈𝔽^L, m∈[1:M], each independent of the other vectors and of F.
Let W:=(W1,W2,…,WM)∈FL×M be a randomly generated matrix. We claim that the following propositions hold:
Denote an arbitrary realization of random vectors W1:M by w1:M.
Let rank(w)=dim(span(wS)), where S={s1,s2,…,srank(w)}⊆[1:M] includes rank(w) distinct elements.
Add specific vectors to the columns of matrix (ws1,ws2,…,wsrank(w)), and construct a matrix
[TABLE]
such that U is an invertible matrix. Note that F′:=FU∼F, i.e., F′ has the same distribution as F. Let e_m∈𝔽^L denote the vector with all zero elements, except the mth element, which is equal to one. Note that U^{−1}w_{s_m}=e_m for all m∈[1:rank(w)].
Now we write
[TABLE]
where (a) follows since wm∈span(wS), for each m∈[1:M]\S.
Now taking the expectation of both sides of the above inequality yields the desired result.
∎
Proof of (c).
Observe that
[TABLE]
Note that dim(span(W_{1:m−1}))≤m−1, which means that each vector in span(W_{1:m−1}) can be written as a weighted sum of m−1 specific vectors. This means that span(W_{1:m−1}) contains at most |𝔽|^{m−1} distinct vectors. However, the vector W_m is chosen uniformly from the set 𝔽^L.
Therefore, the probability that W_m lies in span(W_{1:m−1}) is upper bounded by |𝔽|^{m−1−L}. Now we write
[TABLE]
which goes to zero as L→∞. This completes the proof.
∎
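The bound used in the proof of (c) can be compared with the exact probability that M uniform vectors over a field of size q are linearly dependent, which is 1−∏_{m=1}^{M}(1−q^{m−1−L}). The sketch below is illustrative only; the function names are hypothetical and q stands for |𝔽|.

```python
from fractions import Fraction

def union_bound(q, L, M):
    """Sum over m of q^(m-1-L): upper bound on P(W_1, ..., W_M are dependent)."""
    return sum(Fraction(q ** (m - 1), q ** L) for m in range(1, M + 1))

def exact_dependence_prob(q, L, M):
    """Exact P(dependence) = 1 - prod_m (1 - q^(m-1-L)) for uniform vectors in GF(q)^L."""
    p_indep = Fraction(1)
    for m in range(1, M + 1):
        p_indep *= 1 - Fraction(q ** (m - 1), q ** L)
    return 1 - p_indep

for L in (4, 8, 16, 32):
    print(L, float(exact_dependence_prob(2, L, 3)), float(union_bound(2, L, 3)))
```

For fixed M, both quantities decay geometrically in L, which is the vanishing behaviour the proof relies on.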
Proof of (d).
Using (b),(c), we obtain
[TABLE]
∎
Lemma 3.
For any random variables X,Y,Z, such that Z takes values in a set 𝒵,
[TABLE]
Proof.
[TABLE]
In addition, we write
[TABLE]
∎
7.2 Proof of the Converse
In order to prove the converse, we show that for each M-achievable rate R_M, we have R_M≤1, meaning that D≥KM. Let us define D_k^{(L)} := |{d∈[1:D] : Q_d^{(function)}=k}| for a given computation scheme. (The numbers D_k^{(L)}, k∈[1:K], are possibly random, due to the random function assignment. However, their sum is deterministic and equal to D.) Note that D = ∑_{k=1}^{K} 𝔼[D_k^{(L)}].
Observe that to prove the converse, it is sufficient to show that 𝔼[D_k^{(L)}] ≥ M−o(1) for each k. Consider a sequence of computation schemes, for L∈ℕ, for the M-achievable rate R_M (see Definition 2).
To review, note that from Fano’s inequality, we obtain
[TABLE]
for any m∈[1:M].
For the sake of brevity, we drop the superscript (·)^{(L)} for the rest of this section.
Also, let F_Σ := F_{Σ_K}F_{Σ_{K−1}}⋯F_{Σ_1} and
F_{∼k} := (F_1,F_2,…,F_{k−1},F_{k+1},…,F_K).
Fix an integer k∈[1:K]. We write
[TABLE]
where (a) follows from Lemma 2, (b) follows from the correctness property, and (c) follows from Lemma 3 and the fact that Q_{1:D}^{(function)}∈[1:K]^D.
Let δ_{1:D}∈[1:K]^D denote an arbitrary realization of Q_{1:D}^{(function)}. Let S={s∈[1:D]: δ_s=k}, and let S={s_1,s_2,…,s_{d_k}} denote the elements of S in increasing order, where d_k denotes the realization of the random variable D_k.
Lemma 4.
For any integer d∈[1:D],
[TABLE]
Proof.
Conditioning on F∼k,Σ and Q1:D(function)=δ1:D, we have the following Markov chain:
where (a) follows from Lemma 3, (b) follows from the fact that ∣S∣=dk, and (c) follows from Lemma 2.
Taking the expectation of (145) and combining it with (128) yields
[TABLE]
Therefore,
[TABLE]
Hence, we conclude that 𝔼[D_k] ≥ M−o(1), which completes the proof.
8 Conclusion and Discussion
In this paper, we introduced the problem of private function computation. We studied a basic version of it, namely private sequential function computation, and we showed that even in this special case, the problem is challenging. In particular, we investigated the information theoretic limits of the private sequential function computation problem, and we derived non-trivial lower and upper bounds for its capacity.
Some of the future directions of this work are as follows:
• One direction is to derive tighter bounds on the capacity of the problem. We conjecture that the scheme proposed in this paper is capacity achieving, and the challenge is to find a tighter converse.
• An interesting direction is to investigate the capacity for limited M.
• Another interesting direction is to consider other classes of functions for the problem of private function computation, rather than only compositions of (large scale) linear basic functions in which each basic function appears exactly once. A possible first step is to consider compositions in which the basic functions can appear repeatedly. Another case is when the desired function can be a composition of linear combinations of the basic functions.
• An extension of the proposed problem is the case where some of the servers, up to a given number, may collude or behave adversarially.