Privacy-preserving Distributed Machine Learning via Local Randomization and ADMM Perturbation
Xin Wang, Hideaki Ishii, Linkang Du, Peng Cheng, Jiming Chen

TL;DR
This paper introduces a privacy-preserving distributed machine learning framework using ADMM with local randomization and noise perturbation, enabling heterogeneous privacy guarantees without trusting the server and minimizing privacy loss over iterations.
Contribution
It proposes a novel ADMM-based DML framework that does not assume trusted servers and offers heterogeneous privacy levels based on data sensitivity and trust degrees.
Findings
The framework effectively balances privacy and model accuracy.
Experimental results validate the theoretical privacy guarantees.
The approach reduces privacy loss over multiple ADMM iterations.
| Dataset | Without privacy protection | Modified loss | Perturbed ADMM | ||||
| , | , | , | , | , | , | ||
| German | 75.00 | 71.00 | 74.00 | 69.67 | 64.00 | 74.33 | 67.67 |
| Image | 75.56 | 70.13 | 72.84 | 69.33 | 63.10 | 70.45 | 65.50 |
| Ringnorm | 77.38 | 73.44 | 76.82 | 73.74 | 66.18 | 75.77 | 70.23 |
| Banana | 58.22 | 54.33 | 56.06 | 54.28 | 43.11 | 55.89 | 54.44 |
| Splice | 56.60 | 46.84 | 56.60 | 54.94 | 46.39 | 55.83 | 52.50 |
| Twonorm | 97.90 | 96.59 | 97.38 | 96.51 | 92.28 | 97.41 | 94.77 |
| Waveform | 88.93 | 84.60 | 87.93 | 84.07 | 80.47 | 87.67 | 81.73 |
X. Wang, L. Du, P. Cheng and J. Chen are with the State Key Lab. of Industrial Control Technology, Zhejiang University, Hangzhou, 310027, P. R. China. X. Wang is also with the Dept. of Computer Science, Tokyo Institute of Technology, Yokohama, 226-8502, Japan. H. Ishii is with the Dept. of Computer Science, Tokyo Institute of Technology, Yokohama, 226-8502, Japan.
Abstract
With the proliferation of training data, distributed machine learning (DML) is becoming more competent for large-scale learning tasks. However, privacy concerns have to be given priority in DML, since training data may contain sensitive information of users. In this paper, we propose a privacy-preserving ADMM-based DML framework with two novel features: First, we remove the assumption commonly made in the literature that the users trust the server collecting their data. Second, the framework provides heterogeneous privacy for users depending on data’s sensitive levels and servers’ trust degrees. The challenging issue is to keep the accumulation of privacy losses over ADMM iterations minimal. In the proposed framework, a local randomization approach, which is differentially private, is adopted to provide users with self-controlled privacy guarantee for the most sensitive information. Further, the ADMM algorithm is perturbed through a combined noise-adding method, which simultaneously preserves privacy for users’ less sensitive information and strengthens the privacy protection of the most sensitive information. We provide detailed analyses on the performance of the trained model according to its generalization error. Finally, we conduct extensive experiments using real-world datasets to validate the theoretical results and evaluate the classification performance of the proposed framework.
Index Terms:
Distributed machine learning, privacy preservation, ADMM, generalization error.
I Introduction
In the era of big data, distributed machine learning (DML) is increasingly applied in various areas of our daily lives, especially with the proliferation of training data. Typical applications of DML include machine-aided prescription [1], natural language processing [2], and recommender systems [3], to name a few. Compared with the traditional single-machine model, DML is more competent for large-scale learning tasks due to its scalability and robustness to faults. The alternating direction method of multipliers (ADMM), a commonly used parallel computing approach in the optimization community, is a simple but efficient algorithm for multiple servers to collaboratively solve learning problems [4]. Our DML framework also uses ADMM as the underlying algorithm.
However, privacy is a significant issue that has to be considered in DML. In many machine learning tasks, users’ data for training the prediction model contains sensitive information, such as genotypes, salaries, and political orientations. For example, if we adopt DML methods to predict HIV-1 infection [5], the data used for protein-protein interactions identification mainly includes patients’ information about their proteins, labels indicating whether they are HIV-1 infected or not, and other kinds of health data. Such information, especially the labels, is extremely sensitive for the patients. Moreover, there exist potential risks of privacy disclosure. On one hand, when users report their data to servers, illegal parties can eavesdrop on the data transmission processes or penetrate the servers to steal reported data. On the other hand, the information communicated between servers, which is required to train a common prediction model, can also disclose users’ private data. If these disclosure risks are not properly controlled, users would refuse to contribute their data to servers even though DML may bring convenience for them.
Various privacy-preserving solutions have been proposed in the literature. Differential privacy (DP) [6] is one of the standard non-cryptographic approaches and has been applied in distributed computing scenarios [7, 8, 9, 10]. Other schemes which are not DP-preserving can be found in [11, 12, 13]. In addition, privacy-aware machine learning problems [14, 15, 16, 17] have attracted a lot of attention, and many researchers have proposed ADMM-based solutions [18, 19, 20]. However, there exists an underlying assumption in most privacy-aware schemes that the data contributors trust the servers collecting their data. This trust assumption may lead to privacy disclosure in many cases. For instance, when the server is penetrated by an adversary, the information obtained by the adversary may be the users’ original private data.
Moreover, most existing schemes provide the same privacy guarantee for the entire data sample of a user though different data pieces are likely to have distinct sensitive levels. In the example of HIV-1 infection prediction [5] mentioned above, it is obvious that the label indicating HIV-1 infected or uninfected is more sensitive than other health data. Thus, the data pieces with higher sensitive levels should obtain stronger protection. On the other hand, as claimed in [7], different servers present diverse trust degrees to users due to the distinct permissions to users’ data. The servers having no direct connection with a user, compared with the server collecting his/her data, may be less trustworthy. Here, the user would require that the less trustworthy servers obtain his/her information under stronger privacy preservation. Therefore, we will investigate a privacy-aware DML framework that preserves heterogeneous privacy, where users’ data pieces with distinct sensitive levels can obtain different privacy guarantee against servers of diverse trust degrees.
One challenging issue is to reduce the accumulation of privacy losses over ADMM iterations as much as possible, especially for the privacy guarantee of the most sensitive data pieces. Most existing ADMM-based private DML frameworks preserve privacy by perturbing the intermediate results shared by servers. Since each intermediate result is computed with users’ original data, its release will disclose part of the private information, implying that the privacy loss may increase as iterations proceed. Moreover, these private DML frameworks only provide the same privacy guarantee for all data pieces. In addition to intermediate information perturbation, original data randomization methods can be combined to provide heterogeneous privacy protection. However, such an approach introduces coupled uncertainties into the classification model. The lack of uncertainty decoupling methods makes performance quantification a challenging task.
In this paper, we propose a privacy-preserving distributed machine learning (PDML) framework to settle these challenging issues. After removing the trustworthy servers assumption, we incorporate the users’ data reporting into the DML process, which forms a two-phase training scheme together with the distributed computing process. For privacy preservation, we adopt different approaches in the two phases. In Phase 1, a user first leverages a local randomization approach to obfuscate the most sensitive data pieces and sends the randomized version to a server. This technique provides the user with self-controlled privacy guarantee for the most sensitive information. Further, in Phase 2, multiple servers collaboratively train a common prediction model and there, they use a combined noise-adding method to perturb the communicated messages, which preserves privacy for users’ less sensitive data pieces. Also, such perturbation strengthens the privacy preservation of data pieces with the highest sensitive level. For the performance of the PDML framework, we analyze the generalization error of current classifiers trained by different servers.
The main contributions of this paper are threefold:
1. A two-phase PDML framework is proposed to provide heterogeneous privacy protection in DML, where users’ data pieces obtain different privacy guarantees depending on their sensitive levels and servers’ trust degrees.
2. In Phase 1, we design a local randomization approach, which preserves DP for the users’ most sensitive information. In Phase 2, a combined noise-adding method is devised to compensate the privacy protection of other data pieces.
3. The convergence property of the proposed ADMM-based privacy-aware algorithm is analyzed. We also give a theoretical bound of the difference between the generalization error of trained classifiers and the ideal optimal classifier.
The remainder of this paper is organized as follows. Related works are discussed in Section II. We provide some preliminaries and formulate the problem in Section III. Section IV presents the designed privacy-preserving framework, and the performance is analyzed in Section V. In order to validate the classification performance, we use multiple real-world datasets and conduct experiments in Section VI. Finally, Section VII concludes the paper. A preliminary version [21] of this paper was accepted for presentation at IEEE CDC 2019. This paper contains a different privacy-preserving approach with a fully distributed ADMM setting, full proofs of the main results, and more experimental results.
II Related Works
As one of the important applications of distributed optimization, DML has received widespread attention from researchers. Besides ADMM schemes, many distributed approaches have been proposed in the literature, e.g., subgradient descent methods [22], local message-passing algorithms [23], adaptive diffusion mechanisms [24], and dual averaging approaches [25]. Compared with these approaches, ADMM schemes achieve faster empirical convergence [26], making them more suitable for large-scale DML tasks.
For privacy-preserving problems, cryptographic techniques [27, 28, 29] are often used to protect information from being inferred when the key is unknown. In particular, homomorphic encryption methods [28], [29] allow untrustworthy servers to calculate with encrypted data, and this approach has been applied in an ADMM scheme [20]. Nevertheless, such schemes unavoidably bring extra computation and communication overheads. Another commonly used approach to preserve privacy is random value perturbation [6], [30], [31]. DP has been increasingly acknowledged as the de facto criterion for non-encryption-based data privacy. This approach incurs lower costs but still provides strong privacy guarantees, though there exist tradeoffs between privacy and performance [7].
In recent years, random value perturbation-based approaches have been widely used to address privacy protection in distributed computing, especially in consensus problems [32]. For instance, [7, 8, 9], [11, 12, 13] provide privacy-preserving average consensus paradigms, where the mechanisms in [7, 8, 9] provide DP guarantee. Moreover, for a maximum consensus algorithm, [10] gives a differentially private mechanism. Since these solutions mainly focus on simple statistical analysis (e.g., computation of average and maximum elements), there may exist difficulties in directly applying them to DML.
Privacy-preserving machine learning problems have also attracted a lot of attention recently. Under centralized scenarios, Chaudhuri et al. [14] proposed a DP solution for an empirical risk minimization problem by perturbing the objective function with well-designed noise. For privacy-aware DML, Han et al. [33] also gave a differentially private mechanism, where the underlying distributed approach is subgradient descent. The works [15] and [16] present dynamic DP schemes for ADMM-based DML, where privacy guarantee is provided in each iteration. However, if a privacy violator uses the published information in all iterations to make inference, there will be no privacy guarantee. In addition, an obfuscated stochastic gradient method via correlated perturbations was proposed in [17], though it cannot provide DP preservation. Different from these works, in this paper we remove the trustworthy servers assumption. Moreover, we take into consideration the distinct sensitive levels of data pieces and the diverse trust degrees of servers, and propose the PDML framework providing heterogeneous privacy preservation.
III Preliminaries and Problem Statement
In this section, we introduce the overall computation framework of DML and the ADMM algorithm used there. Moreover, the privacy-preserving problem for the framework is formulated with the definition of local differential privacy.
III-A System Setting
We consider a collaborative DML framework to carry out classification problems based on data collected from a large number of users. Fig. 1 gives a schematic diagram. There are two parties involved: users (or data contributors) and computing servers. The DML’s goal is to train a classification model based on data of all users. It has two phases of data collection and distributed computation, called Phase 1 and Phase 2, respectively. In Phase 1, each user sends his/her data to the server responsible for collecting all the data from the user’s group. In Phase 2, each computing server utilizes a distributed computing approach to cooperatively train the classifier through information interaction with other servers. The proposed DML framework is based on the one in [7], but the learning tasks are much more complex than the basic statistical analysis considered by [7].
Network Model. Consider computing servers participating in the framework where the th server is denoted by . We use an undirected and connected graph to describe the underlying communication topology, where is the servers set and is the set of communication links between servers. The number of communication links in is denoted by , i.e., . Let the set of neighbor servers of be . The degree of server is denoted by .
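As an illustration of the network model, the following minimal sketch stores an undirected communication graph as a list of links and derives each server's neighbor set and degree. The 4-server ring topology and the names `links`, `neighbors`, and `is_connected` are hypothetical and chosen only for this example.

```python
from collections import deque

def neighbors(n, links):
    """Neighbor sets of an undirected graph with servers 0..n-1."""
    nbrs = {i: set() for i in range(n)}
    for i, l in links:
        nbrs[i].add(l)
        nbrs[l].add(i)
    return nbrs

def is_connected(nbrs):
    """Breadth-first search from server 0 must reach every server."""
    seen, queue = {0}, deque([0])
    while queue:
        i = queue.popleft()
        for l in nbrs[i] - seen:
            seen.add(l)
            queue.append(l)
    return len(seen) == len(nbrs)

n = 4
links = [(0, 1), (1, 2), (2, 3), (3, 0)]    # hypothetical ring topology
nbrs = neighbors(n, links)                  # neighbor set of each server
degrees = {i: len(nbrs[i]) for i in nbrs}   # degree of each server
```

Connectivity matters later: the local consensus constraints between neighbors imply global consensus only when the graph is connected.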
Different servers collect data from different groups of users, and thus all users can be divided into distinct groups. The th group of users, whose data is collected by server , is denoted by the set , and is the number of users in . Each user has a data sample , which is composed of a feature vector and the corresponding label . In this paper, we consider a binary-classification problem. That is, there are two types of labels as . Suppose that all data samples , are drawn from an underlying distribution , which is unknown to the servers. Here, the learning goal is that the classifier trained with limited data samples can match the ideal model trained with known as much as possible.
III-B Classification Problem and ADMM Algorithm
We first introduce the classification problem solved by the two-phase DML framework. Let be the trained classification model. The trained classifier should guarantee that the accuracy of mapping any feature vector (sampled from the distribution ) to its correct label is high. We employ the method of regularized empirical risk minimization, which is a commonly used approach to find an appropriate classifier [34]. Denote the classifier trained by server as . The objective function (or the empirical risk) of the minimization problem is defined as
[TABLE]
where is the loss function measuring the performance of the trained classifier . The regularizer is introduced to mitigate overfitting, and is a constant. We take a bounded classifier class such that . For the loss function and the regularizer , we introduce the following assumptions [14], [15].
Assumption 1.
The loss function is convex and twice differentiable in . In particular, , and are bounded over the class as
[TABLE]
where , and are positive constants. Moreover, it holds .
Assumption 2.
The regularizer is twice differentiable and strongly convex with parameter , i.e., ,
[TABLE]
where indicates the gradient with respect to .
We note that in (1) can be separated into different parts, where each part is the objective function of the local minimization problem to be solved by each server. The objective function of server is
[TABLE]
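To make the structure of a local objective concrete, the sketch below evaluates a regularized empirical risk for one server together with its gradient. It rests on assumptions that are not taken from the paper: the logistic loss as the convex, twice differentiable loss, the regularizer 0.5·||w||², and a regularization weight `a` split evenly over `n` servers. All function and parameter names are hypothetical.

```python
import math

def local_objective(w, samples, a, n):
    """Average logistic loss over one server's samples plus (a/n) * 0.5 * ||w||^2.

    samples: list of (x, y) pairs, x a list of floats, y in {-1, +1}.
    Intended for moderate margins; math.exp overflows for very large ones.
    """
    total = 0.0
    for x, y in samples:
        margin = y * sum(wk * xk for wk, xk in zip(w, x))
        total += math.log1p(math.exp(-margin))
    reg = 0.5 * sum(wk * wk for wk in w)
    return total / len(samples) + (a / n) * reg

def local_gradient(w, samples, a, n):
    """Gradient of local_objective with respect to w."""
    grad = [(a / n) * wk for wk in w]
    for x, y in samples:
        margin = y * sum(wk * xk for wk, xk in zip(w, x))
        # Derivative of log(1 + exp(-z)) at z = margin, times y, averaged.
        coeff = -y / (1.0 + math.exp(margin)) / len(samples)
        for k, xk in enumerate(x):
            grad[k] += coeff * xk
    return grad
```

Summing such local objectives over all servers gives a separable global objective, which is what allows each server to work on its own part.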
Since is trained based on the data of the th group of users, it may only partially reflect data characteristics. To find a common classifier taking account of all participating users, we place a global consensus constraint in the minimization problem, as . However, since we use a connected graph to describe the interaction between servers, we have to utilize a local consensus constraint:
[TABLE]
where is an auxiliary variable enforcing consensus between neighbor servers and . Obviously, (4) also implies global consensus. We can now write the whole regularized empirical risk minimization problem as follows [35].
Problem 1.
[TABLE]
Next, we establish a compact form of Problem 1. Let and be vectors aggregating all classifiers and auxiliary variables , respectively. To transfer all local consensus constraints into a matrix form, we introduce two block matrices , which are partitioned into submatrices with dimension . For the communication link , if is the th block of , then the th submatrix of and th submatrix of are the identity matrix ; otherwise, these submatrices are the zero matrix . We write , , and . Then, Problem 1 can be written in a compact form as
[TABLE]
For solving this problem we introduce the fully distributed ADMM algorithm from [26]. The augmented Lagrange function associated with (7) and (8) is given by , where is the dual variable ( is correspondingly called the primal variable) and is the penalty parameter.
At iteration , the solved optimal auxiliary variable satisfies the relation . Through some simple transformation, we have . Let with . If we set the initial value of to , we have . Thus, we can obtain the complete dual variable by solving . Let
[TABLE]
Define a new dual variable . Through the simplification process in [26], we obtain the fully distributed ADMM for solving Problem 1, which is composed of the following iterations:
[TABLE]
Note that is also a compact vector of all local dual variables for , i.e., .
The above ADMM iterations can be separated into different parts, which are solved by the different servers. At iteration , the information used by server to update a new primal variable includes users’ data , current classifiers and dual variable . The local augmented Lagrange function associated with the primal variable update is given by
[TABLE]
At each iteration, server will update its primal variable and dual variable as follows:
[TABLE]
Clearly, in (9) and (10), the information communicated between computing servers is the newly updated classifiers.
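The alternation between a local primal update and a neighbor-driven dual update can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact iterations: it uses a commonly used form of decentralized consensus ADMM, and it swaps in scalar quadratic local objectives 0.5·a_i·(w − c_i)² so that the primal step has a closed form. The names `decentralized_admm`, `eta`, and `alpha` are hypothetical.

```python
def decentralized_admm(a, c, nbrs, eta=2.0, iters=500):
    """Consensus ADMM on a graph for local objectives 0.5*a[i]*(w - c[i])**2.

    nbrs[i] lists the neighbors of server i. Returns the local iterates.
    """
    n = len(a)
    w = [0.0] * n        # primal variables, one scalar "classifier" per server
    alpha = [0.0] * n    # dual variables, initialized at zero
    for _ in range(iters):
        w_new = []
        for i in range(n):
            deg = len(nbrs[i])
            # Closed-form minimizer over v of
            #   0.5*a_i*(v - c_i)^2 + 2*alpha_i*v
            #   + eta * sum_l (v - (w_i + w_l)/2)^2.
            rhs = (a[i] * c[i] - 2.0 * alpha[i]
                   + eta * sum(w[i] + w[l] for l in nbrs[i]))
            w_new.append(rhs / (a[i] + 2.0 * eta * deg))
        # Dual update driven by the disagreement with neighbors.
        for i in range(n):
            alpha[i] += 0.5 * eta * sum(w_new[i] - w_new[l] for l in nbrs[i])
        w = w_new
    return w

ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}   # hypothetical topology
result = decentralized_admm([1.0, 2.0, 3.0, 4.0], [1.0, -2.0, 0.5, 3.0], ring)
```

In this toy problem the centralized minimizer is the weighted mean of the `c` values with weights `a`, so every entry of `result` should approach the same value. The only quantities exchanged between servers are the current primal iterates, matching the observation above.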
III-C Privacy-preserving Problem
In this subsection, we introduce the privacy-preserving problem in the DML framework. The private information to be preserved is first defined, followed by the introduction of privacy violators and information used for privacy inference. Further, we present the objectives of the two phases.
Private information. For users, both the feature vectors and the labels of the data samples contain their sensitive information. The private information contained in the feature vectors may be the ID, gender, general health data and so on. However, the labels may indicate, for example, whether a patient contracts a disease (e.g., HIV-1 infected) or whether a user has a special identity (e.g., a member of a certain group). We can see that compared with the feature vectors, the labels may be more sensitive for the users. In this paper, we consider that the labels of users’ data are the most sensitive information, which should be protected with priority and obtain stronger privacy guarantee than that of feature vectors.
Privacy attacks. All computing servers are viewed as untrustworthy potential privacy violators desiring to infer the sensitive information contained in users’ data. In the meantime, different servers present distinct trust degrees to users. User divides the potential privacy violators into two types. The server , collecting user ’s data directly, is the first type. Other servers , having no direct connection with user , are the second type. Compared with server , other servers are less trustworthy for user . To conduct privacy inference, the first type of privacy violators leverages user ’s reported data while the second type can utilize only the intermediate information shared by servers.
Privacy protections in Phases 1 & 2. Since the label of user is the most sensitive information, its original value should not be disclosed to any servers including server . Thus, during the data reporting process in Phase 1, user must obfuscate the private label in his/her local device. For the less sensitive feature vector, considering that server is more trustworthy, user can choose to transmit the original version to that server. Nevertheless, the user is still unwilling to disclose the raw feature vector to servers with lower trust degrees. Hence, in this paper, when server interacts with other servers to find a common classifier in Phase 2, the released information about user ’s data will be further processed before communication.
More specifically, in Phase 1, to obfuscate the labels, we use a local randomization approach, whose privacy-preserving property will be measured by local differential privacy (LDP) [30]. LDP is developed from differential privacy (DP), which is originally defined for trustworthy databases to publish aggregated private information [6]. The privacy preservation idea of DP is that for any two neighbor databases differing in one record (e.g., one user selects to report or not to report his/her data to the server) as input, a randomized mechanism is adopted to guarantee the two outputs to have high similarity so that privacy violators cannot identify the different record with high confidence. Since there is no trusted server for data collection in our setting, users locally perturb their original labels and report noisy versions to the servers.
To this end, we define a randomized mechanism , which takes a data sample as input and outputs its noisy version. The definition of LDP is given as follows.
Definition 1.
(-LDP). Given , a randomized mechanism preserves -LDP if for any two data samples and satisfying and , and any observation set , it holds
[TABLE]
In (11), the parameter is called the privacy preserving degree (PPD), which describes the strength of privacy guarantee of . A smaller implies stronger privacy guarantee. This is because smaller means that the two outputs and are closer, making it more difficult for privacy violators to infer the difference in and (i.e., and ).
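For reference, the LDP condition is usually written in the following form. The notation is an assumption made here for readability: $M$ is the randomized mechanism, $d$ and $d'$ are the two data samples allowed by Definition 1, $\mathcal{O}$ is the observation set, and $\epsilon$ is the PPD.

```latex
\Pr\left[ M(d) \in \mathcal{O} \right] \;\le\; e^{\epsilon} \, \Pr\left[ M(d') \in \mathcal{O} \right]
```

For example, with $\epsilon = \ln 3$ the probability of any observation can change by at most a factor of 3 when the input sample is swapped, while $\epsilon \to 0$ forces the two output distributions to coincide.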
III-D System Overview
In this paper, we propose the PDML framework, where users can obtain heterogeneous privacy protection. The heterogeneity is characterized by two aspects: i) When a user faces a privacy violator, his/her data pieces with distinct sensitive levels (i.e., the feature vector and the label) obtain different privacy guarantees; ii) for one type of private data piece, the privacy protection provided by the framework is stronger against privacy violators with low trust degrees than those with higher trust degrees. Particularly, in our approach, the privacy preservation strength of users’ labels is controlled by the users. Moreover, a modified ADMM algorithm is proposed to meet the heterogeneous privacy protection requirement.
The workflow of the proposed PDML framework is illustrated in Fig. 2. Some details are explained below.
1. In Phase 1, a user first appropriately randomizes the private label, and then sends the noisy label and the original feature vector to a computing server. The randomization approach used here determines the PPD of the label.
2. In Phase 2, multiple computing servers collaboratively train a common classifier based on their collected data. To protect privacy of feature vectors against less trustworthy servers, we further use a combined noise-adding method to perturb the ADMM algorithm, which also strengthens the privacy guarantee of users’ labels.
3. The performance of the trained classifiers is analyzed in terms of their generalization errors. To decompose the effects of uncertainties introduced in the two phases, we modify the loss function in Problem 1. We finally quantify the difference between the generalization error of trained classifiers and that of the ideal optimal classifier.
IV Privacy-Preserving Framework Design
In this section, we introduce the privacy-preserving approaches used in Phases 1 and 2, and analyze their properties.
IV-A Privacy-Preserving Approach in Phase 1
In this subsection, we propose an appropriate approach used in Phase 1 to provide privacy preservation for the most sensitive labels. In particular, it is controlled by users and will not be weakened in Phase 2.
We adopt the idea of randomized response (RR) [30] to obfuscate the users’ labels. Originally, RR was used to set plausible deniability for respondents when they answer survey questions about sensitive topics (e.g., HIV-1 infected or uninfected). When using RR, respondents only have a certain probability to answer questions according to their true situations, making the server unable to determine with certainty whether the reported answers are true.
In our setting, user randomizes the label through RR and sends the noisy version to server . This is done by the randomized mechanism defined below.
Definition 2.
For , the randomized mechanism with input data sample is given by , where
[TABLE]
In the above definition, is the randomization probability controlling the level of data obfuscation. Obviously, a larger implies higher uncertainty on the reported label, making it harder for the server to learn the true label.
Denote the output as , i.e., . After the randomization, will be transmitted to the server. In this case, server can use only to train the classifier, and the released information about the true label in Phase 2 is computed based on . This implies that once is reported to the server, no more information about the true label will be released. In this paper, we set the randomization probability in (12) as
[TABLE]
where . The following proposition gives the privacy-preserving property of the randomized mechanism in Definition 2, justifying this choice of from the viewpoint of LDP.
Proposition 1.
Under (13), the randomized mechanism preserves -LDP for .
The proof can be found in Appendix A.
Proposition 1 clearly indicates that the users can tune the randomization probability according to their privacy demands. This can be seen as given a randomization probability , by (13), the PPD provided by is . Obviously, a larger randomization probability leads to smaller PPD, indicating stronger privacy guarantee.
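The relation between the randomization probability and the PPD can be illustrated with a short sketch of randomized response on binary labels. It assumes the standard calibration p = 1/(1 + e^ε), for which the likelihood ratio (1 − p)/p equals e^ε. Whether this matches (13) exactly is an assumption, and the function names are hypothetical.

```python
import math
import random

def flip_probability(eps):
    """Flip probability p in (0, 1/2) satisfying (1 - p) / p = exp(eps)."""
    return 1.0 / (1.0 + math.exp(eps))

def randomize_label(y, eps, rng=random):
    """Report -y with probability p and the true label y otherwise (y in {-1, +1})."""
    return -y if rng.random() < flip_probability(eps) else y

def ppd(p):
    """Privacy preserving degree achieved by a flip probability p in (0, 1/2)."""
    return math.log((1.0 - p) / p)
```

A user who wants a stronger guarantee picks a smaller `eps`, which pushes the flip probability toward 1/2 and makes the reported label less informative about the true one.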
If all data samples , drawn from the distribution are randomized through , the noisy data , can be considered to be obtained from a new distribution , which is related to the PPD . Note that is also an unknown distribution due to the unknown .
IV-B Privacy-Preserving Approach in Phase 2
To deal with less trustworthy servers in Phase 2, we devise a combined noise-adding approach to simultaneously preserve privacy for users’ feature vectors and enhance the privacy guarantee of users’ labels. We first adopt the method of objective function perturbation [14]. That is, before solving Problem 1, the servers perturb the objective function with random noises. For server , the perturbed objective function is given by
[TABLE]
where is the local objective function given in (3), and is a bounded random noise with arbitrary distribution. Let be the bound of noises , namely, . Denote the sum of as .
Limitation of objective function perturbation. We remark that in our setting, the objective function perturbation in (14) is not sufficient to provide reliable privacy guarantee. This is because each server publishes current classifier multiple times and each publication utilizes users’ reported data. Note that in the more centralized setting of [14], the classifier is only published once. More specifically, according to (9), is the solution to . In this case, it holds . As (10) shows, the dual variable can be deduced from updated classifiers. Thus, if ’s neighbor servers have access to and , then they can easily compute .
We should highlight that multiple releases of increase the risk of users’ privacy disclosure. This can be explained as follows. First, note that , where contains users’ private information. The goal of -perturbation is to prevent from being derived directly by other servers. However, after publishing an updated classifier , server releases a new gradient . Since the noise is fixed for all iterations, each release of discloses more information about . In particular, we have . That is, the effect of the added noise can be cancelled by combining the gradients of objective functions at different time instants.
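The cancellation effect can be checked numerically with a small sketch. It assumes that the perturbation enters the objective as a linear term in the classifier, as in the objective-perturbation method of [14], and it uses a scalar stand-in objective. The function names and numerical values are hypothetical.

```python
import math
import random

def grad_J(w):
    """Gradient of a stand-in local objective log(1 + exp(-w)) + 0.05 * w**2."""
    return -1.0 / (1.0 + math.exp(w)) + 0.1 * w

def grad_J_perturbed(w, noise):
    """Gradient after adding the linear term noise * w to the objective."""
    return grad_J(w) + noise

rng = random.Random(1)
noise = rng.uniform(-1.0, 1.0)   # bounded noise, drawn once and then kept fixed
w_t, w_s = 0.3, -0.8             # classifiers released at two different iterations

# Difference of two perturbed gradients: the fixed noise term drops out.
leak = grad_J_perturbed(w_t, noise) - grad_J_perturbed(w_s, noise)
```

Here `leak` coincides with the difference of the unperturbed gradients, so an observer holding two perturbed gradients learns a noise-free quantity that depends only on the users' data.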
Modified ADMM by primal variable perturbation. To ensure appropriate privacy preservation in Phase 2, we adopt an extra perturbation method, which makes it harder for other servers to obtain the gradient . Specifically, after deriving classifier , server first perturbs with Gaussian noise whose variance decays as iterations proceed, and then sends a noisy version of to neighbor servers. This is denoted by , where with decaying rate .
The local augmented Lagrange function associated with -perturbed objective function in (14) is given by
[TABLE]
We then introduce the perturbed version of the ADMM algorithm in (9) and (10) as
[TABLE]
At iteration , a new classifier is first obtained by solving . Then, server will send out and wait for the updated classifiers from neighbor servers. At the end of an iteration, the server will update the dual variable .
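The iteration order described above (solve locally, perturb, exchange, update the dual variable) can be sketched on a toy problem. This is only an illustration under stated assumptions: it uses the standard decentralized consensus ADMM update form, which may differ in detail from (15)-(17); the local objective is a hypothetical quadratic `0.5*||w - a_i||^2` so that the local minimization has a closed form; the objective perturbation is assumed linear in `w`; and the noise variance is assumed to decay geometrically as `V**2 * rho**t`. The function name and all parameter names are ours.

```python
import numpy as np

def perturbed_admm(a, neighbors, eta, c=1.0, V=0.1, rho=0.8, iters=300, seed=0):
    """Toy decentralized ADMM with objective and primal-variable perturbation.

    a[i]         : vector defining the toy local objective 0.5*||w - a[i]||^2
    neighbors[i] : neighbor indices of server i (undirected graph)
    eta[i]       : fixed objective-perturbation noise of server i (linear term)
    V, rho       : initial std and variance decay rate of the Gaussian noise
    """
    rng = np.random.default_rng(seed)
    n, d = a.shape
    w = np.zeros((n, d))         # exact local solutions (kept private)
    w_tilde = np.zeros((n, d))   # perturbed classifiers exchanged between servers
    gamma = np.zeros((n, d))     # dual variables
    for t in range(iters):
        w_new = np.empty_like(w)
        for i in range(n):
            deg = len(neighbors[i])
            nb_sum = sum(w_tilde[j] for j in neighbors[i])
            # Closed-form minimizer of the local augmented Lagrangian.
            rhs = a[i] - eta[i] - gamma[i] + c * (deg * w_tilde[i] + nb_sum)
            w_new[i] = rhs / (1.0 + 2.0 * c * deg)
        w = w_new
        # Primal variable perturbation: the std shrinks as iterations proceed.
        std = V * rho ** (t / 2.0)
        w_tilde = w + rng.normal(scale=std, size=w.shape) if std > 0 else w.copy()
        # Dual update uses only the perturbed (communicated) classifiers.
        for i in range(n):
            deg = len(neighbors[i])
            nb_sum = sum(w_tilde[j] for j in neighbors[i])
            gamma[i] = gamma[i] + c * (deg * w_tilde[i] - nb_sum)
    return w
```

For example, on a four-server ring with `a = np.array([[0.0], [2.0], [4.0], [6.0]])` and zero objective noise, the unperturbed run (`V=0.0`) drives every server to the average 3.0, and the perturbed run ends close to it because the injected noise decays.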
IV-C Discussions
We now discuss the effectiveness of the primal variable perturbation. It is emphasized that at each iteration, only releases a small amount of information about through the communicated . Although and are known to ’s neighbors, cannot be directly computed due to the unknown . More specifically, observe that by (15), we have , where is the degree of .
On the other hand, using available information, other servers can compute only , i.e., the gradient with respect to perturbed classifier . We have . Thus, we obtain . Hence, due to , it would not be helpful for inferring to integrate the gradients of the objective functions at different iterations.
We should also observe that since , can be derived when . Moreover, it is clear that the relation holds for . However, is the result of under -perturbation. Moreover, due to the local consensus constraint (4), the trained classifiers may not have significant differences when . Such limited information is not sufficient for privacy violators to infer with high confidence.
Differential privacy analysis. We remark that in our scheme, the noise added to the objective function provides underlying privacy protection in Phase 2. Even if privacy violators make inference with published in all iterations, the disclosed information is users’ reported data plus extra noise perturbation. If the objective function perturbation is removed, the primal variable perturbation method cannot provide DP guarantee when . It is proved in [15] and [16] that the -perturbation in (16) preserves dynamic DP. According to the composition theorem of DP [6], the PPD will increase (indicating weaker privacy guarantee) when other servers obtain the perturbed classifiers of multiple iterations. In particular, if the perturbed classifiers in all iterations are used for inference, the PPD will be , implying no privacy guarantee any more.
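The accumulation effect can be illustrated with basic sequential composition, under which the privacy losses of successive releases add up. The sketch below is hypothetical: it assumes that the per-release privacy loss scales inversely with the noise standard deviation, so that with a variance decay rate `rho` the loss of release `t` is `eps_first * rho**(-(t-1)/2)`. The exact per-iteration expression of the cited works is not reproduced here.

```python
def composed_budget(eps_first, rho, num_iters):
    """Total privacy loss after num_iters releases under basic composition.

    Assumed (illustrative) per-release loss: eps_first * rho**(-(t-1)/2),
    i.e. the loss grows as the Gaussian noise variance decays at rate rho.
    """
    return sum(eps_first * rho ** (-(t - 1) / 2.0)
               for t in range(1, num_iters + 1))
```

With `rho < 1` the partial sums grow without bound as the number of releases increases, which matches the observation that using all perturbed classifiers for inference leaves no meaningful guarantee from the primal perturbation alone.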
Remark 1.
The objective function perturbation given in (14) preserves the so-called -DP [36]. Also, according to [14], the perturbation in (14) preserves -DP if has density with normalizing parameter . Note that the noise with this density is not bounded, which is not consistent with our setting. Although we use a bounded noise, this kind of perturbation still provides -DP guarantee, which is a relaxed form of pure -DP.
Strengthened privacy guarantee. For users’ labels, the privacy guarantee in Phase 2 is stronger than that of Phase 1. Since differential privacy is immune to post-processing [6], the PPD in Phase 1 will not increase during the iterations of the ADMM algorithm executed in Phase 2. However, such immunity is established based on a strong assumption that there is no limit to the capability of privacy violators. In our considered problem, this assumption is satisfied when all servers can have access to user ’s reported data , which may not be realistic. Hence, in our problem setting, one server (i.e., server ) obtains while other servers can access only the classifiers trained with users’ reported data.
Remark 2.
The -DP guarantee is provided for users’ feature vectors. Thus, in Phase 2, the sensitive information in those vectors is not disclosed much to the servers with lower trust degrees. For the labels, they obtain extra -DP preservation in Phase 2. Since the privacy-preserving scheme in Phase 1 preserves -DP for the labels, the released information about them in Phase 2 provides stronger privacy guarantee under the joint effect of -DP in Phase 1 and -DP in Phase 2. We will investigate the joint privacy-preserving degree in the future.
V Performance Analysis
In this section, we analyze the performance of the classifiers trained by the proposed PDML framework. Note that three different uncertainties are introduced into the ADMM algorithm, and these uncertainties are coupled together. The difficulty in analyzing the performance lies in decomposing the effects of the three uncertainties and quantifying the role of each uncertainty. Further, it is also challenging to mitigate the effect of these perturbations on the trained classifiers, especially the influence of users’ incorrect labels.
Here, we first give the definition of generalization error as the metric on the performance of the trained classifiers. Then, we establish a modified version of the loss function , which simultaneously achieves uncertainty decomposition and mitigation of label obfuscation. We finally derive a theoretical bound for the difference between the generalization error of trained classifiers and that of the ideal optimal classifier.
V-A Performance Metric
To measure the quality of trained classifiers, we use generalization error for analysis, which describes the expected error of a classifier on future predictions [37]. Recall that users’ data samples are drawn from the unknown distribution . The generalization error of a classifier is defined as the expectation of ’s loss function with respect to as . Further, define the regularized generalization error by
[TABLE]
We denote the classifier minimizing as , i.e., . We call the ideal optimal classifier.
Here, is the reference regularized generalization error under the classifier class and the used loss function . The trained classifier can be viewed as a good predictor if it achieves generalization error close to . Thus, as the performance metric of the classifiers, we use the difference between the generalization error of trained classifiers and . The difference is denoted as , that is, .
Furthermore, to measure the performance of the classifiers trained by different servers at multiple iterations, we introduce a comprehensive metric. First, considering that the classifiers solved by server at different iterations may be different until the consensus constraint (4) is satisfied, we define a classifier to aggregate in the first rounds as , where is the obtained classifier by solving (15). Moreover, due to the diversity of users’ reported data, the classifiers solved by different servers may also differ (especially in the initial iterations). For this reason, we will later study the accumulated difference among the servers, that is, .
V-B Modified Loss Function in ADMM Algorithm
To mitigate the effect of label obfuscation executed in Phase 1, we make some modification to the loss function in Problem 1. We use the noisy labels and the corresponding PPD in Phase 1 to adjust the loss function in (5). (Note that other parts of Problem 1 are not affected by the noisy labels.) Define the modified loss function by
[TABLE]
This function has the following properties.
Proposition 2.
(i) is an unbiased estimate of as
[TABLE]
(ii) is Lipschitz continuous with Lipschitz constant
[TABLE]
where is the bound of given in Assumption 1.
The proof can be found in Appendix -B.
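The unbiasedness property in (i) can be verified exactly on a small example. The sketch below rests on two assumptions that are not confirmed by the text shown here: labels in {-1, +1} are flipped by binary randomized response with probability `1/(1 + e^eps)`, and the modified loss takes the standard noise-corrected form `(e^eps * loss(y') - loss(-y')) / (e^eps - 1)`. All function names are ours.

```python
import math

def logistic_loss(y, z):
    """Logistic loss for a label y in {-1, +1} and a score z = w^T x."""
    return math.log1p(math.exp(-y * z))

def keep_probability(eps):
    """Probability that the true label is reported (assumed mechanism)."""
    return math.exp(eps) / (1.0 + math.exp(eps))

def modified_loss(y_reported, z, eps, loss=logistic_loss):
    """Noise-corrected loss evaluated on the reported (possibly flipped) label."""
    e = math.exp(eps)
    return (e * loss(y_reported, z) - loss(-y_reported, z)) / (e - 1.0)

def expected_modified_loss(y_true, z, eps):
    """Exact expectation of the modified loss over the random label report."""
    p = keep_probability(eps)
    return (p * modified_loss(y_true, z, eps)
            + (1.0 - p) * modified_loss(-y_true, z, eps))
```

Under these assumptions, `expected_modified_loss(y, z, eps)` coincides with `logistic_loss(y, z)` for any score `z` and any `eps > 0`. The factor `1 / (e^eps - 1)` also shows why small `eps` inflates the Lipschitz constant in (ii).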
Now, we make server use in (19) as the loss function. Thus, the objective function in (3) must be replaced with the one as follows:
[TABLE]
Similar to in (1), we denote the objective function with the modified loss function as . Then, the following lemma holds, whose proof can be found in Appendix -C.
Lemma 1.
If the loss function and the regularizer satisfy Assumptions 1 and 2, respectively, then is -strongly convex.
To simplify the notation, let . With the objective function , the whole optimization problem for finding a common classifier can be stated as follows:
Problem 2.
[TABLE]
Lemma 2.
Problem 2 has an optimal solution set such that .
Lemma 2 can be proved directly from Lemma 1 in [35], whose condition is satisfied by Lemma 1.
We finally arrive at stating the optimization problem to be solved in this paper. To this end, for the modified objective function in (22), we define the perturbed version as in (14) by . Then, the whole objective function becomes
[TABLE]
The problem for finding the classifier with randomized labels and perturbed objective functions is as follows:
Problem 3.
[TABLE]
For , we have the following lemma showing its convexity properties.
Lemma 3.
 is -strongly convex. If satisfies that , then has a -Lipschitz continuous gradient, where is the bound of given in Assumption 1.
The proof can be found in Appendix -D. For simplicity, we denote the Lipschitz constant of the gradient of as , namely, .
We now observe that Problem 3 associated with the objective function has an optimal solution set where
[TABLE]
In fact, this can be shown by an argument similar to Lemma 2, where Lemma 3 establishes the convexity of the objective function (as in Lemma 1).
V-C Generalization Error Analysis
In this subsection, we analyze the accumulated difference between the generalization error of trained classifiers and , i.e., . For the analysis, we use the technique from [38], which considers the problem of ADMM learning in the presence of erroneous updates. Here, our problem is more complicated because besides the erroneous updates brought by primal variable perturbation, there is also uncertainty in the training data and the objective functions. All these uncertainties are coupled together, which brings extra challenges for performance analysis.
We first decompose in terms of different uncertainties. To do so, we must introduce a new regularized generalization error associated with the modified loss function and the noisy data distribution . Similar to (18), for a classifier , it is defined by
[TABLE]
According to Proposition 2, is an unbiased estimate of . Thus, it is straightforward to obtain the following lemma, whose proof is omitted.
Lemma 4.
For a classifier , we have .
Now, we can decompose as follows:
[TABLE]
We will analyze each term in the far right-hand side of (24). The term describes the difference between the classifier and the optimal solution to Problem 3. Before analyzing this difference, we first consider the deviation between the perturbed classifier and , and a bound for it can be obtained by [38].
Here, we introduce some notations related to the bound. Let the compact forms of vectors be , , and . Also, let , , and . An auxiliary sequence is defined as with $Q:=\bigl(\frac{L_{-}}{2}\bigr)^{\frac{1}{2}}$ [39]. has an optimal value , which is the solution to the equation .
Further, we define some important parameters to be used in the next lemma. The first two parameters, and , are related to the underlying network topology and will be used to establish convergence property of the perturbed ADMM algorithm. Let , where and denote the maximum and minimum nonzero eigenvalues of a matrix, respectively. Also, we define and with constant as
[TABLE]
Then, we have the following lemma from [38], which gives a bound for .
Lemma 5.
Suppose that the conditions of Lemma 3 hold. If the parameters and can be chosen such that
[TABLE]
Take in (17) as , where with , and . Then, it holds
[TABLE]
where , and , .
Lemma 5 implies that given a connected graph and the objective function in Problem 3, if the parameters and satisfy (25), then in (26) is guaranteed to be less than 1. In this case, the obtained classifiers will converge to the neighborhood of the optimal solution , where the radius of the neighborhood is . The modified ADMM algorithm can achieve different radii depending on the added noises . Since many parameters are involved, meeting condition (25) may not be straightforward. To make smaller and thus achieve a better convergence rate, one may, in addition to tuning the parameters, change the graph, for example, to make the value smaller.
Theorem 1 to be stated below gives the upper bound of the accumulated difference in the sense of expectation. In the theorem, we employ the important concept of Rademacher complexity [40]. It is defined on the classifier class and the collected data used for training, that is, , where are independent random variables drawn from the Rademacher distribution, i.e., for . In addition, we use the notation to denote the norm of a vector with a positive definite matrix , i.e., .
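For intuition about this quantity, the empirical Rademacher complexity of a norm-bounded linear class can be estimated by Monte Carlo. The sketch below assumes the class `{x -> w^T x : ||w||_2 <= B}`, which need not coincide with the classifier class used in the paper; for this class the supremum over `w` has the closed form `(B/n) * ||sum_i sigma_i x_i||_2`. The function name and parameters are ours.

```python
import numpy as np

def empirical_rademacher_linear(X, B, num_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of
    {x -> w^T x : ||w||_2 <= B} on the sample X (shape n x d)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Each row is one draw of n independent Rademacher signs.
    sigma = rng.choice([-1.0, 1.0], size=(num_draws, n))
    # Closed-form supremum over the norm ball for every draw.
    sup_values = B / n * np.linalg.norm(sigma @ X, axis=1)
    return float(sup_values.mean())
```

By Jensen's inequality the exact value is at most `B * sqrt(sum_i ||x_i||^2) / n`, so the complexity of this class shrinks on the order of `1/sqrt(n)` for bounded features and grows with the radius `B`, in line with the discussion of a richer classifier class leading to a larger bound.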
Theorem 1.
Suppose that the conditions in Lemma 5 are satisfied and the decaying rate of noise variance is set as . Then, for and , the aggregated classifier obtained by the privacy-aware ADMM scheme (15)-(17) satisfies with probability at least
[TABLE]
where , and the parameters , , and are found in Lemma 5.
Proof.
In what follows, we evaluate the terms in the far right-hand side of (24) by dividing them into three groups. The first is the terms . We can bound them from above as
[TABLE]
According to Theorem 26.5 in [40], with probability at least , we have
[TABLE]
where is the Rademacher complexity of with respect to . Further, by the contraction lemma in [40],
[TABLE]
where we have used Proposition 2. Also, from (19), we derive
[TABLE]
where is the bound of the original loss function (Assumption 1). Then, it follows that
[TABLE]
The second group in (24) consists of the terms involving and . In their aggregated forms, by Lemma 2, it holds that
[TABLE]
where we have used Jensen’s inequality given the strongly convex . For the first two terms in (29), by Theorem 1 of [38], we have
[TABLE]
Take the expectation on both sides of (30) with respect to . Given , we derive
[TABLE]
where we used and . Thus, it follows that
[TABLE]
Then, for (30), we arrive at
[TABLE]
Next, we focus on the latter two terms in (29). Due to (23), we have , which yields
[TABLE]
By Lemma 7 in [14], we obtain . It follows
[TABLE]
where is the bound of noise . Substituting (31) and (32) into (29), we derive
[TABLE]
The third group in (24) is the term . We have
[TABLE]
Taking the expectation with respect to , we obtain
[TABLE]
By Lemma 5, we have
[TABLE]
Then, it follows that
[TABLE]
where we have used . Substituting (28), (33) and (34) into (24), we arrive at the bound in (27). ∎
Theorem 1 provides guidance for both users and servers to obtain a classification model with desired performance. In particular, the effects of three uncertainties on the bound of have been successfully decomposed. Note that these effects are not simply superimposed but coupled together. Specifically, the terms in (27) related to the primal variable perturbation decrease with iterations at the rate of . This also implies that the whole framework achieves convergence in expectation at this rate.
Compared with [16] and [38], where bounds of are provided, we derive the difference between the generalization error of the aggregated classifier and that of the ideal optimal classifier , which is moreover given in a closed form. The bound in (27) contains the effect of the unknown data distribution while the bound of covers only the role of existing data. Although [15] also considers the generalization error of found classifiers, no closed form of the bound is given, and the obtained bound may not decrease with iterations since the reference classifier therein is not but a time-varying one. In the more centralized setting of [14], is analyzed for the derived classifier , but there is no convergence issue since is perturbed and published only once.
Moreover, different from the works [14, 15, 16] and [38], our analysis considers the effects of the classifier class by Rademacher complexity. Such effects have been used in [40] in non-private centralized machine learning scenarios. Furthermore, in the privacy-aware (centralized or distributed) frameworks of [14, 15, 16] and the robust ADMM scheme for erroneous updates [38], there is only one type of noise perturbation, and the uncertainty in the training data is not considered.
V-D Comparisons and Discussions
Here, we compare the proposed framework with existing schemes from the perspective of privacy and performance, and discuss how each parameter contributes to the results.
First, we find that the bound in (27) is larger than those in [14, 15, 16] if we adopt the approach in this paper to conduct performance analysis on these works. This is expected since there are more perturbations in our setting. However, as we have discussed in Section IV-C, these existing frameworks do not meet the heterogeneous privacy requirements, and some of them cannot avoid accumulation of privacy losses, resulting in no protection at all. It should be emphasized that extra performance costs must be paid when the data contributors want to obtain stronger privacy guarantee. These existing frameworks may be better than ours in the sense of performance, but the premise is that users accept the privacy preservation provided by them. If users require heterogeneous privacy protection, our framework can be more suitable.
Further, compared with [14, 15, 16], [38] and [40], we provide a more systematic result on the performance analysis in Theorem 1, where most parameters related to useful measures of classifiers (also privacy preservation) are included. Servers and users can set these parameters as needed, and thus obtain classifiers which can appropriately balance the privacy and the performance. We will discuss the roles of these parameters after some further analysis on the theoretical result.
According to Lemma 5, the classifiers solved by different servers converge to in the sense of expectation. The performance of can be analyzed in a similar way as in Theorem 1. This is given in the following corollary.
Corollary 1.
For and , with probability at least , we have
[TABLE]
For the sake of comparison, the next theorem provides a performance analysis when the privacy-preserving approach in Phase 2 is removed, and a corresponding result on the bound of is given in the subsequent corollary.
Theorem 2.
For and , the aggregated classifier obtained by the original ADMM scheme (9) and (10) satisfies with probability at least
[TABLE]
Corollary 2.
For and , with probability at least , we have
[TABLE]
It is observed that the bound in (36) is not in expectation since there is no noise perturbation during the ADMM iterations. It is interesting to note that the convergence rate of the unperturbed ADMM algorithm is also . This implies that the modified ADMM algorithm preserves the convergence speed of the general distributed ADMM scheme.
However, there exists a tradeoff between performance and privacy protection. Comparing (27) and (36), we find that the extra terms in (27) are the results of perturbations in Phase 2. Also, the effect of the objective function perturbation is reflected in (35), that is, the term . When (the bound of ) increases, the generalization error of the trained classifier would increase as well, indicating worse performance. Similarly, if we use noises with larger initial variances and decaying rates to perturb the solved classifiers in each iteration, the bound in (27) will also increase.
Effect of data quality. We observe that the bound of in (37) also appears in (27), (35) and (36). This bound reflects the effect of users’ reported data, whose labels are randomized in Phase 1. It can be seen that besides the probability , the bound in (37) is affected by three factors: PPD , Rademacher complexity , and the number of data samples . Here, we discuss the roles of these factors.
For the effect of PPD, we find that when is small, the bound will decrease with an increase in . However, when is sufficiently large, it has limited influence on the bound. In particular, by taking , the bound reduces to that for the optimal solution of Problem 1, where goes to 1 in (37). Note that and still remain and affect the performance.
For the effect of , we observe that the generalization errors of trained classifiers may become larger when increases. The Rademacher complexity is directly related to the size of the classifier class . If there are only a small number of candidate classifiers in , the solutions have a high probability of obtaining smaller deviation between their generalization errors and the reference generalization error . Nevertheless, we should guarantee the richness of the class to make small since trained in terms of will have large generalization error. Though the deviation may be small, the trained classifiers are not good predictors due to the bad performance of . Thus, setting an appropriate classifier class is important for obtaining a classifier with qualified performance.
Finally, we consider the effect of the number of users. From the bound of in (37), we know that if becomes larger, the last term of the bound will decrease. In general, more data samples imply access to more information about the underlying distribution . Then, the trained classifier can predict the labels of newly sampled data from with higher accuracy. Moreover, it can be seen that the bound is the average of local errors generated in different servers. When new servers participate in the DML framework, these servers should make sure that they have collected a sufficient number of training data samples. Otherwise, the bound may not decrease even though the total number of data samples increases. This is because unbalanced local errors may lead to an increase in their average, implying a larger bound of .
VI Experimental Evaluation
In this section, we conduct experiments to validate the obtained theoretical results and study the classification performance of the proposed PDML framework. Specifically, we first use a real-world dataset to verify the convergence property of the PDML framework and study how key parameters would affect the performance. Also, we leverage another seven datasets to verify the classification accuracy of the classifiers trained by the framework.
VI-A Experiment Setup
VI-A1 Datasets
We use two kinds of publicly available datasets as described below to validate the convergence property and classification accuracy of the PDML.
(i) Adult dataset [41]. The dataset contains census data of 48,842 individuals, with 14 attributes (e.g., age, work-class, education, occupation and native-country) and a label indicating whether a person’s annual income is over 50,000: the label is 1 if so, and $-1$ otherwise.
(ii) Gunnar Rätsch’s benchmark datasets [42]. There are thirteen data subsets from the UCI repository in the benchmark datasets. To mitigate the effect of data quality, we select seven datasets with the largest data sizes to conduct experiments. The seven datasets are German, Image, Ringnorm, Banana, Splice, Twonorm and Waveform, where the numbers of instances are 1,000, 2,086, 7,400, 5,300, 2,991, 7,400 and 5,000, respectively. Each dataset is partitioned into training and test data, with a ratio of approximately .
VI-A2 Underlying classification approach
Logistic regression (LR) is utilized for training the prediction model, where the loss function and regularizer are $\ell_{LR}(y_{i,j},\mathbf{w}_{i}^{\mathrm{T}}\mathbf{x}_{i,j})=\log\bigl(1+e^{-y_{i,j}\mathbf{w}_{i}^{\mathrm{T}}\mathbf{x}_{i,j}}\bigr)$ and , respectively. Then, the local objective function is given by
[TABLE]
It is easy to check that when the classifier class is bounded (e.g., a bounded set ), satisfies Assumption 1. Due to the convexity property of , is strongly convex. Then, according to Lemma 2, Problems 2 and 3 have optimal solution sets, and thus, we can use LR to train the classifiers.
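A regularized LR objective of this kind and its gradient can be written in a few lines. In the sketch below the regularizer is assumed to be `(lam/2) * ||w||^2` and the loss is averaged over the local samples; the exact regularizer and weighting used in the paper's local objective are not visible in this text, so `lam` and the function names are hypothetical.

```python
import numpy as np

def lr_objective(w, X, y, lam):
    """Average logistic loss plus (lam/2)*||w||^2 (regularizer form assumed).

    X has shape (m, d); y holds labels in {-1, +1}.
    """
    margins = y * (X @ w)
    losses = np.logaddexp(0.0, -margins)   # stable log(1 + exp(-margin))
    return float(losses.mean() + 0.5 * lam * (w @ w))

def lr_gradient(w, X, y, lam):
    """Gradient of lr_objective with respect to w."""
    margins = y * (X @ w)
    # Derivative of each loss term with respect to the score w^T x.
    coeff = -y / (1.0 + np.exp(margins))
    return X.T @ coeff / X.shape[0] + lam * w
```

At `w = 0` every sample contributes `log 2` to the loss, which gives a quick sanity check. The quadratic term makes the objective `lam`-strongly convex regardless of the data.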
VI-A3 Network topology
We consider 10 servers that collaboratively train a prediction model. A connected random graph is used to describe the communication topology of the 10 servers. The used graph has communication links in total. Each server is responsible for collecting the data from a group of users, and thus there are 10 groups of users. In the experiments, we assume that each group has the same number of users, that is, . For example, we use instances sampled from the Adult dataset to train the classifier, and then each server collects data from users.
VI-B Experimental Results with Adult Dataset
Based on the Adult dataset, we first verify the convergence property of the PDML framework. Fig. 3(a) illustrates the maximum distances between the norms of any two classifiers found by different servers. We set the bound of to 1. Other settings are the same as those in the experiments with the synthetic dataset. For the sake of comparison, we also draw the variation curve (with circle markers) of the maximum distance when the privacy-preserving approach in Phase 2 is removed. We observe that both distances converge to 0, implying that the consensus constraint is eventually satisfied.
Fig. 3(b) shows the variation of empirical risks (the objective function in (1)) as iterations proceed. Here, the green dashed line depicts the final empirical risk achieved by general ADMM with original data, which we call the reference empirical risk. There are also two curves showing varying empirical risks with privacy preservation. Comparing the two curves, we find that the ADMM with combined noise-adding scheme preserves the convergence property of the general ADMM algorithm. Due to the noise perturbations in Phase 2, the convergence time becomes longer. In addition, it can be seen that regardless of whether the privacy-preserving approach in Phase 2 is used, neither ADMM scheme achieves the final empirical risk indicated by the green line, which is consistent with the analysis in Section V-D.
We then study the effects of the key parameters on the performance. In Fig. 4(a), we examine the impact of the noise bound when the decaying rate is fixed at . It is observed that affects the final empirical risks of the trained classifiers. The larger the noise bound, the greater the gap between the achieved empirical risks and the reference value, which is consistent with Corollary 1. In Fig. 4(b), we inspect the effect of Gaussian noise decaying rate when is fixed at . We find that the convergence time is affected by . A larger implies that the communicated classifiers are still perturbed by noises with larger variance even after iterating over multiple steps. Thus, more iterations are needed to obtain the same final empirical risk as with a smaller . Such a property can be derived from the bound in (27).
Fig. 4(c) illustrates the variation of final empirical risks when the PPD changes. The final empirical risks decrease with larger PPD (weaker privacy guarantee), which implies the tradeoff relation between the privacy protection and the performance. Further, the extra perturbations in Phase 2 lead to larger empirical risks for all the PPDs in the experiments. We also find that when is large (), the achieved empirical risks are close to the reference value, and do not significantly change. Again, the result is consistent with the analysis of the bound in (37).
VI-C Classification Accuracy Evaluation
We use the test data of the seven datasets to evaluate the prediction performance of the trained classifiers, which is shown in Table I. The classification accuracy is defined as the fraction of test samples whose labels predicted by the trained classifier match the true labels. For comparison, we present the classification accuracy achieved by general ADMM with the original data. For validation of classification accuracy under the PDML framework, we choose six different sets of parameter configurations to conduct the experiments. The specific configurations can be found in the second row of Table I. We find that larger and smaller will generate better accuracy. According to the theoretical results, the upper bounds for the differences and will decrease with larger and smaller , implying better performance of the trained classifiers. Thus, the bound in Theorem 1 also provides a guideline to choose appropriate parameters to obtain a prediction model with satisfactory classification accuracy.
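The accuracy metric defined above amounts to the following computation for a linear classifier. This is a minimal sketch assuming labels in {-1, +1} and prediction by the sign of the score; breaking ties at zero toward +1 is our own convention, and the function name is hypothetical.

```python
import numpy as np

def classification_accuracy(w, X_test, y_test):
    """Fraction of test labels in {-1, +1} matched by the sign of w^T x."""
    predictions = np.where(X_test @ w >= 0.0, 1.0, -1.0)
    return float(np.mean(predictions == y_test))
```

For a balanced binary task, a classifier that ignores the features scores about one half on this metric, which is the baseline against which the reported accuracies should be read.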
It is impressive to observe that even under the strongest privacy setting (, ), the proposed framework achieves classification accuracy comparable to the reference accuracy. We also notice that under the datasets Banana and Splice, PDML achieves inferior accuracy in all settings. For a binary classification problem, it is meaningless to obtain an accuracy of around . The reason for the poor accuracy may be that LR is not a suitable classification approach for these two datasets. Overall, the proposed PDML framework achieves competitive classification accuracy on the basis of providing strong privacy protection.
VII Conclusion
In this paper, we have provided a privacy-preserving ADMM-based distributed machine learning framework. By a local randomization approach, data contributors obtain self-controlled DP protection for the most sensitive labels and the privacy guarantee will not decrease as ADMM iterations proceed. Further, a combined noise-adding method has been designed for perturbing the ADMM algorithm, which simultaneously preserves privacy for users’ feature vectors and strengthens protection for the labels. Lastly, the performance of the proposed PDML framework has been analyzed in theory and validated by extensive experiments.
For future investigations, we will study the joint privacy-preserving effects of the local randomization approach and the combined noise-adding method. Moreover, it is interesting yet challenging to extend the PDML framework to problems other than empirical risk minimization. When users allocate distinct sensitive levels to different attributes, we are interested in designing a new privacy-aware scheme providing heterogeneous privacy protections for different attributes.
-A Proof of Proposition 1
Let be the reported data of a user with arbitrary data sample drawn from . Then we have . Suppose that the user’s data sample has label , which is denoted by . By (12) and (13), the probability that the user reports to the server is
[TABLE]
Similarly, if the user’s original label is , i.e., , we have
[TABLE]
Then, we further have the relations as follows:
[TABLE]
With a slight abuse of notation, we view label “” as “0” below. Note that under this case, the observation set in Definition 1 is the user’s reported data . Then, for any with feature vector and arbitrary label , we have
[TABLE]
where we use the relation . ∎
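The likelihood-ratio bound behind this proof can be checked numerically for a simple mechanism. The sketch below assumes plain binary randomized response, in which the true label in {-1, +1} is reported with probability `e^eps / (1 + e^eps)` and flipped otherwise. The two-step mechanism in (12) and (13) is not reproduced here, so this is only an illustration of the same type of guarantee; the function names are ours.

```python
import math

def report_probability(reported, true_label, eps):
    """Pr[reported | true_label] under binary randomized response (assumed)."""
    keep = math.exp(eps) / (1.0 + math.exp(eps))
    return keep if reported == true_label else 1.0 - keep

def worst_case_ratio(eps):
    """Largest likelihood ratio of any report between two possible true labels."""
    labels = (-1, 1)
    return max(
        report_probability(r, a, eps) / report_probability(r, b, eps)
        for r in labels for a in labels for b in labels
    )
```

For this mechanism the worst-case ratio equals `e^eps` exactly, so no reported label shifts an observer's odds between the two candidate true labels by more than that factor.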
-B Proof of Proposition 2
(i) According to (12), we have
[TABLE]
where we have used Proposition 1. Then, it follows that
[TABLE]
By (19), we obtain
[TABLE]
Substituting (39) into (38), we arrive at
[TABLE]
(ii) The derivative of with respect to is given by
[TABLE]
Then, we have
[TABLE]
This bound gives the Lipschitz constant of . ∎
-C Proof of Lemma 1
According to Assumption 2, is twice differentiable. By Taylor’s Theorem, we have
[TABLE]
where and denote the gradient and the Hessian, respectively. Due to (2), we derive
[TABLE]
which implies . For , let , and then we have
[TABLE]
where we have used Assumption 1. This relation also implies that is convex. Then, we obtain , . It follows that
[TABLE]
Rearrange the above equation so that
[TABLE]
which indicates is -strongly convex. Since , it follows that is -strongly convex. ∎
-D Proof of Lemma 3
The strongly convex property of can be proved directly from Lemma 1. For the Lipschitz continuous gradient, we consider the compact form of classifiers, as . We have . The second derivative of with respect to is given by
[TABLE]
For , we have
[TABLE]
Due to , we derive
[TABLE]
This also gives the Lipschitz continuous gradient of . ∎
References
[1] M. Fredrikson, E. Lantz, S. Jha, S. Lin, D. Page, and T. Ristenpart, “Privacy in pharmacogenetics: An end-to-end case study of personalized warfarin dosing,” in Proc. USENIX Secur., 2014, pp. 17–32.
[2] Q. Le and T. Mikolov, “Distributed representations of sentences and documents,” in Proc. IMLS ICML, 2014, pp. 1188–1196.
[3] H. Wang, N. Wang, and D.-Y. Yeung, “Collaborative deep learning for recommender systems,” in Proc. ACM SIGKDD, 2015, pp. 1235–1244.
[4] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein et al., “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1–122, 2011.
[5] Y. Qi, O. Tastan, J. G. Carbonell, J. Klein-Seetharaman, and J. Weston, “Semi-supervised multi-task learning for predicting interactions between HIV-1 and human proteins,” Bioinformatics, vol. 26, no. 18, pp. 645–652, 2010.
[6] C. Dwork, “Differential privacy: A survey of results,” in Proc. Int. Conf. Theor. Appl. Mod. Comput., 2008, pp. 1–19.
[7] X. Wang, J. He, P. Cheng, and J. Chen, “Privacy preserving collaborative computing: Heterogeneous privacy guarantee and efficient incentive mechanism,” IEEE Trans. Signal Proces., vol. 67, no. 1, pp. 221–233, 2019.
[8] E. Nozari, P. Tallapragada, and J. Cortés, “Differentially private average consensus: Obstructions, trade-offs, and optimal algorithm design,” Automatica, vol. 81, pp. 221–231, 2017.
