The Value of Collaboration in Convex Machine Learning with Differential Privacy
Nan Wu, Farhad Farokhi, David Smith, Mohamed Ali Kaafar

TL;DR
This paper analyzes how collaboration among multiple privacy-aware data owners affects the utility of convex machine learning models trained with differential privacy, providing predictive insights into privacy-utility trade-offs.
Contribution
It introduces a model to predict the impact of privacy parameters and dataset size on the quality of differentially-private machine learning models, validated on real financial and fraud detection datasets.
Findings
The model predicts the privacy-utility trade-off from dataset size and privacy budget.
Validation on real financial and fraud detection datasets confirms the accuracy of the predicted performance.
Collaboration benefits increase with larger datasets and higher privacy budgets.
Abstract
In this paper, we apply machine learning to distributed private data owned by multiple data owners, entities with access to non-overlapping training datasets. We use noisy, differentially-private gradients to minimize the fitness cost of the machine learning model using stochastic gradient descent. We quantify the quality of the trained model, using the fitness cost, as a function of privacy budget and size of the distributed datasets to capture the trade-off between privacy and utility in machine learning. This way, we can predict the outcome of collaboration among privacy-aware data owners prior to executing potentially computationally-expensive machine learning algorithms. Particularly, we show that the difference between the fitness of the trained machine learning model using differentially-private gradient queries and the fitness of the trained machine learning model in the absence of any…
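The setup the abstract describes (several data owners answering gradient queries with differentially-private noise, and a learner descending on the combined answer) can be illustrated with a small sketch. This is not the paper's algorithm. It assumes a linear model with squared loss, per-record L1 gradient clipping to bound sensitivity, the Laplace mechanism, full-batch gradients per owner, and an even split of the privacy budget across steps under basic composition. All names (`private_gradient`, `train`, `fitness`, `clip_l1`) and the synthetic data are hypothetical.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def clip_l1(vec, bound):
    # Rescale `vec` so its L1 norm is at most `bound`.
    norm = sum(abs(v) for v in vec)
    if norm <= bound:
        return list(vec)
    return [v * bound / norm for v in vec]


def private_gradient(theta, data, eps_step, bound, rng):
    """One owner's answer to a gradient query (squared loss, linear model)."""
    n, dim = len(data), len(theta)
    total = [0.0] * dim
    for x, y in data:
        err = sum(t * xi for t, xi in zip(theta, x)) - y
        g = clip_l1([2.0 * err * xi for xi in x], bound)
        total = [a + b for a, b in zip(total, g)]
    avg = [t / n for t in total]
    if eps_step is None:  # no privacy constraint: return the exact average
        return avg
    # Swapping one record moves the average by at most 2*bound/n in L1 norm,
    # so Laplace noise with this scale gives eps_step-DP for this query.
    scale = 2.0 * bound / (n * eps_step)
    return [a + laplace(scale, rng) for a in avg]


def train(owners, epsilon, steps=200, lr=0.1, bound=1.0, seed=0):
    """Gradient descent driven by the owners' (possibly noisy) gradients."""
    rng = random.Random(seed)
    dim = len(owners[0][0][0])
    theta = [0.0] * dim
    total_n = sum(len(d) for d in owners)
    # Assumption: basic composition, budget split evenly over the steps.
    eps_step = None if epsilon is None else epsilon / steps
    for _ in range(steps):
        grad = [0.0] * dim
        for data in owners:
            g = private_gradient(theta, data, eps_step, bound, rng)
            w = len(data) / total_n  # weight each owner by dataset size
            grad = [a + w * b for a, b in zip(grad, g)]
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


def fitness(theta, owners):
    # Mean squared error over the union of all owners' records.
    records = [r for data in owners for r in data]
    return sum((sum(t * xi for t, xi in zip(theta, x)) - y) ** 2
               for x, y in records) / len(records)


if __name__ == "__main__":
    rng = random.Random(1)

    def make(n):
        # Synthetic one-feature data: y = 2x plus small Gaussian noise.
        return [([x], 2.0 * x + rng.gauss(0, 0.1))
                for x in (rng.uniform(-1, 1) for _ in range(n))]

    owners = [make(500), make(500)]
    base = fitness(train(owners, None), owners)
    for eps in (1.0, 10.0, 100.0):
        gap = fitness(train(owners, eps), owners) - base
        print(f"epsilon={eps}: fitness gap = {gap:.4f}")
```

Running the script prints the fitness gap between the private and non-private models for a few privacy budgets. Varying `epsilon` and the sizes passed to `make` gives a rough empirical feel for the trade-off the paper quantifies analytically, under the assumptions stated above.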
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Cryptography and Data Security
