Using Synthetic Data to Mitigate Unfairness and Preserve Privacy in Collaborative Machine Learning
Chia-Yuan Wu, Frank E. Curtis, Daniel P. Robinson

TL;DR
This paper introduces a two-stage synthetic data approach for collaborative machine learning that promotes fair predictions, prevents client-data leakage, and reduces communication costs in certain scenarios by avoiding iterative information exchange between the clients and the server.
Contribution
The proposed method generates each client's synthetic dataset in two stages, using bilevel optimization and differential privacy, so that clients do not need to send fairness-related information (e.g., local dataset sizes or data-related fairness metrics) to the server.
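As a rough illustration of what a bilevel formulation for a fair synthetic dataset could look like, the generic template below may help. It is a sketch under stated assumptions, not the authors' formulation: the symbols D (a client's local dataset), D^syn (its synthetic dataset), \ell (a training loss), \phi (a fairness-violation measure), and \lambda (a trade-off weight) are all hypothetical names introduced here.

```latex
% Generic bilevel template (all symbols are illustrative assumptions):
% the upper level chooses the synthetic data, the lower level trains on it.
\begin{aligned}
\min_{D^{\mathrm{syn}}} \quad
  & \ell\bigl(\theta^{*}(D^{\mathrm{syn}});\, D\bigr)
    + \lambda\, \phi\bigl(\theta^{*}(D^{\mathrm{syn}});\, D\bigr) \\
\text{s.t.} \quad
  & \theta^{*}(D^{\mathrm{syn}}) \in \arg\min_{\theta}\; \ell\bigl(\theta;\, D^{\mathrm{syn}}\bigr)
\end{aligned}
```

In this template the lower-level problem trains a model on the synthetic data, while the upper-level objective scores that model for accuracy and fairness on the real local data.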
Findings
Reduces communication to a single round
Maintains data privacy through differential privacy
Promotes fair predictions in collaborative models
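To make the differential privacy step concrete, here is a minimal Python sketch of one standard way to release a private synthetic dataset: perturb a histogram with the Laplace mechanism, then resample from it. This is a stand-in chosen for illustration; the page does not say which differentially private method the authors use. The function names, the `categories` list (assumed to be public and data-independent), and the add/remove-one-record neighboring relation are all assumptions.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_synthetic_sample(records, categories, epsilon, n_synthetic, seed=0):
    """Illustrative sketch (not the paper's method): epsilon-DP histogram
    release via the Laplace mechanism, followed by resampling.

    Assumes `categories` is a fixed, public list and that neighboring
    datasets differ by adding or removing one record, so the histogram
    has L1 sensitivity 1 and the noise scale is 1 / epsilon.
    """
    rng = random.Random(seed)
    counts = Counter(records)
    # Add noise to every cell (including empty ones), then clip at zero.
    noisy = [
        max(counts.get(c, 0) + laplace_noise(1.0 / epsilon, rng), 0.0)
        for c in categories
    ]
    # Fall back to uniform weights if every noisy count was clipped to zero.
    weights = noisy if sum(noisy) > 0.0 else [1.0] * len(categories)
    # Sampling is post-processing of the noisy histogram, so it does not
    # consume additional privacy budget.
    return rng.choices(categories, weights=weights, k=n_synthetic)


if __name__ == "__main__":
    # Hypothetical records of the form (group, label).
    data = [("a", 1)] * 30 + [("b", 0)] * 10
    cells = [("a", 0), ("a", 1), ("b", 0), ("b", 1)]
    print(dp_synthetic_sample(data, cells, epsilon=1.0, n_synthetic=5))
```

Because only the resampled records leave the client, a single upload of the synthetic dataset is enough, which is the kind of one-shot communication the findings describe.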
Abstract
In distributed computing environments, collaborative machine learning enables multiple clients to train a global model collaboratively. To preserve privacy in such settings, a common technique is to utilize frequent updates and transmissions of model parameters. However, this results in high communication costs between the clients and the server. To tackle unfairness concerns in distributed environments, client-specific information (e.g., local dataset size or data-related fairness metrics) must be sent to the server to compute algorithmic quantities (e.g., aggregation weights), which leads to a potential leakage of client information. To address these challenges, we propose a two-stage strategy that promotes fair predictions, prevents client-data leakage, and reduces communication costs in certain scenarios without the need to pass information between clients and server iteratively. In…
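The abstract's example of client information reaching the server (local dataset sizes used to compute aggregation weights) can be sketched as follows. This is a minimal illustration of size-proportional weighting in the style of federated averaging, which is an assumption about the kind of baseline meant; the function names and the list-of-floats parameter representation are hypothetical.

```python
def aggregation_weights(client_sizes):
    # Each client's weight is its share of the total number of samples.
    # Computing this requires every client to reveal its dataset size.
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    return [n / total for n in client_sizes]


def aggregate(client_params, client_sizes):
    # Weighted average of the clients' parameter vectors, coordinate by coordinate.
    weights = aggregation_weights(client_sizes)
    dim = len(client_params[0])
    return [
        sum(w * params[j] for w, params in zip(weights, client_params))
        for j in range(dim)
    ]


if __name__ == "__main__":
    # Two hypothetical clients with 10 and 30 samples.
    print(aggregation_weights([10, 30]))
    print(aggregate([[0.0, 4.0], [4.0, 0.0]], [10, 30]))
```

In an iterative scheme this aggregation is repeated every round, so the server sees both the sizes and a stream of parameter updates. That repeated exchange is what the proposed strategy is designed to avoid.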
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Privacy, Security, and Data Protection
