Using Synthetic Data to Mitigate Unfairness and Preserve Privacy in Collaborative Machine Learning

Chia-Yuan Wu; Frank E. Curtis; Daniel P. Robinson

arXiv:2409.09532 · cs.LG · January 24, 2025

TL;DR

This paper introduces a two-stage synthetic data approach for collaborative machine learning that enhances fairness, preserves client privacy, and reduces communication costs by avoiding iterative data exchanges.

Contribution

The proposed method generates synthetic datasets through bilevel optimization and differential privacy, eliminating the need for fairness-specific data transmission in distributed learning.

Findings

01 Reduces communication to a single round

02 Maintains data privacy through differential privacy

03 Promotes fair predictions in collaborative models

Abstract

In distributed computing environments, collaborative machine learning enables multiple clients to train a global model collaboratively. To preserve privacy in such settings, a common technique is to utilize frequent updates and transmissions of model parameters. However, this results in high communication costs between the clients and the server. To tackle unfairness concerns in distributed environments, client-specific information (e.g., local dataset size or data-related fairness metrics) must be sent to the server to compute algorithmic quantities (e.g., aggregation weights), which leads to a potential leakage of client information. To address these challenges, we propose a two-stage strategy that promotes fair predictions, prevents client-data leakage, and reduces communication costs in certain scenarios without the need to pass information between clients and server iteratively. In…
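To make the leakage concern above concrete, here is a minimal sketch of the kind of server-side computation the abstract alludes to: aggregation weights derived from each client's local dataset size. This is not the paper's method. It illustrates a common size-proportional weighting scheme, and the function names, client sizes, and parameter vectors are all hypothetical. The point is only that the server cannot compute such weights unless every client reveals its dataset size.

```python
import numpy as np


def aggregation_weights(client_sizes):
    """Return weights proportional to each client's local dataset size.

    Assumption for illustration: the server receives the raw sizes,
    which is exactly the client-specific information at risk of leaking.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    return sizes / sizes.sum()


def aggregate(client_params, weights):
    """Weighted average of client parameter vectors (one row per client)."""
    params = np.asarray(client_params, dtype=float)
    return np.average(params, axis=0, weights=weights)


# Hypothetical example: three clients with different amounts of data.
client_sizes = [100, 300, 600]
client_params = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

weights = aggregation_weights(client_sizes)
global_params = aggregate(client_params, weights)

print("weights:", weights)
print("global parameters:", global_params)
```

In an iterative scheme, an exchange like this would be repeated every round, which is the communication cost and exposure that the abstract says the proposed two-stage strategy aims to avoid.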

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Privacy, Security, and Data Protection