Privacy-Preserving Debiasing using Data Augmentation and Machine Unlearning

Zhixin Pan; Emma Andrews; Laura Chang; Prabhat Mishra

arXiv:2404.13194 · cs.LG · April 23, 2024 · 1 citation


Open Access

TL;DR

This paper combines diffusion-based data augmentation with multi-shard machine unlearning to reduce bias in the training data while protecting the trained model against privacy attacks, and evaluates the approach on several datasets.

Contribution

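The paper presents a method that addresses data bias and privacy jointly: diffusion-based augmentation improves the fairness of the trained model, and multi-shard unlearning removes identifying information about the original data, providing a provable defense against known privacy attacks such as membership inference.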
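The debiasing half of the method generates synthetic samples so that under-represented groups are no longer outnumbered. This page does not say how many samples the authors generate per group, so the sketch below assumes one common target: top up every group to the size of the largest one. `augmentation_plan`, `rebalance`, and `fake_generate` are hypothetical names, and `fake_generate` is only a placeholder for a conditional diffusion model.

```python
from collections import Counter


def augmentation_plan(groups):
    """Synthetic samples each group needs to match the largest group (assumed target)."""
    counts = Counter(groups)
    target = max(counts.values())
    return {g: target - c for g, c in counts.items() if c < target}


def rebalance(data, group_of, generate):
    """Return (augmented_data, synthetic_samples).

    `generate(group, n)` is a hypothetical stand-in for a conditional
    diffusion model that returns n new samples for `group`.
    """
    plan = augmentation_plan(group_of(s) for s in data)
    synthetic = []
    for group, n in plan.items():
        synthetic.extend(generate(group, n))
    return data + synthetic, synthetic


def fake_generate(group, n):
    # Placeholder for diffusion sampling: emits labelled dummy samples.
    return [("synthetic", group)] * n


# Usage: group "B" is under-represented (1 sample vs. 3 for "A").
data = [(0, "A"), (1, "A"), (2, "A"), (3, "B")]
augmented, synthetic = rebalance(data, lambda s: s[1], fake_generate)
print(augmentation_plan(s[1] for s in data))  # → {'B': 2}
print(Counter(s[1] for s in augmented))       # → Counter({'A': 3, 'B': 3})
```

Keeping the synthetic samples in a separate list matters for the privacy half of the method: it records which training samples are real and therefore candidates for later removal.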

Findings

01. Significant bias reduction achieved across datasets.
02. Enhanced robustness against privacy attacks.
03. Effective balance between fairness and privacy.
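The abstract names membership inference as the main privacy attack that augmentation exposes models to. This page does not list the specific attacks used in the evaluation, so the following sketch uses the simplest variant purely as an illustration: a loss-threshold attack that guesses a sample was in the training set when the model's loss on it is low. The function names and the loss values are hypothetical; attack advantage (true-positive rate minus false-positive rate) is one common way to score such an attack.

```python
def loss_threshold_attack(losses, threshold):
    """Guess 'member' (True) whenever the model's loss on a sample is below `threshold`."""
    return [loss < threshold for loss in losses]


def attack_advantage(member_losses, nonmember_losses, threshold):
    """True-positive rate minus false-positive rate of the threshold attack.

    0.0 means the attack cannot tell training members from non-members.
    """
    tpr = sum(loss_threshold_attack(member_losses, threshold)) / len(member_losses)
    fpr = sum(loss_threshold_attack(nonmember_losses, threshold)) / len(nonmember_losses)
    return tpr - fpr


# Hypothetical per-sample losses: an overfit model has lower loss on its training members.
members = [0.05, 0.1, 0.2, 0.9]
nonmembers = [0.3, 0.8, 1.2, 2.0]
print(attack_advantage(members, nonmembers, threshold=0.25))  # → 0.75
```

A defense of the kind described in the abstract aims to push this advantage toward 0 for the original (non-synthetic) training data.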

Abstract

Data augmentation is widely used to mitigate data bias in the training dataset. However, data augmentation exposes machine learning models to privacy attacks, such as membership inference attacks. In this paper, we propose an effective combination of data augmentation and machine unlearning, which can reduce data bias while providing a provable defense against known attacks. Specifically, we maintain the fairness of the trained model with diffusion-based data augmentation, and then utilize multi-shard unlearning to remove identifying information of original data from the ML model for protection against privacy attacks. Experimental evaluation across diverse datasets demonstrates that our approach can achieve significant improvements in bias reduction as well as robustness against state-of-the-art privacy attacks.
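The privacy step relies on multi-shard unlearning. This page does not describe the authors' sharding, model, or aggregation choices, so the following is a minimal sketch of the general SISA-style idea (sharded, isolated training with aggregated predictions) under stated assumptions: a toy nearest-centroid learner stands in for a real model, samples are assigned to shards round-robin, slicing is omitted, and predictions are combined by majority vote. `CentroidModel`, `ShardedEnsemble`, and `unlearn` are hypothetical names. The property it shows is that forgetting a sample retrains only the shard that held it, not the whole model.

```python
from collections import Counter


class CentroidModel:
    """Toy stand-in for a per-shard model: nearest class centroid on 1-D features."""

    def fit(self, data):
        sums, counts = {}, Counter()
        for x, y in data:
            sums[y] = sums.get(y, 0.0) + x
            counts[y] += 1
        self.centroids = {y: sums[y] / counts[y] for y in counts}
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda y: abs(x - self.centroids[y]))


class ShardedEnsemble:
    """Multi-shard training: disjoint shards, one isolated model per shard."""

    def __init__(self, data, num_shards):
        # Round-robin assignment keeps the sketch deterministic;
        # SISA-style training usually assigns samples to shards at random.
        self.shards = [[] for _ in range(num_shards)]
        for i, sample in enumerate(data):
            self.shards[i % num_shards].append(sample)
        self.models = [CentroidModel().fit(shard) for shard in self.shards]

    def predict(self, x):
        # Aggregate by majority vote over shard models that saw any data.
        votes = Counter(m.predict(x) for m in self.models if m.centroids)
        return votes.most_common(1)[0][0]

    def unlearn(self, sample):
        """Forget `sample` by retraining only the shard(s) that contained it."""
        retrained = []
        for i, shard in enumerate(self.shards):
            if sample in shard:
                self.shards[i] = [s for s in shard if s != sample]
                self.models[i] = CentroidModel().fit(self.shards[i])
                retrained.append(i)
        return retrained


# Usage: two well-separated classes split across 4 shards.
data = [(i / 10, 0) for i in range(10)] + [(5 + i / 10, 1) for i in range(10)]
ens = ShardedEnsemble(data, num_shards=4)
print(ens.unlearn(data[5]))                 # → [1]  (only shard 1 is retrained)
print(ens.predict(0.2), ens.predict(5.7))  # → 0 1
```

The cost of unlearning scales with the size of one shard rather than the full dataset, which is what makes removing the original samples after augmentation practical; the trade-off is that each shard model sees less data.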


Taxonomy

Topics: Privacy-Preserving Technologies in Data