Differential Privacy Under Class Imbalance: Methods and Empirical Insights
Lucas Rosenblatt, Yuliia Lut, Eitan Turok, Marco Avella-Medina, Rachel Cummings

TL;DR
This paper explores the challenges of applying differential privacy to imbalanced classification problems, proposing and empirically evaluating various private data augmentation and learning techniques to address class imbalance.
Contribution
It formalizes the problem of differential privacy under class imbalance and introduces algorithmic solutions including DP variants of oversampling, synthetic data generation, and class-weighted learning.
Findings
Private synthetic data methods perform well as pre-processing.
Class-weighted ERMs are effective in high-dimensional settings.
Some existing imbalanced learning techniques are incompatible with differential privacy.
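As a rough illustration of what class-weighted learning involves under privacy, the sketch below derives inverse-frequency class weights from Laplace-noised class counts and plugs them into a weighted log loss. It is not the paper's algorithm. The function names (`noisy_class_weights`, `weighted_log_loss`), the add/remove-one-record neighboring relation (which gives the count vector L1 sensitivity 1), and the clamp at 1.0 are all assumptions made for this example.

```python
import math
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_class_weights(labels, classes, epsilon, rng=None):
    """Inverse-frequency weights computed from Laplace-noised class counts.

    Assumption: neighboring datasets differ by adding or removing one record,
    so the vector of class counts has L1 sensitivity 1. `classes` must be
    fixed in advance rather than read off the data.
    """
    rng = rng or random.Random()
    counts = {c: 0 for c in classes}
    for y in labels:
        counts[y] += 1
    # Clamping is post-processing, so it does not affect the privacy guarantee.
    noisy = {
        c: max(counts[c] + laplace_noise(1.0 / epsilon, rng), 1.0)
        for c in classes
    }
    total = sum(noisy.values())
    return {c: total / (len(classes) * noisy[c]) for c in classes}


def weighted_log_loss(probs, labels, weights):
    """Mean binary log loss in which each example is scaled by its class weight.

    `probs` holds the predicted probability of class 1 for each example.
    """
    floor = 1e-12  # avoids log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p_true = p if y == 1 else 1.0 - p
        total += weights[y] * -math.log(max(p_true, floor))
    return total / len(labels)


# Hypothetical usage: 95 majority labels, 5 minority labels.
labels = [0] * 95 + [1] * 5
weights = noisy_class_weights(labels, classes=[0, 1], epsilon=1.0,
                              rng=random.Random(0))
```

This only privatizes the weights. The model itself would still need a differentially private training procedure, and because the weights scale each example's contribution to the loss, they also affect the sensitivity that procedure must account for.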
Abstract
Imbalanced learning occurs in classification settings where the distribution of class-labels is highly skewed in the training data, such as when predicting rare diseases or in fraud detection. This class imbalance presents a significant algorithmic challenge, which can be further exacerbated when privacy-preserving techniques such as differential privacy are applied to protect sensitive training data. Our work formalizes these challenges and provides a number of algorithmic solutions. We consider DP variants of pre-processing methods that privately augment the original dataset to reduce the class imbalance; these include oversampling, SMOTE, and private synthetic data generation. We also consider DP variants of in-processing techniques, which adjust the learning algorithm to account for the imbalance; these include model bagging, class-weighted empirical risk minimization and…
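For readers unfamiliar with SMOTE, the following is a minimal sketch of the standard, non-private interpolation step: pick a minority point, pick one of its k nearest minority neighbors, and emit a random point on the segment between them. The function name `smote_sample` and its parameters are hypothetical, and this is not the DP variant studied in the paper. Each synthetic point is a direct function of two real minority records, so the non-private version carries no privacy guarantee of its own.

```python
import random


def smote_sample(minority, k, n_new, rng=None):
    """Generate `n_new` synthetic minority points by SMOTE-style interpolation.

    `minority` is a list of equal-length numeric tuples (at least two points).
    Non-private: every output depends directly on real records.
    """
    rng = rng or random.Random()
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to x.
        others = [z for z in minority if z is not x]
        others.sort(key=lambda z: sum((a - b) ** 2 for a, b in zip(x, z)))
        neighbor = rng.choice(others[:k])
        # Interpolate at a uniformly random position along the segment.
        t = rng.random()
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(x, neighbor)))
    return synthetic


# Hypothetical usage with three 2-D minority points.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote_sample(minority_points, k=2, n_new=5, rng=random.Random(0))
```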
Taxonomy
Topics: Dispute Resolution and Class Actions · Discrimination and Equality Law · Law, Rights, and Freedoms
Methods: Synthetic Minority Over-sampling Technique (SMOTE)
