Steps Adaptive Decay DPSGD: Enhancing Performance on Imbalanced Datasets with Differential Privacy with HAM10000
Xiaobo Huang, Fang Xie

TL;DR
This paper introduces SAD-DPSGD, a method that adaptively adjusts the noise and clipping thresholds during differentially private training to improve model performance on imbalanced medical datasets such as HAM10000.
Contribution
We propose SAD-DPSGD, which employs a linear decay mechanism for noise and clipping thresholds to better handle imbalanced datasets in differentially private training.
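A linear decay schedule of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `linear_decay` and `schedule` and the start and end values are hypothetical, because this summary does not give the paper's parameters. The sketch also assumes a fixed noise multiplier, so the Gaussian noise standard deviation (noise multiplier times clipping threshold, as in standard DP-SGD) decays together with the threshold. How SAD-DPSGD actually distributes the privacy budget across steps is not specified here.

```python
def linear_decay(start: float, end: float, step: int, total_steps: int) -> float:
    """Interpolate linearly from `start` at step 0 to `end` at `total_steps`."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp the progress fraction so steps past the schedule keep the final value.
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


def schedule(step: int, total_steps: int, clip_start: float = 4.0,
             clip_end: float = 1.0,
             noise_multiplier: float = 1.0) -> tuple[float, float]:
    """Hypothetical per-step (clipping threshold, noise std) pair."""
    clip = linear_decay(clip_start, clip_end, step, total_steps)
    # In DP-SGD the Gaussian noise std is noise_multiplier * clip, so with a
    # fixed multiplier (an assumption here) the noise decays with the threshold.
    noise_std = noise_multiplier * clip
    return clip, noise_std


# Prints: 0 (4.0, 4.0), then 50 (2.5, 2.5), then 100 (1.0, 1.0)
for t in (0, 50, 100):
    print(t, schedule(t, 100))
```

The larger early threshold lets large gradients pass with less scaling at the start of training, and the threshold then shrinks linearly toward its final value.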
Findings
SAD-DPSGD outperforms Auto-DPSGD on HAM10000.
Accuracy improves by 2.15% over Auto-DPSGD at a privacy budget of ε = 3.0.
The method mitigates the loss of minority-class gradient information caused by clipping on imbalanced data.
Abstract
When machine learning is applied to medical image classification, data leakage is a critical concern. Previous methods, such as adding noise to gradients to achieve differential privacy, work well on large datasets like MNIST and CIFAR-100 but fail on small, imbalanced medical datasets like HAM10000. The imbalanced class distribution causes gradients from minority classes to be clipped and lose crucial information while majority classes dominate training, so the model falls into suboptimal solutions early. To address this, we propose SAD-DPSGD, which uses a linear decay mechanism for the noise and clipping thresholds. By allocating more privacy budget and using higher clipping thresholds in the early training phases, the model avoids these suboptimal solutions and achieves better performance. Experiments show that SAD-DPSGD outperforms Auto-DPSGD on HAM10000, improving accuracy by 2.15% under…
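The clipping effect described in the abstract can be illustrated with a toy version of the standard DP-SGD aggregation step: clip each per-example gradient to an L2 norm of at most C, sum the clipped gradients, add Gaussian noise with standard deviation σ·C, and average. This sketch assumes, purely for illustration, that minority-class examples produce larger-norm gradients. The gradients and the function names `clip_gradient` and `dp_sgd_aggregate` are hypothetical. The usage lines set the noise multiplier to zero so the effect of clipping alone is visible.

```python
import math
import random


def clip_gradient(grad: list[float], clip: float) -> list[float]:
    """Scale a per-example gradient so its L2 norm is at most `clip`."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_aggregate(per_example_grads: list[list[float]], clip: float,
                     noise_multiplier: float,
                     rng: random.Random) -> list[float]:
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    total = [0.0] * len(per_example_grads[0])
    for grad in per_example_grads:
        for i, g in enumerate(clip_gradient(grad, clip)):
            total[i] += g
    # Noise std is noise_multiplier * clip, matching the per-example sensitivity.
    noisy = [s + rng.gauss(0.0, noise_multiplier * clip) for s in total]
    return [v / len(per_example_grads) for v in noisy]


rng = random.Random(0)
majority = [[0.0, 0.5]] * 3  # hypothetical small-norm gradients
minority = [[0.0, 8.0]]      # hypothetical large-norm gradient
batch = majority + minority
print(dp_sgd_aggregate(batch, clip=2.0, noise_multiplier=0.0, rng=rng))  # [0.0, 0.875]
print(dp_sgd_aggregate(batch, clip=8.0, noise_multiplier=0.0, rng=rng))  # [0.0, 2.375]
```

With C = 2.0 the single large gradient contributes 2.0 of the 3.5 summed units, versus 8.0 of 9.5 with C = 8.0. A higher clipping threshold early in training is meant to reduce this kind of information loss.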
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Imbalanced Data Classification Techniques · Privacy-Preserving Technologies in Data
