Mitigating Disparate Impact of Differentially Private Learning through Bounded Adaptive Clipping
Linzh Zhao (1), Aki Rehn (1), Mikko A. Heikkilä (1), Razane Tajeddine (2), Antti Honkela (1) ((1) Department of Computer Science, University of Helsinki, Finland, (2) Department of Electrical and Computer Engineering, American University of Beirut, Lebanon)

TL;DR
This paper introduces bounded adaptive clipping to improve fairness in differentially private learning by preventing excessive gradient suppression, leading to better accuracy for minority groups in skewed datasets.
Contribution
It proposes bounded adaptive clipping, a method that enforces a tunable lower bound on the adaptive clipping bound, improving fairness and accuracy in DP models.
Findings
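Improves worst-class accuracy by over 10 percentage points on average on skewed MNIST and Fashion MNIST, compared to unbounded adaptive clipping.
Reduces the disparity that adaptive clipping causes in DP learning.
Improves worst-class accuracy by over 5 percentage points compared to constant clipping.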
Abstract
Differential privacy (DP) has become an essential framework for privacy-preserving machine learning. Existing DP learning methods, however, often have disparate impacts on model predictions, e.g., for minority groups. Gradient clipping, which is often used in DP learning, can suppress larger gradients from challenging samples. We show that this problem is amplified by adaptive clipping, which will often shrink the clipping bound to tiny values to match a well-fitting majority, while significantly reducing the accuracy for others. We propose bounded adaptive clipping, which introduces a tunable lower bound to prevent excessive gradient suppression. Our method improves the accuracy of the worst-performing class on average over 10 percentage points on skewed MNIST and Fashion MNIST compared to the unbounded adaptive clipping, and over 5 percentage points over constant clipping.
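The collapse described in the abstract, and the effect of a lower bound, can be sketched in a few lines. This is an illustrative toy, not the authors' implementation. It assumes a quantile-tracking update of the geometric form used in earlier adaptive clipping work, and the names `update_clip_bound`, `track`, `target_quantile`, `lr`, `sigma_count`, and `c_lb` are hypothetical. Gradient norms are held fixed across steps, and the noise added to the gradients themselves is omitted.

```python
import math
import random


def clip(grad, c):
    """Rescale one per-sample gradient so its L2 norm is at most c."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, c / max(norm, 1e-12))
    return [g * scale for g in grad]


def update_clip_bound(c, norms, rng, target_quantile=0.5, lr=0.2,
                      sigma_count=1.0, c_lb=0.0):
    """One adaptive step on the clipping bound, followed by the lower bound.

    Assumed update rule (not taken from the paper): move c geometrically so
    that a noisy estimate of the fraction of samples with norm <= c
    approaches target_quantile. With c_lb = 0.0 this is unbounded.
    """
    count = sum(1 for n in norms if n <= c)
    noisy_frac = (count + rng.gauss(0.0, sigma_count)) / len(norms)
    c_new = c * math.exp(-lr * (noisy_frac - target_quantile))
    # The bounded variant only adds this max() with the tunable lower bound.
    return max(c_new, c_lb)


def track(norms, steps, c0=1.0, c_lb=0.0, seed=0):
    """Run the bound update for a number of steps on fixed gradient norms."""
    rng = random.Random(seed)
    c = c0
    for _ in range(steps):
        c = update_clip_bound(c, norms, rng, c_lb=c_lb)
    return c


# Toy batch: a well-fitted majority with tiny gradients and a
# minority whose gradients are still large.
norms = [0.001] * 90 + [5.0] * 10

c_unbounded = track(norms, steps=200)          # shrinks toward the majority's norms
c_bounded = track(norms, steps=200, c_lb=0.1)  # held at or above c_lb

print(f"unbounded: {c_unbounded:.4f}, bounded: {c_bounded:.4f}")
```

In this toy, the unbounded bound settles near the majority's gradient norm, so a minority gradient of norm 5.0 would be scaled down by several orders of magnitude before averaging, while the bounded variant keeps the scaling factor at `c_lb / 5.0` or above.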
Peer Reviews
Decision·Submitted to ICLR 2026
1. This paper focuses on an important part of DP training. 2. The paper clearly identifies and illustrates a failure mode for unbounded adaptive clipping, where the bound collapses and ignores minorities.
1. The novel part of this paper is the max() function. This is a minor heuristic, not a new framework. 2. The baseline is not well chosen. Why pick auto clipping? My understanding is that auto clipping helps with hyperparameter tuning since it does not require a clipping bound. Why compare the proposed method with it? I think De et al. (https://arxiv.org/pdf/2204.13650) may be a good choice. They achieve good performance on many datasets. If your method plus theirs can achieve
1. This manuscript identifies a clear problem: a failure mode of existing adaptive clipping methods, where clipping bounds collapse during training, leading to unfair outcomes. The toy example in Figure 1 is particularly effective in illustrating this issue. 2. The authors propose a simple and effective solution, i.e. DP-HPO. Its bounded adaptive clipping is easy to implement, requires minimal modification to existing DP-SGD pipelines, and comes with a clear privacy guarantee. 3.
1. DP-HPO introduces a new hyperparameter, i.e. the lower bound C_LB on the adaptive clipping bound. The paper shows robustness but provides limited guidance on principled selection; this could be a practical barrier. 2. The paper offers limited theoretical analysis of fairness. While motivated by fairness, the paper does not provide a theoretical analysis of how bounded clipping improves fairness guarantees (e.g., in terms of fairness definitions like equalized odds or demographic parity). 3.
The paper identifies a failure mode of earlier adaptive clipping methods and proposes a simple fix, with experiments demonstrating that it alleviates the issue. It's clearly written.
- Theorem 3.2 does not provide a precise privacy guarantee. The privacy–accuracy trade-off would be much clearer if the authors specified the resulting $\epsilon$ as an explicit function of $T$, the subsampling rate and the noise multipliers $\sigma_{grad}, \sigma_{count}$. In its current form, the guarantee is hard to interpret. - While the mean-estimation example is interesting, it seems specific. Is the failure primarily driven by the setup in which the minority group is strictly smaller tha
Taxonomy
Topics: Open Education and E-Learning
