TL;DR
This paper investigates how various features of image datasets influence the effectiveness and security of privacy-preserving machine learning models, providing insights for optimizing privacy-utility trade-offs.
Contribution
The paper identifies the image dataset characteristics that affect model utility and vulnerability, and offers practical guidance on dataset selection and privacy strategies in privacy-preserving machine learning (PPML).
Findings
Class imbalance makes minority classes more vulnerable to attack, but Differential Privacy (DP) mitigates this.
Datasets with fewer classes yield both better model utility and better privacy.
Datasets with high entropy or a low Fisher Discriminant Ratio (FDR) worsen the utility-privacy trade-off.
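The three dataset characteristics named in these findings (class imbalance, entropy, and FDR) can each be estimated with a few lines of NumPy. The sketch below is illustrative only: the function names are hypothetical, and the exact definitions used in the paper are not given in this summary. It assumes imbalance is the ratio of the largest to the smallest class count, entropy is the Shannon entropy of each image's 8-bit intensity histogram, and FDR is the per-feature ratio of between-class to within-class scatter, maximized over features.

```python
import numpy as np


def imbalance_ratio(labels):
    """Largest class count divided by smallest class count (1.0 = balanced)."""
    _, counts = np.unique(labels, return_counts=True)
    return counts.max() / counts.min()


def mean_image_entropy(images, bins=256):
    """Mean Shannon entropy (in bits) of per-image intensity histograms.

    Assumes each image is a uint8 array with values in [0, bins).
    """
    entropies = []
    for img in images:
        hist = np.bincount(img.ravel(), minlength=bins).astype(float)
        p = hist / hist.sum()
        p = p[p > 0]  # skip empty bins so log2 is defined
        entropies.append(-(p * np.log2(p)).sum())
    return float(np.mean(entropies))


def max_fisher_discriminant_ratio(features, labels):
    """Maximum over features of between-class / within-class scatter.

    `features` has shape (n_samples, n_features). This is one common
    multi-class form of the Fisher ratio, assumed here for illustration.
    """
    overall_mean = features.mean(axis=0)
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in np.unique(labels):
        x = features[labels == c]
        class_mean = x.mean(axis=0)
        between += len(x) * (class_mean - overall_mean) ** 2
        within += ((x - class_mean) ** 2).sum(axis=0)
    ratio = between / np.maximum(within, 1e-12)  # guard against zero scatter
    return float(ratio.max())
```

Under these assumed definitions, a higher FDR means the classes are easier to separate, and a higher entropy means the images carry more intensity variation. The findings above associate low FDR and high entropy with a worse utility-privacy trade-off.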
Abstract
Machine Learning (ML) is crucial in many sectors, including computer vision. However, ML models trained on sensitive data face security challenges, as they can be attacked and leak information. Privacy-Preserving Machine Learning (PPML) addresses this by using Differential Privacy (DP) to balance utility and privacy. This study identifies image dataset characteristics that affect the utility and vulnerability of private and non-private Convolutional Neural Network (CNN) models. Through analyzing multiple datasets and privacy budgets, we find that imbalanced datasets increase vulnerability in minority classes, but DP mitigates this issue. Datasets with fewer classes improve both model utility and privacy, while high entropy or low Fisher Discriminant Ratio (FDR) datasets deteriorate the utility-privacy trade-off. These insights offer valuable guidance for practitioners and researchers in…
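The abstract does not say which DP mechanism is used to train the private CNN models. A widely used option for neural networks is DP-SGD, which clips each example's gradient and adds Gaussian noise before the update. The sketch below shows one such step under that assumption. The function name `dp_sgd_step` and all parameter names are hypothetical, and mapping the noise multiplier to a privacy budget requires a separate privacy accountant that is not shown.

```python
import numpy as np


def dp_sgd_step(weights, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.0, rng=None):
    """One DP-SGD-style update on a flat weight vector.

    `per_example_grads` has shape (batch_size, n_weights).
    """
    rng = np.random.default_rng() if rng is None else rng

    # Clip each example's gradient to L2 norm at most `clip_norm`.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale

    # Add Gaussian noise calibrated to the clipping bound, then average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=weights.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / len(per_example_grads)

    return weights - lr * noisy_mean
```

A larger `noise_multiplier` corresponds to a smaller privacy budget (stronger privacy) at the cost of noisier updates, which is the utility-privacy trade-off the abstract refers to.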
