Assessing the Impact of Image Dataset Features on Privacy-Preserving Machine Learning

Lucas Lange; Maurice-Maximilian Heykeroth; Erhard Rahm

arXiv:2409.01329·cs.LG·February 3, 2026

TL;DR

This paper investigates how features of image datasets influence the utility and attack vulnerability of private and non-private CNN models, providing insights for optimizing privacy-utility trade-offs.

Contribution

It identifies key dataset characteristics that affect model utility and vulnerability, offering practical guidance for dataset selection and privacy strategies in privacy-preserving machine learning (PPML).
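
Some of these characteristics, such as the number of classes, class imbalance, and entropy, can be measured directly on a labelled image set. The sketch below is a hypothetical helper, not code from the linked repository. It assumes images are flat lists of integer grayscale intensities, defines the imbalance ratio as the largest class count divided by the smallest, and computes entropy as the Shannon entropy (in bits) of the pixel-intensity histogram pooled over all images. The paper's exact definitions may differ.

```python
import math
from collections import Counter


def class_stats(labels):
    """Return (number of classes, imbalance ratio).

    The imbalance ratio is assumed here to be largest class count / smallest
    class count, so 1.0 means perfectly balanced.
    """
    counts = Counter(labels)
    return len(counts), max(counts.values()) / min(counts.values())


def pixel_entropy(images):
    """Shannon entropy (bits) of the intensity histogram pooled over all images.

    Assumes each image is a flat list of integer grayscale intensities (e.g. 0-255).
    """
    hist = Counter()
    for img in images:
        hist.update(img)
    total = sum(hist.values())
    return -sum((c / total) * math.log2(c / total) for c in hist.values())


if __name__ == "__main__":
    labels = [0, 0, 0, 1]             # three samples of class 0, one of class 1
    images = [[0, 0, 255, 255]] * 4   # two intensities, equally frequent
    print(class_stats(labels))        # (2, 3.0)
    print(pixel_entropy(images))      # 1.0
```

Pooling the histogram over the whole dataset is one simple choice. Averaging per-image entropies is an equally plausible alternative.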

Findings

01. Imbalanced datasets increase vulnerability in minority classes, but differential privacy (DP) mitigates this.
02. Datasets with fewer classes improve both utility and privacy.
03. High-entropy or low-FDR (Fisher Discriminant Ratio) datasets worsen the utility-privacy trade-off.
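
For a single feature and two classes, the Fisher Discriminant Ratio is commonly defined as (mu_1 - mu_2)^2 / (var_1 + var_2). Large values mean the class means lie far apart relative to the within-class spread, so the classes are easier to separate. The sketch below assumes that definition. As a hypothetical multi-class extension, it averages the ratio over all class pairs and reports the maximum over features. The dataset-level FDR variant used in the paper may differ.

```python
import math
from itertools import combinations
from statistics import mean, pvariance


def fdr_two_class(a, b):
    """FDR of one feature between two classes: (mu_a - mu_b)^2 / (var_a + var_b)."""
    diff = (mean(a) - mean(b)) ** 2
    spread = pvariance(a) + pvariance(b)
    if spread == 0:
        # Degenerate case: the feature is constant within both classes.
        return 0.0 if diff == 0 else math.inf
    return diff / spread


def dataset_fdr(samples, labels):
    """Hypothetical dataset-level FDR: per feature, average the two-class FDR
    over all class pairs, then return the maximum over features.

    samples: equal-length numeric feature vectors (e.g. flattened images).
    labels: one class label per sample; at least two classes are required.
    """
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    pairs = list(combinations(list(by_class), 2))
    best = 0.0
    for j in range(len(samples[0])):
        scores = [
            fdr_two_class([x[j] for x in by_class[c1]], [x[j] for x in by_class[c2]])
            for c1, c2 in pairs
        ]
        best = max(best, sum(scores) / len(scores))
    return best


if __name__ == "__main__":
    # Feature 0 separates the two classes; feature 1 is constant everywhere.
    samples = [[0, 3], [2, 3], [4, 3], [6, 3]]
    print(dataset_fdr(samples, [0, 0, 1, 1]))  # 8.0
```

Under this definition, a low FDR means no single feature separates the classes well. That is the setting the third finding associates with a worse utility-privacy trade-off.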

Abstract

Machine Learning (ML) is crucial in many sectors, including computer vision. However, ML models trained on sensitive data face security challenges, as they can be attacked and leak information. Privacy-Preserving Machine Learning (PPML) addresses this by using Differential Privacy (DP) to balance utility and privacy. This study identifies image dataset characteristics that affect the utility and vulnerability of private and non-private Convolutional Neural Network (CNN) models. Through analyzing multiple datasets and privacy budgets, we find that imbalanced datasets increase vulnerability in minority classes, but DP mitigates this issue. Datasets with fewer classes improve both model utility and privacy, while high entropy or low Fisher Discriminant Ratio (FDR) datasets deteriorate the utility-privacy trade-off. These insights offer valuable guidance for practitioners and researchers in…
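
DP for CNN training is commonly implemented with DP-SGD. Each example's gradient is clipped to an L2 norm bound C, and the clipped gradients are summed. Gaussian noise with standard deviation sigma * C is then added before averaging. The plain-Python sketch below illustrates only that aggregation step, assuming gradients are given as lists of floats. The names `clip_norm` and `noise_multiplier` are illustrative. This is not the authors' training code, and it omits privacy accounting, i.e. converting the noise multiplier, sampling rate, and number of steps into an (epsilon, delta) privacy budget.

```python
import math
import random


def clip(grad, clip_norm):
    """Rescale one per-example gradient so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_example_grads, clip_norm, noise_multiplier, rng=None):
    """Clip each gradient, sum them, add Gaussian noise with standard deviation
    noise_multiplier * clip_norm to every coordinate, then average."""
    if rng is None:
        rng = random.Random()
    clipped = [clip(g, clip_norm) for g in per_example_grads]
    summed = [sum(col) for col in zip(*clipped)]
    sigma = noise_multiplier * clip_norm
    return [(s + rng.gauss(0.0, sigma)) / len(per_example_grads) for s in summed]


if __name__ == "__main__":
    grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5
    print(dp_sgd_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=random.Random(0)))
```

Clipping bounds any single example's influence on the update, and the noise masks what remains. For a fixed number of steps and sampling rate, a larger noise multiplier gives a smaller privacy budget at some cost in utility. This is the privacy-utility trade-off the findings refer to.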

Code & Models

Repositories

luckyos-code/dataset-analysis-ppml (TensorFlow, official)
