Footprints of Data in a Classifier: Understanding the Privacy Risks and Solution Strategies

Payel Sadhukhan; Tanujit Chakraborty

arXiv:2407.02268 · cs.CR · April 15, 2025

TL;DR

This paper explores how training data footprints in classifiers pose privacy risks, analyzes contributing factors, and proposes mitigation strategies and a privacy-performance trade-off index to balance privacy with model effectiveness.

Contribution

It provides a theoretical and empirical analysis of privacy vulnerabilities due to training data footprints and introduces mitigation techniques and a privacy-performance trade-off index.

Findings

1. Classifiers are universally vulnerable under data imbalance and distribution shifts.

2. Training data quality significantly affects classifier susceptibility to privacy risks.

3. Data obfuscation techniques can mitigate privacy risks but may impact classification performance.

Abstract

The widespread deployment of Artificial Intelligence (AI) across government and private industries brings both advancements and heightened privacy and security concerns. Article 17 of the General Data Protection Regulation (GDPR) mandates the Right to Erasure, requiring data to be permanently removed from a system to prevent potential compromise. While existing research primarily focuses on erasing sensitive data attributes, several passive data compromise mechanisms remain underexplored and unaddressed. One such issue arises from the residual footprints of training data embedded within predictive models. Performance disparities between test and training data can inadvertently reveal which data points were part of the training set, posing a privacy risk. This study examines how two fundamental aspects of classifier systems - training data quality and classifier training methodology -…
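The footprint described in the abstract, a performance disparity between training and test data, can be illustrated with a minimal sketch. This is not the authors' method: the 1-nearest-neighbour classifier, the synthetic one-dimensional data, the label-noise rate, and the "correctly classified means member" guess are all assumptions chosen only to make the effect visible.

```python
import random


def make_data(n, rng, noise=0.25):
    """Synthetic 1-D points; the label is the sign of x, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 1 if x > 0 else 0
        if rng.random() < noise:
            y = 1 - y  # label noise forces the model to memorize to fit the training set
        data.append((x, y))
    return data


def predict_1nn(train, x):
    """Return the label of the closest training point (a model that memorizes its data)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


def accuracy(train, points):
    """Fraction of `points` whose label the 1-NN model reproduces."""
    return sum(predict_1nn(train, x) == y for x, y in points) / len(points)


def guess_member(train, x, y):
    """Hypothetical attacker rule: a correctly classified point is guessed to be a member."""
    return predict_1nn(train, x) == y


rng = random.Random(0)
train = make_data(200, rng)  # members of the training set
test = make_data(200, rng)   # non-members drawn from the same distribution

train_acc = accuracy(train, train)
test_acc = accuracy(train, test)
gap = train_acc - test_acc  # the performance disparity that leaves a footprint

# Attacker accuracy over an even mix of members and non-members.
hits = sum(guess_member(train, x, y) for x, y in train)
hits += sum(not guess_member(train, x, y) for x, y in test)
attack_acc = hits / (len(train) + len(test))

print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
print(f"gap:            {gap:.3f}")
print(f"membership guess accuracy: {attack_acc:.3f}")
```

In this sketch the model reproduces every training label because each training point is its own nearest neighbour, while held-out points are classified less reliably. An attacker who only observes whether predictions are correct therefore guesses membership better than chance, and the advantage grows with the train-test gap.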

Taxonomy

Topics: Imbalanced Data Classification Techniques