Footprints of Data in a Classifier: Understanding the Privacy Risks and Solution Strategies
Payel Sadhukhan, Tanujit Chakraborty

TL;DR
This paper explores how training data footprints in classifiers pose privacy risks, analyzes contributing factors, and proposes mitigation strategies and a privacy-performance trade-off index to balance privacy with model effectiveness.
Contribution
The paper provides a theoretical and empirical analysis of the privacy vulnerabilities created by training data footprints, and introduces mitigation techniques along with a privacy-performance trade-off index.
Findings
Classifiers are universally vulnerable under data imbalance and distribution shifts.
Training data quality significantly affects classifier susceptibility to privacy risks.
Data obfuscation techniques can mitigate privacy risks but may impact classification performance.
Abstract
The widespread deployment of Artificial Intelligence (AI) across government and private industries brings both advancements and heightened privacy and security concerns. Article 17 of the General Data Protection Regulation (GDPR) mandates the Right to Erasure, requiring data to be permanently removed from a system to prevent potential compromise. While existing research primarily focuses on erasing sensitive data attributes, several passive data compromise mechanisms remain underexplored and unaddressed. One such issue arises from the residual footprints of training data embedded within predictive models. Performance disparities between test and training data can inadvertently reveal which data points were part of the training set, posing a privacy risk. This study examines how two fundamental aspects of classifier systems - training data quality and classifier training methodology -…
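The leakage mechanism described in the abstract (a classifier performing better on its training data than on unseen data) can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not taken from the paper: the synthetic noisy data, the 1-nearest-neighbour classifier used as a stand-in for a model that memorises its training set, and the naive attack rule "guess member if the prediction is correct". It is not the authors' method or their trade-off index.

```python
import random

def make_data(n, rng, noise=0.3):
    """Hypothetical synthetic data: label is 1 when x0 + x1 > 0, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        y = 1 if x[0] + x[1] > 0 else 0
        if rng.random() < noise:
            y = 1 - y  # label noise makes memorisation visible
        data.append((x, y))
    return data

def predict_1nn(train, x):
    """Return the label of the training point closest to x (squared Euclidean distance)."""
    nearest = min(train, key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2)
    return nearest[1]

def accuracy(train, points):
    """Fraction of points whose label the 1-NN classifier reproduces."""
    return sum(predict_1nn(train, x) == y for x, y in points) / len(points)

def guess_member(train, point):
    """Naive membership guess: a correctly classified point is assumed to be a training point."""
    x, y = point
    return predict_1nn(train, x) == y

rng = random.Random(0)
train = make_data(200, rng)   # points the classifier has seen
test = make_data(200, rng)    # points drawn from the same distribution, never seen

train_acc = accuracy(train, train)
test_acc = accuracy(train, test)
gap = train_acc - test_acc    # the "footprint": performance disparity between the two sets

# Attack accuracy over an equal mix of members (train) and non-members (test).
hits = sum(guess_member(train, p) for p in train)
hits += sum(not guess_member(train, p) for p in test)
attack_acc = hits / (len(train) + len(test))

print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
print(f"gap:            {gap:.3f}")
print(f"attack accuracy (0.5 = chance): {attack_acc:.3f}")
```

In this toy setting, with equal-sized member and non-member sets, the attack accuracy equals 0.5 plus half the train-test gap, so any gap above zero gives the attacker an edge over chance. Shrinking that gap without giving up too much test accuracy is the kind of balance a privacy-performance trade-off is meant to capture.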
Taxonomy
Topics: Imbalanced Data Classification Techniques
