Production of Categorical Data Verifying Differential Privacy: Conception and Applications to Machine Learning
Héber H. Arcolezi

TL;DR
This paper investigates methods to improve privacy and utility in differentially private data collection and machine learning, proposing new solutions for frequency estimation and evaluating privacy-utility trade-offs in real-world ML models.
Contribution
The paper introduces novel approaches for improving utility under local differential privacy (LDP) when estimating frequencies over multiple attributes and across multiple data collections. It also empirically assesses the privacy-utility trade-offs of private machine learning models.
Findings
The proposed solutions outperform state-of-the-art LDP protocols.
Differentially private ML models achieve utility close to that of non-private models.
Analytical and experimental results validate the effectiveness of the privacy-preserving methods.
Abstract
Private and public organizations regularly collect and analyze digitized data about their associates, volunteers, clients, etc. However, because most personal data are sensitive, designing privacy-preserving systems is a key challenge. To address privacy concerns, research communities have proposed various privacy-preserving methods, among which differential privacy (DP) stands out as a formal definition that allows quantifying the privacy-utility trade-off. Moreover, under the local DP (LDP) model, users can sanitize their data locally before transmitting it to the server. The objective of this thesis is thus two-fold: (O1) to improve the utility and privacy of multiple frequency estimates under LDP guarantees, which is fundamental to statistical learning; and (O2) to assess the privacy-utility trade-off of machine learning (ML) models trained over differentially private…
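To illustrate what "sanitizing data locally" and "frequency estimation under LDP" mean in practice, here is a minimal sketch of generalized randomized response (GRR, also called k-ary randomized response). GRR is a standard baseline LDP protocol. It is not one of the thesis's proposed solutions. The sketch assumes each user holds a single categorical value encoded as an integer in {0, ..., k-1}. All function names are hypothetical and chosen only for illustration. A user keeps the true value with probability p = e^ε / (e^ε + k − 1) and otherwise reports one of the other k − 1 values uniformly at random. Because p / q = e^ε, this satisfies ε-LDP. The server then debiases the observed counts.

```python
import math
import random
from collections import Counter


def grr_probabilities(k: int, epsilon: float) -> tuple[float, float]:
    """Return (p, q): prob. of reporting the true value, and of each other value."""
    denom = math.exp(epsilon) + k - 1
    return math.exp(epsilon) / denom, 1.0 / denom


def grr_perturb(value: int, k: int, epsilon: float, rng: random.Random) -> int:
    """Client side (hypothetical helper): sanitize one value in {0, ..., k-1}."""
    p, _ = grr_probabilities(k, epsilon)
    if rng.random() < p:
        return value
    # Otherwise pick uniformly among the k-1 values that differ from the true one.
    other = rng.randrange(k - 1)
    return other if other < value else other + 1


def grr_estimate(reports: list[int], k: int, epsilon: float) -> list[float]:
    """Server side (hypothetical helper): unbiased frequency estimates."""
    p, q = grr_probabilities(k, epsilon)
    n = len(reports)
    counts = Counter(reports)
    # E[counts[v] / n] = f_v * p + (1 - f_v) * q, so invert that affine map.
    return [(counts[v] / n - q) / (p - q) for v in range(k)]


if __name__ == "__main__":
    rng = random.Random(42)
    k, epsilon = 4, 2.0
    true_freqs = [0.5, 0.3, 0.15, 0.05]  # synthetic example distribution
    users = rng.choices(range(k), weights=true_freqs, k=20_000)
    reports = [grr_perturb(v, k, epsilon, rng) for v in users]
    print([round(f, 3) for f in grr_estimate(reports, k, epsilon)])
```

The estimator is unbiased, but its variance grows with the domain size k and shrinks as ε increases. This is the privacy-utility trade-off that motivates improved protocols, particularly when many attributes or repeated collections split the privacy budget.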
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Traffic Prediction and Management Techniques · Probability and Risk Models
