Anonymous Learning via Look-Alike Clustering: A Precise Analysis of Model Generalization
Adel Javanmard, Vahab Mirrokni

TL;DR
This paper analyzes how training on look-alike clusters, where individuals' sensitive features are replaced by anonymous cluster centers, affects model generalization. It shows that the technique can act as a regularizer and improve performance in high-dimensional settings.
Contribution
It provides a theoretical framework, based on the Convex Gaussian Minimax Theorem (CGMT), for understanding how training on anonymous cluster centers affects model generalization, supported by finite-sample experiments.
Findings
Training on anonymous cluster centers can improve generalization in high-dimensional regimes.
Theoretical analysis matches finite-sample experiments with hundreds of samples.
Look-alike clustering acts as a regularizer, enhancing model performance.
Abstract
While personalized recommendation systems have become increasingly popular, ensuring user data protection remains a top concern in the development of these learning systems. A common approach to enhancing privacy involves training models using anonymous data rather than individual data. In this paper, we explore a natural technique called look-alike clustering, which involves replacing sensitive features of individuals with the cluster's average values. We provide a precise analysis of how training models using anonymous cluster centers affects their generalization capabilities. We focus on an asymptotic regime where the size of the training set grows in proportion to the feature dimension. Our analysis is based on the Convex Gaussian Minimax Theorem (CGMT) and allows us to theoretically understand the role of different model components on the generalization error. In addition,…
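The replacement step described in the abstract can be illustrated with a minimal sketch. It assumes cluster assignments are already available (the abstract excerpt does not say how clusters are formed), and the function name `look_alike_anonymize`, its parameters, and the toy data are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def look_alike_anonymize(X, sensitive_cols, cluster_ids):
    """Replace each row's sensitive columns with its cluster's mean.

    X              : (n, d) array of features
    sensitive_cols : list of column indices treated as sensitive (assumed known)
    cluster_ids    : length-n array of cluster labels (assumed given)
    """
    X = np.asarray(X, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    X_anon = X.copy()
    for c in np.unique(cluster_ids):
        rows = np.flatnonzero(cluster_ids == c)
        # Cluster center restricted to the sensitive columns.
        center = X[np.ix_(rows, sensitive_cols)].mean(axis=0)
        # Every member of the cluster receives the same center values.
        X_anon[np.ix_(rows, sensitive_cols)] = center
    return X_anon

# Toy data: columns 0 and 1 are sensitive, column 2 is not.
X = np.array([[1.0, 10.0, 0.0],
              [3.0, 20.0, 1.0],
              [5.0, 30.0, 2.0],
              [7.0, 50.0, 3.0]])
X_anon = look_alike_anonymize(X, sensitive_cols=[0, 1], cluster_ids=[0, 0, 1, 1])
print(X_anon)
```

In this toy example the first two rows share the sensitive values (2, 15) and the last two share (6, 40), while the non-sensitive third column is left untouched. A model would then be trained on `X_anon` instead of `X`.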
Taxonomy
Topics: Random Matrices and Applications · Privacy-Preserving Technologies in Data · Survey Sampling and Estimation Techniques
