Anonymous Learning via Look-Alike Clustering: A Precise Analysis of Model Generalization

Adel Javanmard; Vahab Mirrokni

arXiv:2310.04015·cs.LG·November 3, 2023

TL;DR

This paper analyzes how training on data anonymized by look-alike clustering affects model generalization, showing that the anonymization can act as a regularizer and improve performance in high-dimensional settings.

Contribution

It provides a theoretical framework, based on the Convex Gaussian Minimax Theorem (CGMT), for understanding how training on anonymous cluster centers affects model generalization, supported by finite-sample experiments.

Findings

01

Training on anonymous cluster centers can improve generalization in high-dimensional regimes.

02

Theoretical analysis matches finite-sample experiments with hundreds of samples.

03

Look-alike clustering acts as a regularizer, enhancing model performance.

Abstract

While personalized recommendation systems have become increasingly popular, ensuring user data protection remains a top concern in the development of these learning systems. A common approach to enhancing privacy involves training models using anonymous data rather than individual data. In this paper, we explore a natural technique called look-alike clustering, which involves replacing sensitive features of individuals with the cluster's average values. We provide a precise analysis of how training models using anonymous cluster centers affects their generalization capabilities. We focus on an asymptotic regime where the size of the training set grows in proportion to the feature dimension. Our analysis is based on the Convex Gaussian Minimax Theorem (CGMT) and allows us to theoretically understand the role of different model components on the generalization error. In addition,…
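The replacement step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `look_alike_anonymize`, the choice of which columns count as sensitive, and the assumption that cluster labels are already given are all hypothetical choices made for illustration.

```python
import numpy as np

def look_alike_anonymize(X, sensitive_cols, labels):
    """Replace each row's sensitive features with its cluster's average.

    X              : (n, d) feature matrix
    sensitive_cols : list of column indices treated as sensitive (assumed)
    labels         : length-n array of cluster assignments (assumed given)
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    X_anon = X.copy()
    for c in np.unique(labels):
        members = labels == c
        # Cluster center restricted to the sensitive columns.
        center = X[members][:, sensitive_cols].mean(axis=0)
        # Every member of the cluster receives the same center values.
        X_anon[np.ix_(members, sensitive_cols)] = center
    return X_anon

# Toy data: columns 0 and 1 are treated as sensitive, column 2 is not.
X = np.array([[1.0, 10.0, 0.5],
              [3.0, 20.0, 0.7],
              [5.0, 30.0, 0.1],
              [7.0, 50.0, 0.9]])
labels = np.array([0, 0, 1, 1])
X_anon = look_alike_anonymize(X, sensitive_cols=[0, 1], labels=labels)
```

In this toy example, the two rows of each cluster end up with identical values in the sensitive columns, while the non-sensitive column is left untouched. A model would then be trained on `X_anon` instead of `X`.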

Videos

Anonymous Learning via Look-Alike Clustering: A Precise Analysis of Model Generalization · SlidesLive

Taxonomy

Topics: Random Matrices and Applications · Privacy-Preserving Technologies in Data · Survey Sampling and Estimation Techniques