Personalized Language Model Learning on Text Data Without User Identifiers

Yucheng Ding; Yangwenjian Tan; Xiangyu Liu; Chaoyue Niu; Fandong Meng; Jie Zhou; Ning Liu; Fan Wu; Guihai Chen

arXiv:2501.06062·cs.LG·January 13, 2025

TL;DR

This paper introduces a method for personalized language models that maintains user privacy by using dynamic, user-specific distributions for embeddings, enabling personalization without user identifiers.

Contribution

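Each mobile device maintains a user-specific distribution from which it dynamically generates user embeddings, breaking the one-to-one mapping between an embedding and a specific user so that the cloud model can personalize its output without receiving a user identifier.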

Findings

01 Significant accuracy improvements with anonymous user embeddings

02 Effective privacy preservation through distribution-based embeddings

03 Maintains real-time inference performance

Abstract

In many practical natural language applications, user data are highly sensitive, requiring anonymous uploads of text data from mobile devices to the cloud without user identifiers. However, the absence of user identifiers restricts the ability of cloud-based language models to provide personalized services, which are essential for catering to diverse user needs. The trivial method of replacing an explicit user identifier with a static user embedding as model input still compromises data anonymization. In this work, we propose to let each mobile device maintain a user-specific distribution to dynamically generate user embeddings, thereby breaking the one-to-one mapping between an embedding and a specific user. We further theoretically demonstrate that to prevent the cloud from tracking users via uploaded embeddings, the local distributions of different users should either be derived from…
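The core idea in the abstract, a device drawing a fresh user embedding from a local distribution for each upload, can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper or its repository: the `LocalEmbeddingSampler` class, the `build_upload` helper, the payload field names, and in particular the choice of an isotropic Gaussian, which is only a stand-in for whatever distribution family the authors use. The sketch does not implement the condition the paper derives for preventing the cloud from tracking users.

```python
import random


class LocalEmbeddingSampler:
    """Hypothetical on-device sampler (illustrative only).

    Holds a user-specific distribution, assumed here to be an isotropic
    Gaussian, and draws a new embedding each time one is requested.
    """

    def __init__(self, mean, std, seed=None):
        self.mean = list(mean)          # user-specific centre, kept on the device
        self.std = std                  # assumed shared spread across dimensions
        self.rng = random.Random(seed)  # local random source

    def sample(self):
        # One independent Gaussian draw per embedding dimension.
        return [self.rng.gauss(m, self.std) for m in self.mean]


def build_upload(sampler, text):
    # The payload carries text and a sampled embedding, but no user identifier.
    return {"text": text, "user_embedding": sampler.sample()}


sampler = LocalEmbeddingSampler(mean=[0.2, -0.1, 0.5], std=0.3, seed=0)
first = build_upload(sampler, "first message")
second = build_upload(sampler, "second message")

# Two uploads from the same device carry different embeddings, so the
# embedding alone is no longer a fixed stand-in for the user.
print(first["user_embedding"] != second["user_embedding"])
```

Contrast this with a static embedding, where `sample()` would return the same vector every time and the vector would act as a pseudo-identifier, which is the failure mode the abstract describes.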

Code & Models

Repositories

sjtu-yc/idfree-personalized-learning
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies