Latent Dirichlet Allocation Model Training with Differential Privacy

Fangyuan Zhao, Xuebin Ren, Shusen Yang, Qing Han, Peng Zhao, and Xinyu Yang

arXiv:2010.04391 · cs.LG · October 12, 2020 · 6 cites

TL;DR

This paper introduces new differentially private algorithms for training Latent Dirichlet Allocation (LDA), ensuring privacy protection during text data analysis while maintaining effectiveness and efficiency.

Contribution

It provides the first theoretical analysis of differential privacy guarantees for CGS-based LDA training and proposes centralized, local, and online privacy-preserving LDA algorithms.

Findings

1. Theoretical privacy guarantees for CGS-based LDA training.
2. Effective privacy-preserving LDA algorithms validated by experiments.
3. Efficient algorithms suitable for streaming and crowdsourced data.

Abstract

Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for discovering hidden semantics in text data and serves as a fundamental tool for text analysis in various applications. However, both the LDA model and its training process may expose text information in the training data, raising significant privacy concerns. To address the privacy issue in LDA, we systematically investigate the privacy protection of the mainstream LDA training algorithm based on Collapsed Gibbs Sampling (CGS) and propose several differentially private LDA algorithms for typical training scenarios. In particular, we present the first theoretical analysis of the inherent differential privacy guarantee of CGS-based LDA training and further propose a centralized privacy-preserving algorithm (HDP-LDA) that can prevent data inference from the intermediate statistics in the CGS…
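
To make the "intermediate statistics" mentioned in the abstract concrete, the following is a minimal sketch of plain, non-private collapsed Gibbs sampling for LDA, showing the count matrices the sampler maintains between sweeps. It is not the paper's HDP-LDA. The function names `cgs_lda` and `laplace_perturb`, the hyperparameter defaults, and the `sensitivity` value are all assumptions made for illustration. The second function is the generic Laplace mechanism applied to a count matrix, included only to show where noise could be injected; the sketch makes no claim about the resulting privacy guarantee.

```python
import numpy as np


def cgs_lda(docs, vocab_size, num_topics, alpha=0.1, beta=0.01, sweeps=50, seed=0):
    """Plain (non-private) collapsed Gibbs sampling for LDA.

    docs: list of lists of integer word ids in [0, vocab_size).
    Returns the document-topic and topic-word count matrices.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), num_topics))   # document-topic counts
    n_kw = np.zeros((num_topics, vocab_size))  # topic-word counts
    n_k = np.zeros(num_topics)                 # total tokens per topic
    z = []                                     # topic assignment per token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = rng.integers(num_topics, size=len(doc))
        z.append(z_d)
        for w, k in zip(doc, z_d):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1

    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Conditional over topics given all other assignments,
                # up to a normalizing constant.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
                k = rng.choice(num_topics, p=p / p.sum())
                # Record the newly sampled assignment.
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1
    return n_dk, n_kw


def laplace_perturb(counts, epsilon, sensitivity=1.0, seed=0):
    """Generic Laplace mechanism on a count matrix (illustrative only).

    The sensitivity value is an assumption supplied by the caller.
    """
    rng = np.random.default_rng(seed)
    return counts + rng.laplace(scale=sensitivity / epsilon, size=counts.shape)


# Toy corpus: two documents over a five-word vocabulary.
docs = [[0, 1, 2, 0], [3, 4, 3]]
n_dk, n_kw = cgs_lda(docs, vocab_size=5, num_topics=2, sweeps=5)
noisy_n_kw = laplace_perturb(n_kw, epsilon=1.0)
```

The topic-word counts `n_kw` are updated from every token in the corpus, which is why statistics of this kind are a natural place to consider data inference during training.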

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Privacy, Security, and Data Protection · Mobile Crowdsensing and Crowdsourcing

Methods: Latent Dirichlet Allocation