Latent Dirichlet Allocation Model Training with Differential Privacy
Fangyuan Zhao, Xuebin Ren, Shusen Yang, Qing Han, Peng Zhao, and Xinyu Yang

TL;DR
This paper introduces new differentially private algorithms for training Latent Dirichlet Allocation (LDA), ensuring privacy protection during text data analysis while maintaining effectiveness and efficiency.
Contribution
It provides the first theoretical analysis of differential privacy guarantees for CGS-based LDA training and proposes centralized, local, and online privacy-preserving LDA algorithms.
Findings
Theoretical privacy guarantees for CGS-based LDA training.
Effective privacy-preserving LDA algorithms validated by experiments.
Efficient algorithms suitable for streaming and crowdsourced data.
Abstract
Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for hidden semantic discovery of text data and serves as a fundamental tool for text analysis in various applications. However, the LDA model as well as the training process of LDA may expose the text information in the training data, thus raising significant privacy concerns. To address the privacy issue in LDA, we systematically investigate the privacy protection of the mainstream LDA training algorithm based on Collapsed Gibbs Sampling (CGS) and propose several differentially private LDA algorithms for typical training scenarios. In particular, we present the first theoretical analysis of the inherent differential privacy guarantee of CGS-based LDA training and further propose a centralized privacy-preserving algorithm (HDP-LDA) that can prevent data inference from the intermediate statistics in the CGS…
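To make the idea of protecting the intermediate statistics in CGS concrete, the following is a minimal sketch of one collapsed Gibbs sweep that samples topics from Laplace-perturbed word-topic counts. It is a generic illustration of the Laplace mechanism and is not the paper's HDP-LDA algorithm. The function names (`init_counts`, `noisy_cgs_sweep`), the noise scale `2 / epsilon` (which assumes word-level adjacency, where replacing one token changes at most two count cells by one each), and the clipping step are all assumptions made for this example.

```python
import numpy as np


def init_counts(docs, K, V, rng):
    """Randomly assign topics and build doc-topic and topic-word count tables."""
    z = [rng.integers(0, K, size=len(doc)) for doc in docs]
    n_dk = np.zeros((len(docs), K))
    n_kw = np.zeros((K, V))
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
    return z, n_dk, n_kw


def noisy_cgs_sweep(docs, z, n_dk, n_kw, alpha, beta, epsilon, rng):
    """One CGS sweep that samples from a noisy snapshot of the topic-word counts.

    Illustrative only: the noise calibration is an assumption, not the
    mechanism analyzed in the paper.
    """
    K, V = n_kw.shape
    # Assumed calibration: Laplace noise with scale 2/epsilon, clipped at zero
    # so the perturbed counts remain valid (non-negative) statistics.
    noise = rng.laplace(0.0, 2.0 / epsilon, size=n_kw.shape)
    noisy_kw = np.clip(n_kw + noise, 0.0, None)
    noisy_k = noisy_kw.sum(axis=1)

    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k_old = z[d][i]
            # Remove the current token from the exact counts.
            n_dk[d, k_old] -= 1
            n_kw[k_old, w] -= 1
            # Simplification: the noisy snapshot is fixed for the whole sweep
            # and does not exclude the current token.
            p = (n_dk[d] + alpha) * (noisy_kw[:, w] + beta) / (noisy_k + V * beta)
            p /= p.sum()
            k_new = rng.choice(K, p=p)
            # Record the new assignment in the exact counts.
            z[d][i] = k_new
            n_dk[d, k_new] += 1
            n_kw[k_new, w] += 1
    return z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    docs = [[0, 1, 2, 1], [2, 3, 3, 0, 1]]  # toy corpus of word ids
    z, n_dk, n_kw = init_counts(docs, K=3, V=4, rng=rng)
    noisy_cgs_sweep(docs, z, n_dk, n_kw, alpha=0.1, beta=0.01, epsilon=1.0, rng=rng)
    print(n_kw)
```

In this sketch the exact counts are still maintained internally and only the sampling distribution is driven by the perturbed snapshot; a real deployment would also need to account for the privacy budget consumed across sweeps.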
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Privacy, Security, and Data Protection · Mobile Crowdsensing and Crowdsourcing
Methods: Latent Dirichlet Allocation
