# On Privacy Protection of Latent Dirichlet Allocation Model Training

**Authors:** Fangyuan Zhao, Xuebin Ren, Shusen Yang, Xinyu Yang

arXiv: 1906.01178 · 2019-07-02

## TL;DR

This paper investigates privacy risks in training Latent Dirichlet Allocation (LDA) models and proposes two privacy-preserving algorithms: a privacy monitoring algorithm for centralized curated datasets and a locally private training algorithm for crowdsourced data. Both are evaluated on real-world datasets.

## Contribution

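The paper approaches privacy in LDA training from two directions. For centralized curated datasets, it investigates how much privacy the inherent randomness of the training procedure already provides. For crowdsourced data, it proposes a training algorithm that gives each individual data contributor local differential privacy.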

## Key findings

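- The inherent randomness of Collapsed Gibbs Sampling (CGS), as used in a typical LDA training algorithm, yields a privacy guarantee that the proposed privacy monitoring algorithm is designed to investigate.
- The locally private LDA training algorithm provides local differential privacy for individual data contributors.
- Experiments on real-world datasets demonstrate the effectiveness of the proposed algorithms.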
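
To make the notion of local differential privacy concrete, the following is a minimal sketch of k-ary randomized response, a generic textbook mechanism that lets a contributor perturb a value (for example a word ID) before reporting it. It is not taken from the paper and may differ from the mechanism the authors use; the function names `k_randomized_response` and `privacy_ratio`, the vocabulary size, and the value of `epsilon` are all illustrative assumptions.

```python
import math
import random


def k_randomized_response(true_value, domain_size, epsilon, rng):
    """Perturb one value locally (generic mechanism, NOT the paper's algorithm).

    The true value is reported with probability e^eps / (e^eps + k - 1);
    otherwise one of the other k - 1 values is reported uniformly at random.
    """
    keep_prob = math.exp(epsilon) / (math.exp(epsilon) + domain_size - 1)
    if rng.random() < keep_prob:
        return true_value
    # Draw from the k - 1 values that differ from true_value.
    other = rng.randrange(domain_size - 1)
    return other if other < true_value else other + 1


def privacy_ratio(domain_size, epsilon):
    """Worst-case ratio of the probabilities of one output under two inputs."""
    denom = math.exp(epsilon) + domain_size - 1
    p_keep = math.exp(epsilon) / denom   # output equals the true input
    p_other = 1.0 / denom                # output is one specific other value
    return p_keep / p_other


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical setting: vocabulary of 10 word IDs, true word ID 3.
    reports = [k_randomized_response(3, 10, 1.0, rng) for _ in range(5)]
    print(reports)
    print(privacy_ratio(10, 1.0), math.exp(1.0))
```

The ratio computed by `privacy_ratio` equals `e^epsilon`, which is the bound that epsilon-local differential privacy places on any single report. In this sketch the guarantee is per reported value, so a contributor who reports many words would spend privacy budget on each one.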

## Abstract

Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for discovery of hidden semantic architecture of text datasets, and plays a fundamental role in many machine learning applications. However, like many other machine learning algorithms, the process of training an LDA model may leak the sensitive information of the training datasets and bring significant privacy risks. To mitigate the privacy issues in LDA, we focus on studying privacy-preserving algorithms of LDA model training in this paper. In particular, we first develop a privacy monitoring algorithm to investigate the privacy guarantee obtained from the inherent randomness of the Collapsed Gibbs Sampling (CGS) process in a typical LDA training algorithm on centralized curated datasets. Then, we further propose a locally private LDA training algorithm on crowdsourced data to provide local differential privacy for individual data contributors. The experimental results on real-world datasets demonstrate the effectiveness of our proposed algorithms.
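
The "inherent randomness" mentioned in the abstract comes from the sampling step of CGS: every topic assignment is drawn from a conditional distribution rather than computed deterministically. The sketch below shows standard collapsed Gibbs sampling for LDA in plain Python so that the random draw is visible. It is not the paper's privacy monitoring algorithm, and the function name `collapsed_gibbs_lda`, the toy corpus, and the hyperparameter values are illustrative assumptions.

```python
import random


def collapsed_gibbs_lda(docs, vocab_size, num_topics, alpha, beta, sweeps, seed):
    """Standard collapsed Gibbs sampling for LDA (illustrative sketch only).

    docs is a list of documents, each a list of integer word IDs.
    Returns the topic-word count matrix and the final topic assignments.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Random initialisation of every token's topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k_old = assignments[d][i]
                doc_topic[d][k_old] -= 1
                topic_word[k_old][w] -= 1
                topic_total[k_old] -= 1

                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]

                # The random draw: this is the source of CGS's randomness.
                k_new = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k_new
                doc_topic[d][k_new] += 1
                topic_word[k_new][w] += 1
                topic_total[k_new] += 1

    return topic_word, assignments


if __name__ == "__main__":
    toy_docs = [[0, 1, 0, 1], [2, 3, 2, 3]]  # hypothetical word IDs
    counts, z = collapsed_gibbs_lda(toy_docs, 4, 2, 0.1, 0.01, 50, seed=1)
    print(counts)
    print(z)
```

Because each assignment is sampled, running the sketch with different seeds on the same corpus can produce different count matrices, which is the kind of output randomness a privacy analysis of CGS would reason about.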

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.01178/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/1906.01178/full.md

## References

19 references — full list in the complete paper: https://tomesphere.com/paper/1906.01178/full.md

---
Source: https://tomesphere.com/paper/1906.01178