Nonparametric Bayesian Topic Modelling with the Hierarchical Pitman-Yor Processes
Kar Wai Lim, Wray Buntine, Changyou Chen, Lan Du

TL;DR
This paper introduces a hierarchical nonparametric Bayesian framework for topic modeling based on Pitman-Yor processes, and reports better goodness of fit and application performance on tweets than existing parametric models.
Contribution
The paper proposes a general framework for designing topic models built on hierarchical Pitman-Yor processes, along with a specific nonparametric topic model for social media text (tweets).
Findings
Outperforms existing parametric models in both goodness of fit and real-world applications.
Effective for modeling social media text, especially tweets.
Provides a flexible framework for nonparametric Bayesian topic modeling.
Abstract
The Dirichlet process and its extension, the Pitman-Yor process, are stochastic processes that take probability distributions as a parameter. These processes can be stacked up to form a hierarchical nonparametric Bayesian model. In this article, we present efficient methods for the use of these processes in this hierarchical context, and apply them to latent variable models for text analytics. In particular, we propose a general framework for designing these Bayesian models, which are called topic models in the computer science community. We then propose a specific nonparametric Bayesian topic model for modelling text from social media. We focus on tweets (posts on Twitter) in this article due to their ease of access. We find that our nonparametric model performs better than existing parametric models in both goodness of fit and real world applications.
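As a rough illustration of the Pitman-Yor process mentioned in the abstract, the sketch below simulates its standard Chinese restaurant process representation for a single, non-hierarchical process. It is a textbook construction, not the efficient inference method the paper presents, and the names `pitman_yor_crp`, `discount`, and `concentration` are hypothetical choices made for this example. With `n` customers seated at `K` tables, the next customer joins table `k` with probability proportional to `c_k - discount` and opens a new table with probability proportional to `concentration + K * discount`. Setting `discount = 0` recovers the Dirichlet process.

```python
import random


def pitman_yor_crp(n_customers, discount, concentration, seed=0):
    """Simulate table sizes under a Pitman-Yor Chinese restaurant process.

    Illustrative sketch only: one process, no hierarchy, no base distribution.
    """
    # Standard parameter constraints: 0 <= discount < 1, concentration > -discount.
    if not (0.0 <= discount < 1.0) or concentration <= -discount:
        raise ValueError("need 0 <= discount < 1 and concentration > -discount")

    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers at table k

    for _ in range(n_customers):
        if not tables:
            # The first customer always opens the first table.
            tables.append(1)
            continue
        # Existing table k has weight c_k - discount;
        # a new table has weight concentration + K * discount.
        weights = [count - discount for count in tables]
        weights.append(concentration + discount * len(tables))
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(tables):
            tables.append(1)
        else:
            tables[choice] += 1
    return tables


if __name__ == "__main__":
    sizes = pitman_yor_crp(1000, discount=0.5, concentration=1.0)
    print(len(sizes), "tables; largest:", sorted(sizes, reverse=True)[:5])
```

A larger discount tends to produce more tables with a heavier-tailed size distribution, which is one common motivation for using Pitman-Yor processes on word-frequency data. In a hierarchical model, each new table would in turn send a customer to a parent restaurant; that step is omitted here.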
