Nonparametric Bayes Pachinko Allocation
Wei Li, David Blei, Andrew McCallum

TL;DR
This paper introduces a nonparametric Bayesian extension of the Pachinko Allocation Model (PAM) that automatically learns the number of topics and their correlations from data, enhancing flexibility over traditional models like LDA.
Contribution
It proposes a nonparametric Bayesian prior for PAM based on the hierarchical Dirichlet process, enabling automatic discovery of topic structure and correlations.
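As a rough illustration of why a Dirichlet-process-style prior lets the number of topics be learned rather than fixed, the sketch below simulates a Chinese restaurant process, the single-level building block commonly used to describe the HDP. It is not the paper's model or inference procedure; the function name `crp_assignments` and the concentration value `alpha` are assumptions chosen for illustration.

```python
import random

def crp_assignments(n_customers, alpha, seed=0):
    """Seat customers one at a time under a Chinese restaurant process.

    Hypothetical helper for illustration only; alpha must be > 0.
    """
    rng = random.Random(seed)
    table_counts = []   # table_counts[k] = number of customers at table k
    assignments = []    # assignments[i] = table index chosen by customer i
    for _ in range(n_customers):
        # An existing table k is chosen with weight table_counts[k];
        # a brand-new table is opened with weight alpha.
        weights = table_counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_counts):
            table_counts.append(1)   # open a new table (a new "topic")
        else:
            table_counts[k] += 1
        assignments.append(k)
    return assignments, table_counts

assignments, table_counts = crp_assignments(200, alpha=1.0)
print(len(table_counts), "tables used by 200 customers")
```

The number of occupied tables is not set in advance: it grows with the data and with `alpha`, which is the property a nonparametric prior contributes.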
Findings
Nonparametric PAM matches the performance of manually tuned PAM.
The model effectively learns the number of topics from data.
It captures complex topic correlations in text datasets.
Abstract
Recent advances in topic models have explored complicated structured distributions to represent topic correlation. For example, the pachinko allocation model (PAM) captures arbitrary, nested, and possibly sparse correlations between topics using a directed acyclic graph (DAG). While PAM provides more flexibility and greater expressive power than previous models like latent Dirichlet allocation (LDA), it is also more difficult to determine the appropriate topic structure for a specific dataset. In this paper, we propose a nonparametric Bayesian prior for PAM based on a variant of the hierarchical Dirichlet process (HDP). Although the HDP can capture topic correlations defined by nested data structure, it does not automatically discover such correlations from unstructured data. By assuming an HDP-based prior for PAM, we are able to learn both the number of topics and how the topics are…
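To make the DAG idea concrete, here is a minimal sketch of the generative process for one simple PAM structure, assuming a four-level DAG (root, super-topics, sub-topics, words) with a fixed number of topics. The structure, the sizes, the symmetric Dirichlet hyperparameters, and all function names are assumptions for illustration; the nonparametric prior described in the abstract, which removes the need to fix these sizes, is not implemented here.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(n_words, n_super, n_sub, topic_word, rng,
                      alpha_root=1.0, alpha_super=1.0):
    """Generate one document as (super-topic, sub-topic, word id) triples.

    topic_word[t] is the word distribution of sub-topic t (assumed given).
    """
    # Per-document distribution over super-topics at the root.
    theta_root = sample_dirichlet([alpha_root] * n_super, rng)
    # Per-document distribution over sub-topics for each super-topic.
    theta_super = [sample_dirichlet([alpha_super] * n_sub, rng)
                   for _ in range(n_super)]
    doc = []
    for _ in range(n_words):
        # Walk a path through the DAG: root -> super-topic -> sub-topic -> word.
        s = rng.choices(range(n_super), weights=theta_root)[0]
        t = rng.choices(range(n_sub), weights=theta_super[s])[0]
        w = rng.choices(range(len(topic_word[t])), weights=topic_word[t])[0]
        doc.append((s, t, w))
    return doc

rng = random.Random(0)
vocab_size, n_super, n_sub = 6, 2, 4   # hypothetical sizes
topic_word = [sample_dirichlet([0.5] * vocab_size, rng) for _ in range(n_sub)]
doc = generate_document(20, n_super, n_sub, topic_word, rng)
print(doc[:5])
```

Because each super-topic holds its own distribution over sub-topics, sub-topics that share a super-topic tend to co-occur in a document, which is how this structure expresses topic correlation.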
Taxonomy
Topics: Bayesian Methods and Mixture Models · Topic Modeling · Natural Language Processing Techniques
