Nested Hierarchical Dirichlet Processes
John Paisley, Chong Wang, David M. Blei, Michael I. Jordan

TL;DR
The paper introduces the nested hierarchical Dirichlet process (nHDP), a flexible hierarchical topic model that allows documents to express multiple themes by following multiple paths in a shared topic tree, with efficient inference on large datasets.
Contribution
It generalizes the nested Chinese restaurant process to enable multi-path topic assignments per document, improving thematic flexibility and scalability.
Findings
Effective on large datasets like NYT and Wikipedia
Allows multiple thematic paths per document
Provides efficient stochastic variational inference
Abstract
We develop a nested hierarchical Dirichlet process (nHDP) for hierarchical topic modeling. The nHDP is a generalization of the nested Chinese restaurant process (nCRP) that allows each word to follow its own path to a topic node according to a document-specific distribution on a shared tree. This alleviates the rigid, single-path formulation of the nCRP, allowing a document to more easily express thematic borrowings as a random effect. We derive a stochastic variational inference algorithm for the model, in addition to a greedy subtree selection method for each document, which allows for efficient inference using massive collections of text documents. We demonstrate our algorithm on 1.8 million documents from The New York Times and 3.3 million documents from Wikipedia.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
