Hierarchical Latent Word Clustering
Halid Ziya Yerebakan, Fitsum Reda, Yiqiang Zhan, Yoshihisa Shinagawa

TL;DR
This paper introduces a Bayesian non-parametric model that extends Hierarchical Dirichlet Allocation to discover tree-structured word clusters, revealing meaningful hierarchical relationships in text data.
Contribution
It proposes a novel hierarchical clustering method for words using an extended Hierarchical Dirichlet Allocation model, enabling the extraction of structured word clusters.
Findings
Discovered meaningful hierarchical word structures in the NIPS corpus.
Identified hierarchical clusters in radiology reports.
Showed that the inference algorithm groups words that share a similar distribution over documents.
Abstract
This paper presents a new Bayesian non-parametric model by extending the usage of Hierarchical Dirichlet Allocation to extract tree-structured word clusters from text data. The inference algorithm of the model collects words in a cluster if they share a similar distribution over documents. In our experiments, we observed meaningful hierarchical structures on the NIPS corpus and on radiology reports collected from public repositories.
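To make the clustering criterion concrete, the following is a minimal sketch of what "similar distribution over documents" can mean for a pair of words. It is not the paper's Bayesian non-parametric inference and it produces flat groups rather than a tree. The function names, the toy documents, the use of Jensen-Shannon divergence, and the threshold of 0.2 are all assumptions chosen for illustration.

```python
from collections import Counter
from math import log2


def word_doc_distributions(docs):
    """Map each word to its normalized distribution over documents."""
    counts = {}
    for d, text in enumerate(docs):
        for word, n in Counter(text.lower().split()).items():
            counts.setdefault(word, [0.0] * len(docs))[d] += n
    return {w: [c / sum(row) for c in row] for w, row in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions, in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing.
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def group_words(dists, threshold=0.2):
    """Greedy grouping (an illustrative stand-in, not the paper's inference):
    a word joins the first cluster whose seed word is within the threshold."""
    clusters = []  # each cluster is a list of words; the first word is the seed
    for word in sorted(dists):
        for cluster in clusters:
            if js_divergence(dists[word], dists[cluster[0]]) <= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


# Hypothetical toy corpus: two "neuroscience" and two "kernel methods" documents.
docs = [
    "neuron spike neuron synapse",
    "spike synapse neuron",
    "kernel margin kernel",
    "margin kernel",
]
clusters = group_words(word_doc_distributions(docs))
print(clusters)
```

In this toy corpus, words that occur in the same documents end up in the same group, while words with disjoint document support have the maximum divergence of 1 and stay apart.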
Taxonomy
Topics: Topic Modeling · Bayesian Methods and Mixture Models · Natural Language Processing Techniques
