TL;DR
This paper introduces JoSH, a joint spherical embedding method for hierarchical topic mining that effectively models category structures and discovers representative terms with minimal supervision, improving hierarchical text classification.
Contribution
The paper proposes a novel joint tree and text embedding approach, JoSH, for weakly-supervised hierarchical topic mining that aligns category structures with text representations.
Findings
JoSH efficiently mines high-quality hierarchical topics.
JoSH benefits weakly-supervised hierarchical text classification.
The model outperforms existing methods in quality and efficiency.
Abstract
Mining a set of meaningful topics organized into a hierarchy is intuitively appealing since topic correlations are ubiquitous in massive text corpora. To account for potential hierarchical topic structures, hierarchical topic models generalize flat topic models by incorporating latent topic hierarchies into their generative modeling process. However, due to their purely unsupervised nature, the learned topic hierarchy often deviates from users' particular needs or interests. To guide the hierarchical topic discovery process with minimal user supervision, we propose a new task, Hierarchical Topic Mining, which takes a category tree described by category names only, and aims to mine a set of representative terms for each category from a text corpus to help users comprehend the topics they are interested in. We develop a novel joint tree and text embedding method along with a principled…
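To make the task's inputs and outputs concrete, here is a minimal sketch of the retrieval step only: given a category tree and embeddings that are assumed to already lie in a shared space, it ranks terms by cosine similarity to each category on the unit sphere. It does not implement JoSH's joint training; the function names, the toy tree, and the hand-picked vectors are all hypothetical and chosen purely for illustration.

```python
import math


def normalize(v):
    """Project a vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity of two unit-norm vectors (a plain dot product)."""
    return sum(a * b for a, b in zip(u, v))


def representative_terms(tree, category_vecs, word_vecs, top_k=2):
    """Rank vocabulary terms for every category in the tree.

    tree maps each category name to a list of its child categories.
    The embeddings are assumed to be given; learning them jointly from
    the tree and the corpus is the part this sketch leaves out.
    """
    unit_words = {w: normalize(vec) for w, vec in word_vecs.items()}
    result = {}
    for category in tree:
        c = normalize(category_vecs[category])
        ranked = sorted(unit_words, key=lambda w: cosine(c, unit_words[w]), reverse=True)
        result[category] = ranked[:top_k]
    return result


# Hypothetical toy data: a two-level tree and 3-dimensional embeddings.
tree = {"sports": ["tennis"], "tennis": []}
category_vecs = {"sports": [1.0, 0.2, 0.0], "tennis": [0.9, 0.5, 0.0]}
word_vecs = {
    "athlete": [1.0, 0.1, 0.0],
    "racket": [0.8, 0.6, 0.0],
    "election": [0.0, 0.1, 1.0],
}

terms = representative_terms(tree, category_vecs, word_vecs, top_k=2)
for category, words in terms.items():
    print(category, words)
```

In this toy setup the off-topic term ranks last for both categories, which is the behavior one would hope for from embeddings that align the category structure with the text.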
