"Look Ma, No Hands!" A Parameter-Free Topic Model
Jian Tang, Ming Zhang, Qiaozhu Mei

TL;DR
This paper introduces a parameter-free topic modeling approach that automatically determines the number of topics, either by maximizing topic diversity or by matching an exemplar. This removes the need to tune the parameter by hand while yielding topics of comparable or better quality.
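The diversity criterion can be illustrated with a small scoring function. This page does not give the paper's exact diversity measure, so the sketch below assumes one common choice: the mean pairwise Jensen-Shannon divergence between topic-word distributions. The names `js_divergence` and `topic_diversity` are hypothetical and not taken from the paper.

```python
import math
from itertools import combinations


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i = 0 contribute 0 by convention; y_i > 0 whenever x_i > 0 here.
        return sum(xi * math.log(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def topic_diversity(topics):
    """Hypothetical diversity score: mean pairwise JS divergence over all topic pairs.

    `topics` is a list of word distributions P(w|z), each summing to 1.
    This is an assumed measure, not necessarily the one used in the paper.
    """
    pairs = list(combinations(topics, 2))
    if not pairs:
        return 0.0  # A single topic has no pairs to compare.
    return sum(js_divergence(p, q) for p, q in pairs) / len(pairs)
```

Under this assumption, a higher score means the topics overlap less: identical topics score 0, and topics with disjoint vocabularies reach the maximum of ln 2. A diversity-driven selector would prefer the topic set with the higher score, but how nPLSA actually incorporates such a criterion into inference is not described on this page.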
Contribution
It presents a nonparametric variant of PLSA that removes the need to predefine the number of topics, relying instead on topic diversity and weak supervision, and that outperforms traditional Bayesian nonparametric models.
Findings
Achieves topic quality comparable to or better than that of classical models.
Automatically determines the number of topics without manual tuning.
Works effectively on both synthetic and real datasets.
Abstract
It has always been a burden for users of statistical topic models to predetermine the right number of topics, which is a key parameter of most topic models. Conventionally, this parameter is selected automatically through either statistical model selection (e.g., cross-validation, AIC, or BIC) or Bayesian nonparametric models (e.g., the hierarchical Dirichlet process). These methods either rely on repeated runs of the inference algorithm to search through a large range of parameter values, which is ill-suited to mining big data, or replace this parameter with alternative parameters that are less intuitive and still hard to determine. In this paper, we explore how to "eliminate" this parameter from a new perspective. We first present a nonparametric treatment of the PLSA model named nonparametric probabilistic latent semantic analysis (nPLSA). The inference procedure of nPLSA…
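The cost of conventional model selection that the abstract describes is easiest to see in code. Below is a minimal pure-Python sketch of standard PLSA fitted by EM, wrapped in BIC-based selection of the number of topics K. This is the baseline the paper argues against, not nPLSA, whose inference procedure is elided above. The function names, the 1e-12 smoothing constant, and the use of the total token count as BIC's sample size are assumptions made for illustration.

```python
import math
import random


def plsa(counts, k, iters=100, seed=0):
    """Fit standard PLSA by EM. counts[d][w] holds word counts.

    Returns (p_z_d, p_w_z, loglik). Hypothetical helper, not the paper's nPLSA.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        s = sum(row)
        return [x / s for x in row]

    # Random positive initialisation of P(z|d) and P(w|z).
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(k)]) for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)]) for _ in range(k)]

    for _ in range(iters):
        # Expected counts accumulated from the current (old) parameters.
        new_w_z = [[0.0] * n_words for _ in range(k)]
        new_z_d = [[0.0] * k for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z|d,w) is proportional to P(z|d) P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(k)]
                total = sum(joint)
                for z in range(k):
                    r = n * joint[z] / total
                    new_w_z[z][w] += r
                    new_z_d[d][z] += r
        # M-step: renormalise expected counts (tiny smoothing avoids division by zero).
        p_w_z = [normalize([x + 1e-12 for x in row]) for row in new_w_z]
        p_z_d = [normalize([x + 1e-12 for x in row]) for row in new_z_d]

    # Log-likelihood: sum over (d, w) of n(d, w) * log sum_z P(z|d) P(w|z).
    loglik = sum(
        counts[d][w] * math.log(sum(p_z_d[d][z] * p_w_z[z][w] for z in range(k)))
        for d in range(n_docs)
        for w in range(n_words)
        if counts[d][w] > 0
    )
    return p_z_d, p_w_z, loglik


def bic(counts, k, loglik):
    """BIC = -2 * loglik + (#free parameters) * ln(#tokens); lower is better."""
    n_docs, n_words = len(counts), len(counts[0])
    n_tokens = sum(map(sum, counts))
    n_params = k * (n_words - 1) + n_docs * (k - 1)
    return -2.0 * loglik + n_params * math.log(n_tokens)


def select_k_by_bic(counts, k_values, iters=100):
    """Conventional selection: one full EM run per candidate K."""
    scores = {}
    for k in k_values:
        _, _, ll = plsa(counts, k, iters=iters)
        scores[k] = bic(counts, k, ll)
    return min(scores, key=scores.get), scores
```

For example, `select_k_by_bic(counts, range(1, 51))` would run EM fifty separate times over the same corpus. This repeated-runs overhead is what the abstract says does not scale to big data, and it is what a parameter-free model aims to avoid.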
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Complex Network Analysis Techniques
