TL;DR
This paper presents a fast, non-parametric approach to topic modeling with document-level covariates. It combines a convex formulation of non-negative matrix factorization with standard regression, and the authors argue that this gives better interpretability and inference than traditional generative topic models.
Contribution
It combines a convex formulation of non-negative matrix factorization with standard regression to obtain covariate-aware topic estimates without fitting a complex generative model, and it layers non-parametric resampling on top for uncertainty quantification.
Findings
Efficient estimation of covariate effects on discourse topics.
Improved interpretability over traditional generative models.
Application to Canadian beer flavor discourse analysis.
Abstract
We introduce an approach to topic modelling with document-level covariates that remains tractable in the face of large text corpora. This is achieved by de-emphasizing the role of parameter estimation in an underlying probabilistic model, assuming instead that the data come from a fixed but unknown distribution whose statistical functionals are of interest. We propose combining a convex formulation of non-negative matrix factorization with standard regression techniques as a fast-to-compute and useful estimate of such a functional. Uncertainty quantification can then be achieved by reposing non-parametric resampling methods on top of this scheme. This is in contrast to popular topic modelling paradigms, which posit a complex and often hard-to-fit generative model of the data. We argue that the simple, non-parametric approach advocated here is faster, more interpretable, and enjoys…
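The pipeline described in the abstract (factorize, regress, resample) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names `nmf`, `covariate_effects`, and `bootstrap_effects` are hypothetical, the data are synthetic, and a plain multiplicative-update NMF stands in for the paper's convex formulation, which the abstract does not spell out. For simplicity the topics are estimated once and only the regression is re-run on each bootstrap resample of documents.

```python
import numpy as np

def nmf(V, k, n_iter=200, seed=0, eps=1e-9):
    """Multiplicative-update NMF (a stand-in, NOT the paper's convex variant).
    Approximates V (docs x words) by W (docs x k) @ H (k x words)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def covariate_effects(W, X):
    """Ordinary least squares of topic loadings W on covariates X.
    Returns a (1 + n_covariates) x k matrix; row 0 is the intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, W, rcond=None)
    return beta

def bootstrap_effects(W, X, n_boot=200, seed=0):
    """Non-parametric bootstrap over documents: resample rows with
    replacement, refit the regression, return 2.5% and 97.5% percentiles."""
    rng = np.random.default_rng(seed)
    n = len(X)
    draws = np.empty((n_boot,) + covariate_effects(W, X).shape)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b] = covariate_effects(W[idx], X[idx])
    return np.percentile(draws, [2.5, 97.5], axis=0)

# Synthetic corpus: 60 documents, 30 vocabulary terms, 2 covariates, 3 topics.
rng = np.random.default_rng(1)
n_docs, n_words, k = 60, 30, 3
X = rng.normal(size=(n_docs, 2))
V = rng.poisson(2.0, size=(n_docs, n_words)).astype(float)

W, H = nmf(V, k)
W = W / (W.sum(axis=1, keepdims=True) + 1e-12)  # rows become topic proportions
beta = covariate_effects(W, X)                   # point estimates
lo, hi = bootstrap_effects(W, X)                 # percentile intervals
```

Holding the topics fixed across resamples sidesteps the problem of matching topic labels between refits; a fuller treatment would re-run the factorization inside the bootstrap loop and align topics afterwards.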
