Testing Hypotheses of Covariate Effects on Topics of Discourse

Gabriel Phelan; David A. Campbell

arXiv:2506.05570·stat.ME·November 5, 2025·Stat. Anal. Data Min.

TL;DR

This paper presents a fast, non-parametric approach to topic modeling with covariates, using convex matrix factorization and regression, offering better interpretability and inference than traditional generative models.

Contribution

It introduces a novel non-parametric, convex matrix factorization method combined with regression for covariate-aware topic modeling, bypassing complex generative models.

Findings

01  Efficient estimation of covariate effects on discourse topics.

02  Improved interpretability over traditional generative models.

03  Application to Canadian beer flavor discourse analysis.

Abstract

We introduce an approach to topic modelling with document-level covariates that remains tractable in the face of large text corpora. This is achieved by de-emphasizing the role of parameter estimation in an underlying probabilistic model, assuming instead that the data come from a fixed but unknown distribution whose statistical functionals are of interest. We propose combining a convex formulation of non-negative matrix factorization with standard regression techniques as a fast-to-compute and useful estimate of such a functional. Uncertainty quantification can then be achieved by reposing non-parametric resampling methods on top of this scheme. This is in contrast to popular topic modelling paradigms, which posit a complex and often hard-to-fit generative model of the data. We argue that the simple, non-parametric approach advocated here is faster, more interpretable, and enjoys…
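
The pipeline the abstract describes (factorize, regress, resample) can be sketched in a few lines. This is a rough illustration under stated assumptions, not the authors' implementation and not the NMFregress API. The factorization uses plain multiplicative-update NMF as a stand-in, which is not the convex formulation the paper proposes. The function names (`nmf_multiplicative`, `fit_effects`, `bootstrap_ci`), the synthetic corpus, and the decision to hold the factorization fixed while resampling documents are all hypothetical simplifications.

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=200, seed=0):
    """Stand-in factorization X ~ W @ H via multiplicative updates.
    NOTE: this is ordinary (non-convex) NMF, not the paper's convex variant."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + 1e-3   # document-by-topic weights
    H = rng.random((k, X.shape[1])) + 1e-3   # topic-by-word loadings
    eps = 1e-10                              # guards against division by zero
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def fit_effects(W, Z):
    """Least-squares regression of normalised topic weights on covariates."""
    P = W / (W.sum(axis=1, keepdims=True) + 1e-10)   # rows sum to ~1
    D = np.column_stack([np.ones(len(Z)), Z])        # intercept + covariates
    beta, *_ = np.linalg.lstsq(D, P, rcond=None)
    return beta                                      # shape (1 + n_covariates, k)

def bootstrap_ci(W, Z, n_boot=500, alpha=0.05, seed=0):
    """Percentile intervals from resampling documents with replacement.
    Simplification: W is held fixed rather than refit on each resample."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    draws = np.empty((n_boot,) + fit_effects(W, Z).shape)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b] = fit_effects(W[idx], Z[idx])
    lo = np.quantile(draws, alpha / 2, axis=0)
    hi = np.quantile(draws, 1 - alpha / 2, axis=0)
    return lo, hi

# Synthetic corpus (made up for illustration): one binary covariate shifts
# the share of the first of two topics.
rng = np.random.default_rng(1)
n_docs, n_words, k = 120, 30, 2
Z = rng.integers(0, 2, size=(n_docs, 1)).astype(float)
topics = rng.dirichlet(np.ones(n_words) * 0.3, size=k)
mix = np.clip(0.3 + 0.4 * Z[:, 0] + rng.normal(0, 0.05, n_docs), 0.05, 0.95)
theta = np.column_stack([mix, 1 - mix])
X = rng.poisson(200 * theta @ topics).astype(float)   # document-word counts

W, H = nmf_multiplicative(X, k)
beta = fit_effects(W, Z)          # row 0: intercepts, row 1: covariate effects
lo, hi = bootstrap_ci(W, Z)       # interval bounds, same shape as beta
```

A hypothesis test of "no covariate effect on topic j" would then check whether the interval `[lo[1, j], hi[1, j]]` excludes zero. Because the topic order returned by a factorization is arbitrary, a version that refits the factorization inside each resample would also need to align topics across refits, which this sketch sidesteps by holding `W` fixed.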

Code & Models

Repositories

iamdavecampbell/NMFregress (official)
