Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression
David Mimno, Andrew McCallum

TL;DR
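This paper introduces a Dirichlet-multinomial regression (DMR) topic model that conditions the prior on document-topic distributions on observed document features. This makes it easier to model text together with its metadata, and with appropriate features the model meets or exceeds the performance of several previously published topic models designed for specific data.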
Contribution
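The paper presents a DMR topic model that places a log-linear prior on each document's topic distribution, computed from arbitrary observed features of the document (such as author, publication venue, references, and dates). Metadata can therefore be used without designing a new generative model for each kind of data.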
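A sketch of the generative process in the notation commonly used for DMR: x_d is document d's observed feature vector, λ_t is a per-topic weight vector, and σ² and β are hyperparameters. The symbol names and the Gaussian prior on λ_t are our rendering, not text quoted from this page. Compared with LDA, the key change is the second line: α is no longer a single fixed hyperparameter shared by all documents but is computed from each document's features.

```latex
\begin{aligned}
\boldsymbol{\lambda}_t &\sim \mathcal{N}(\mathbf{0}, \sigma^2 I), \qquad \phi_t \sim \mathrm{Dirichlet}(\beta) && \text{for each topic } t \\
\alpha_{d,t} &= \exp\!\left(\mathbf{x}_d^{\top} \boldsymbol{\lambda}_t\right) && \text{for each document } d \text{ and topic } t \\
\theta_d &\sim \mathrm{Dirichlet}(\boldsymbol{\alpha}_d) \\
z_{d,i} &\sim \mathrm{Multinomial}(\theta_d), \qquad w_{d,i} \sim \mathrm{Multinomial}(\phi_{z_{d,i}}) && \text{for each token } i \text{ in } d
\end{aligned}
```

Because α_{d,t} is an exponential of a linear function, it is always positive, as a Dirichlet parameter must be. A constant feature that is always 1 is typically included so that each topic keeps a baseline weight even for documents with no other features.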
Findings
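With appropriate features, DMR models meet or exceed the performance of several previously published topic models designed for specific data.
Incorporating features such as author and publication venue into the prior improves topic modeling.
The same model accommodates many kinds of document metadata, avoiding the need to build a custom fully generative model for each combination of text and metadata.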
Abstract
Although fully generative models have been successfully used to model the contents of text documents, they are often awkward to apply to combinations of text data and document metadata. In this paper we propose a Dirichlet-multinomial regression (DMR) topic model that includes a log-linear prior on document-topic distributions that is a function of observed features of the document, such as author, publication venue, references, and dates. We show that by selecting appropriate features, DMR topic models can meet or exceed the performance of several previously published topic models designed for specific data.
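A minimal pure-Python sketch of the log-linear prior described in the abstract, assuming the form α_{d,t} = exp(x_d · λ_t). It computes a document's Dirichlet parameters from its features, then scores the document's topic-assignment counts under the resulting Dirichlet-multinomial with the topic proportions θ_d integrated out. The function names, the toy features, and the weights are hypothetical; a full model would also sample topic assignments and fit λ, which is not shown here.

```python
import math
from typing import List, Sequence


def dmr_alpha(x: Sequence[float], lambdas: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: per-document Dirichlet parameters alpha_t = exp(x . lambda_t)."""
    alphas = []
    for lam in lambdas:
        if len(lam) != len(x):
            raise ValueError("each lambda_t needs one weight per feature")
        # Log-linear prior: exponentiate the dot product so alpha_t > 0.
        alphas.append(math.exp(sum(xk * lk for xk, lk in zip(x, lam))))
    return alphas


def dirichlet_multinomial_loglik(counts: Sequence[int], alpha: Sequence[float]) -> float:
    """Hypothetical helper: log P(z_d | alpha_d) with theta_d integrated out.

    counts[t] is the number of tokens in the document assigned to topic t.
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha need one entry per topic")
    alpha_sum = sum(alpha)
    n = sum(counts)
    # Gamma(sum alpha) / Gamma(sum alpha + N) * prod_t Gamma(n_t + alpha_t) / Gamma(alpha_t)
    ll = math.lgamma(alpha_sum) - math.lgamma(alpha_sum + n)
    for n_t, a_t in zip(counts, alpha):
        ll += math.lgamma(n_t + a_t) - math.lgamma(a_t)
    return ll


# Hypothetical features: [constant 1, "written by author A"]; two topics.
x_doc = [1.0, 1.0]
lambdas = [[0.0, 1.0],  # topic 0: the author-A feature raises its prior weight
           [0.0, 0.0]]  # topic 1: no feature effect
alpha = dmr_alpha(x_doc, lambdas)  # [e, 1.0]
# Assignments concentrated on topic 0 are more probable for this document.
print(dirichlet_multinomial_loglik([3, 0], alpha) > dirichlet_multinomial_loglik([0, 3], alpha))
```

With all weights at zero, every α_{d,t} equals 1 and the prior is the same for every document. Nonzero weights shift prior mass toward the topics associated with a document's features, which is how metadata enters the model without changing how words are generated.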
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Bayesian Methods and Mixture Models
