The Author-Topic Model for Authors and Documents
Michal Rosen-Zvi, Thomas Griffiths, Mark Steyvers, Padhraic Smyth

TL;DR
The paper introduces the author-topic model, a generative model that extends LDA with authorship information: each author has a distribution over topics, and a multi-author document is modeled as a mixture of its authors' topic distributions.
Contribution
The model associates each author with a multinomial distribution over topics and each topic with a multinomial distribution over words. It is fit with Gibbs sampling to 1,700 NIPS conference papers and 160,000 CiteSeer abstracts.
Findings
The model effectively captures author-specific topic distributions.
It is compared with LDA and a simple author model, both special cases of the author-topic model, and outperforms them in certain tasks.
Applications include author similarity and output entropy analysis.
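As a rough illustration of the last finding, the sketch below compares two authors through their topic distributions and measures how spread out an author's topics are. The summary does not say which measures are used, so the symmetrized KL divergence and Shannon entropy here are assumptions, and the function names and example distributions are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution p."""
    return -sum(x * math.log(x) for x in p if x > 0)


def kl(p, q, eps=1e-12):
    """KL(p || q); q is floored at eps to avoid division by zero (an assumption)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Symmetrized KL divergence; smaller values mean more similar authors."""
    return kl(p, q) + kl(q, p)


# Hypothetical topic distributions for two authors over three topics.
author_a = [0.7, 0.2, 0.1]
author_b = [0.1, 0.2, 0.7]

distance_ab = symmetric_kl(author_a, author_b)
entropy_a = entropy(author_a)
```

An author who writes on a single topic has entropy 0 under this definition, while an author spread evenly over T topics has entropy log T.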
Abstract
We introduce the author-topic model, a generative model for documents that extends Latent Dirichlet Allocation (LDA; Blei, Ng, & Jordan, 2003) to include authorship information. Each author is associated with a multinomial distribution over topics and each topic is associated with a multinomial distribution over words. A document with multiple authors is modeled as a distribution over topics that is a mixture of the distributions associated with the authors. We apply the model to a collection of 1,700 NIPS conference papers and 160,000 CiteSeer abstracts. Exact inference is intractable for these datasets and we use Gibbs sampling to estimate the topic and author distributions. We compare the performance with two other generative models for documents, which are special cases of the author-topic model: LDA (a topic model) and a simple author model in which each author is associated with a…
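The generative process described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that the author responsible for each word is drawn uniformly from the document's author list, and the names `sample_document`, `document_topic_mixture`, and the toy distributions are hypothetical.

```python
import random


def sample_document(authors, author_topic, topic_word, n_words, rng):
    """Sample (author, topic, word) triples for one document.

    author_topic[a] is author a's distribution over topics;
    topic_word[z] is topic z's distribution over vocabulary indices.
    """
    topics = list(range(len(topic_word)))
    vocab = list(range(len(topic_word[0])))
    tokens = []
    for _ in range(n_words):
        # Assumption: each word's author is chosen uniformly among the authors.
        a = rng.choice(authors)
        # Draw a topic from that author's topic distribution.
        z = rng.choices(topics, weights=author_topic[a])[0]
        # Draw a word from that topic's word distribution.
        w = rng.choices(vocab, weights=topic_word[z])[0]
        tokens.append((a, z, w))
    return tokens


def document_topic_mixture(authors, author_topic):
    """Document-level topic distribution: equal-weight mixture over its authors."""
    n_topics = len(author_topic[authors[0]])
    return [sum(author_topic[a][t] for a in authors) / len(authors)
            for t in range(n_topics)]


# Toy example: two authors, two topics, four vocabulary items.
author_topic = [[0.9, 0.1], [0.2, 0.8]]
topic_word = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]

rng = random.Random(0)
doc = sample_document([0, 1], author_topic, topic_word, 20, rng)
mixture = document_topic_mixture([0, 1], author_topic)
```

With a single author per document the mixture reduces to that author's own topic distribution. Estimating `author_topic` and `topic_word` from data is the inference problem that the abstract addresses with Gibbs sampling, which this sketch does not cover.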
Taxonomy
Topics: Topic Modeling · Authorship Attribution and Profiling · Computational and Text Analysis Methods
