Syntactic Topic Models
Jordan Boyd-Graber, David M. Blei

TL;DR
The paper introduces the Syntactic Topic Model (STM), a Bayesian nonparametric model that jointly captures semantic and syntactic coherence in language and outperforms syntax-only and topic-only models at predicting words.
Contribution
It presents the STM, a novel model combining syntactic dependency structures with topic modeling, along with a fast variational inference algorithm.
Findings
STM outperforms syntax-only and topic-only models in predictive tasks.
Qualitative analysis shows STM captures meaningful syntactic and semantic patterns.
Quantitative results demonstrate improved language modeling accuracy.
Abstract
The syntactic topic model (STM) is a Bayesian nonparametric model of language that discovers latent distributions of words (topics) that are both semantically and syntactically coherent. The STM models dependency parsed corpora where sentences are grouped into documents. It assumes that each word is drawn from a latent topic chosen by combining document-level features and the local syntactic context. Each document has a distribution over latent topics, as in topic models, which provides the semantic consistency. Each element in the dependency parse tree also has a distribution over the topics of its children, as in latent-state syntax models, which provides the syntactic consistency. These distributions are convolved so that the topic of each word is likely under both its document and syntactic context. We derive a fast posterior inference algorithm based on variational methods. We…
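The combination step described in the abstract, where a word's topic must be likely under both its document and its syntactic context, can be illustrated with a pointwise product of two topic distributions followed by renormalization. This is a minimal sketch, not the authors' implementation. It assumes a finite set of three topics (the STM itself is nonparametric), and the names `combine_topic_weights`, `theta_d`, and `pi_parent` are hypothetical, chosen only for this example.

```python
def combine_topic_weights(doc_topics, syntactic_topics):
    """Multiply two topic distributions pointwise and renormalize to sum to 1.

    doc_topics: topic weights for the document (assumed finite for illustration).
    syntactic_topics: topic weights implied by the parent node in the parse tree.
    """
    if len(doc_topics) != len(syntactic_topics):
        raise ValueError("both distributions must cover the same topics")
    products = [d * s for d, s in zip(doc_topics, syntactic_topics)]
    total = sum(products)
    if total == 0:
        raise ValueError("no topic has nonzero weight under both distributions")
    return [p / total for p in products]


# Hypothetical numbers: topic 0 dominates the document,
# while the parent's syntactic context strongly favors topic 2.
theta_d = [0.7, 0.2, 0.1]
pi_parent = [0.1, 0.1, 0.8]

combined = combine_topic_weights(theta_d, pi_parent)
print(combined)
```

In this toy example the third topic receives the largest combined weight even though the first topic dominates the document, because a topic must score well under both distributions to keep its mass after the product.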
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Bayesian Methods and Mixture Models
