Document Informed Neural Autoregressive Topic Models with Distributional Prior

Pankaj Gupta; Yatin Chaudhary; Florian Buettner; Hinrich Schütze

arXiv:1809.06709·cs.CL·January 16, 2019

TL;DR

This paper introduces neural autoregressive topic models that leverage full context and external knowledge via embeddings, significantly improving performance on both long and short texts in terms of coherence, generalization, and applicability.

Contribution

It extends neural autoregressive topic models to incorporate full context and external embeddings, enhancing their effectiveness on diverse text datasets.

Findings

01. The proposed models outperform state-of-the-art generative topic models in generalization and interpretability.

02. They are effective on both long and short texts across multiple domains.

03. They improve retrieval and classification performance.

Abstract

We address two challenges in topic models: (1) Context information around words helps in determining their actual meaning, e.g., "networks" used in the contexts "artificial neural networks" vs. "biological neuron networks". Generative topic models infer topic-word distributions, taking little or no context into account. Here, we extend a neural autoregressive topic model to exploit the full context information around words in a document in a language modeling fashion. The proposed model is named iDocNADE. (2) Due to the small number of word occurrences (i.e., lack of context) in short text and data sparsity in a corpus of few documents, the application of topic models is challenging on such texts. Therefore, we propose a simple and efficient way of incorporating external knowledge into neural autoregressive topic models: we use embeddings as a distributional prior. The proposed…
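
To make the two ideas in the abstract concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a neural autoregressive hidden state that accumulates one weight column per preceding word, a second pass over the following words for the full-context variant, and a scaled pretrained embedding added as the prior. The names `prefix_hidden_states`, `full_context_hidden_states`, `W`, `E`, `c` and `lam`, the sigmoid activation, and the way the prior is mixed in are all illustrative assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def prefix_hidden_states(doc, W, c, E=None, lam=0.0):
    """Hidden state at each position, built only from the words before it.

    doc: list of word ids
    W:   W[v] is the learned hidden-size vector for word id v (assumed layout)
    c:   hidden bias vector
    E:   optional pretrained embeddings, same layout as W (assumed prior)
    lam: assumed mixing weight for the embedding prior
    """
    acc = list(c)  # running pre-activation, starts at the bias
    states = []
    for v in doc:
        # State for this position sees only the words accumulated so far.
        states.append([sigmoid(a) for a in acc])
        for j in range(len(acc)):
            acc[j] += W[v][j]
            if E is not None:
                acc[j] += lam * E[v][j]
    return states


def full_context_hidden_states(doc, W, c_fwd, c_bwd, E=None, lam=0.0):
    """Forward states see preceding words; backward states see following words."""
    fwd = prefix_hidden_states(doc, W, c_fwd, E, lam)
    # Run the same recurrence on the reversed document, then realign positions.
    bwd = prefix_hidden_states(doc[::-1], W, c_bwd, E, lam)[::-1]
    return fwd, bwd


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 3-word vocabulary, 2 hidden units
    fwd, bwd = full_context_hidden_states([0, 2, 1], W, [0.0, 0.0], [0.0, 0.0])
    for i, (f, b) in enumerate(zip(fwd, bwd)):
        print(i, f, b)
```

In this sketch the forward and backward passes share `W`, so each word's hidden contribution is the same regardless of which side of the target position it appears on. A softmax output layer over the vocabulary would sit on top of each state to give the per-word conditional probabilities; it is omitted here to keep the example short.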

Code & Models

Repositories

pgcool/iDocNADEe (TensorFlow, official)
