Ordering-sensitive and Semantic-aware Topic Modeling
Min Yang, Tianyi Cui, Wenting Tu

TL;DR
This paper introduces GMNTM, a novel topic modeling approach that incorporates word order and semantics, outperforming existing models in perplexity, retrieval, and classification tasks.
Contribution
The paper proposes a Gaussian Mixture Neural Topic Model that jointly learns topics, word embeddings, and context, capturing ordering and semantic information for improved topic quality.
Findings
GMNTM achieves lower perplexity than state-of-the-art models.
The model improves retrieval and classification accuracy.
Experiments demonstrate better topic coherence and word distribution quality.
Abstract
Topic modeling of textual corpora is an important and challenging problem. Most previous work makes the "bag-of-words" assumption, which ignores the ordering of words. This assumption simplifies the computation, but it unrealistically discards the ordering information and the semantics of words in context. In this paper, we present a Gaussian Mixture Neural Topic Model (GMNTM) which incorporates both the ordering of words and the semantic meaning of sentences into topic modeling. Specifically, we represent each topic as a cluster of multi-dimensional vectors and embed the corpus into a collection of vectors generated by the Gaussian mixture model. Each word is affected not only by its topic, but also by the embedding vectors of its surrounding words and the context. The Gaussian mixture components and the topics of documents, sentences and words can be learnt jointly.…
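The idea of treating each topic as a cluster of vectors under a Gaussian mixture can be illustrated with a small sketch. This is not the authors' implementation: it only shows how, given mixture parameters, an embedding vector could be assigned a posterior distribution over topics. The function name `topic_posteriors`, the spherical-variance simplification, and the toy parameters are all assumptions made for illustration; the joint training of embeddings and mixture components described in the abstract is not shown.

```python
import numpy as np

def topic_posteriors(vectors, weights, means, variances):
    """Posterior probability of each mixture component (topic) for each vector.

    vectors:   (n, d) embedding vectors
    weights:   (k,)   mixture weights, summing to 1
    means:     (k, d) component means, one per topic
    variances: (k,)   one spherical variance per component (illustrative assumption)
    """
    n, d = vectors.shape
    # Squared distance from every vector to every component mean: shape (n, k)
    sq_dist = ((vectors[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    # Log mixture weight plus log density of a spherical Gaussian
    log_prob = (
        np.log(weights)
        - 0.5 * d * np.log(2 * np.pi * variances)
        - 0.5 * sq_dist / variances
    )
    # Subtract the row maximum before exponentiating, for numerical stability
    log_prob -= log_prob.max(axis=1, keepdims=True)
    post = np.exp(log_prob)
    return post / post.sum(axis=1, keepdims=True)

# Toy example: two hypothetical topics in a 2-dimensional embedding space
means = np.array([[0.0, 0.0], [5.0, 5.0]])
weights = np.array([0.5, 0.5])
variances = np.array([1.0, 1.0])
vectors = np.array([[0.1, -0.2], [4.8, 5.3]])

post = topic_posteriors(vectors, weights, means, variances)
topics = post.argmax(axis=1)  # most probable topic for each vector
```

In this toy setup each vector lies close to one of the two means, so `topics` picks that component. A full model along the lines of the abstract would also learn the vectors themselves from word context rather than taking them as given.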
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Computational and Text Analysis Methods
