Topic2Vec: Learning Distributed Representations of Topics

Li-Qiang Niu; Xin-Yu Dai

arXiv:1506.08422·cs.CL·June 30, 2015·29 cites

Open Access

TL;DR

This paper introduces Topic2Vec, a novel embedding method that learns topic representations in the same semantic space as words, offering a more effective alternative to traditional LDA-based features for text analysis.

Contribution

The paper proposes Topic2Vec, a new approach to learn topic embeddings in the same space as words, improving upon LDA for capturing semantic relationships.

Findings

01. Topic2Vec produces meaningful topic representations.

02. It outperforms LDA in capturing semantic relationships.

03. Experimental results validate the effectiveness of Topic2Vec.

Abstract

Latent Dirichlet Allocation (LDA), which mines the thematic structure of documents, plays an important role in natural language processing and machine learning. However, the probability distributions produced by LDA only describe the statistical relationships of occurrences in the corpus, and in practice probabilities are usually not the best choice for feature representations. Recently, embedding methods such as Word2Vec and Doc2Vec have been proposed to represent words and documents by learning essential concepts and representations. These embedded representations have proven more effective than LDA-style representations in many tasks. In this paper, we propose Topic2Vec, an approach that learns topic representations in the same semantic vector space as words, as an alternative to probability distributions. The experimental results show that Topic2Vec achieves interesting and meaningful results.
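One practical consequence of placing topics and words in the same vector space is that a topic can be described by its nearest word vectors. The sketch below illustrates only that idea and is not the authors' training procedure: the 3-dimensional vectors, the word list, the topic names, and the helper names `cosine` and `nearest_words` are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical toy embeddings (NOT taken from the paper): words and topics
# are assumed to live in the same 3-dimensional vector space.
word_vectors = {
    "goal":   [0.9, 0.1, 0.0],
    "match":  [0.8, 0.2, 0.1],
    "stock":  [0.1, 0.9, 0.1],
    "market": [0.0, 0.8, 0.2],
}
topic_vectors = {
    "topic_sports":  [1.0, 0.1, 0.0],
    "topic_finance": [0.0, 1.0, 0.1],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_words(topic, k=2):
    """Return the k words whose vectors are most similar to the topic vector."""
    t = topic_vectors[topic]
    ranked = sorted(
        word_vectors,
        key=lambda w: cosine(t, word_vectors[w]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    for name in topic_vectors:
        print(name, nearest_words(name))
```

With an LDA topic, the comparable summary would be the highest-probability words under that topic's word distribution. Here the ranking comes from vector similarity instead, which is only possible because both kinds of vector share one space.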

Taxonomy

Topics: Topic Modeling · Text and Document Classification Technologies · Natural Language Processing Techniques

Methods: Latent Dirichlet Allocation