Document Embedding with Paragraph Vectors

Andrew M. Dai; Christopher Olah; Quoc V. Le

arXiv:1507.07998 · cs.CL · July 30, 2015 · 267 cites

TL;DR

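This paper evaluates Paragraph Vectors as a document embedding method on tasks beyond sentiment analysis. It compares the method against other document modelling algorithms, such as Latent Dirichlet Allocation, on document similarity data sets drawn from Wikipedia and arXiv, reports that Paragraph Vectors perform significantly better, and proposes a simple improvement to embedding quality.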

Contribution

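The paper provides a more thorough comparison of Paragraph Vectors to other document modelling algorithms than the original proof of concept, evaluates the method as the dimensionality of the learned representation varies, and introduces a simple enhancement to improve the embeddings.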

Findings

01 Paragraph Vectors outperform other document embedding methods in similarity tasks.

02 Vector operations on Paragraph Vectors can produce meaningful semantic results.

03 The proposed improvement enhances the quality of the learned embeddings.
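
To make the second finding concrete, here is a minimal sketch of what a "vector operation" on document embeddings can look like: add and subtract embeddings, then look up the nearest remaining document by cosine similarity. The three-dimensional vectors and the names `doc_a` through `doc_e` are invented for illustration only. They are not taken from the paper, and the helper functions are hypothetical, not the authors' code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy_query(a, b, c):
    """Return the vector a - b + c, computed element-wise."""
    return [x - y + z for x, y, z in zip(a, b, c)]

def nearest_neighbor(query, embeddings, exclude=()):
    """Name of the embedding most similar to `query`, skipping `exclude`."""
    candidates = {k: v for k, v in embeddings.items() if k not in exclude}
    return max(candidates, key=lambda name: cosine(query, candidates[name]))

# Toy embeddings (assumption: made-up values, not learned Paragraph Vectors).
toy_embeddings = {
    "doc_a": [1.0, 0.0, 1.0],
    "doc_b": [1.0, 0.0, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
    "doc_d": [0.0, 1.0, 1.0],
    "doc_e": [1.0, 1.0, 0.0],
}

query = analogy_query(
    toy_embeddings["doc_a"], toy_embeddings["doc_b"], toy_embeddings["doc_c"]
)
result = nearest_neighbor(
    query, toy_embeddings, exclude=("doc_a", "doc_b", "doc_c")
)
print(result)
```

With real embeddings, the input documents are usually excluded from the candidate set, as done here, because the query vector tends to stay close to its own operands.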

Abstract

Paragraph Vectors has been recently proposed as an unsupervised method for learning distributed representations for pieces of texts. In their work, the authors showed that the method can learn an embedding of movie review texts which can be leveraged for sentiment analysis. That proof of concept, while encouraging, was rather narrow. Here we consider tasks other than sentiment analysis, provide a more thorough comparison of Paragraph Vectors to other document modelling algorithms such as Latent Dirichlet Allocation, and evaluate performance of the method as we vary the dimensionality of the learned representation. We benchmarked the models on two document similarity data sets, one from Wikipedia, one from arXiv. We observe that the Paragraph Vector method performs significantly better than other methods, and propose a simple improvement to enhance embedding quality. Somewhat…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies