Intrinsic analysis for dual word embedding space models

Mohit Mayank

arXiv:2012.00728·cs.CL·December 8, 2020·1 cites

TL;DR

This study systematically compares dual embedding space models, specifically Word2Vec and GloVe, across various hyper-parameter settings to determine which configurations perform best on semantic, association, and analogy tasks.

Contribution

It provides a comprehensive evaluation of multiple hyper-parameter variations for dual embedding models, highlighting the effectiveness of non-default configurations across tasks.

Findings

01

Non-default GloVe models outperform the default model on all three tasks.

02

Non-default Word2Vec models outperform the default model on 2 of the 3 tasks.

03

Extensive comparison across 84 models and datasets.

Abstract

Recent word embedding techniques represent words in a continuous vector space, moving away from the atomic and sparse representations of the past. Each technique can produce multiple varieties of embeddings, depending on hyper-parameter settings such as embedding dimension, context window size, and training method. A further variety appears with dual embedding space techniques, which generate not one but two word embeddings as output. This raises an interesting question: "is there one or a combination of the two word embeddings variety, which works better for a specific task?". This paper tries to answer this question by considering all of these variations. Herein, we compare two classical embedding methods belonging to two different methodologies: Word2Vec from the window-based family and GloVe from the count-based family. For an extensive…
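To make the "one or a combination of the two" question concrete, here is a minimal sketch of how the two embedding tables of a dual embedding space model might be used alone or combined before an intrinsic similarity check. The table names `W` and `C`, the toy vectors, and the four combination modes (`"W"`, `"C"`, `"W+C"`, `"concat"`) are illustrative assumptions; the listing above does not state which combinations the paper actually evaluates.

```python
import math

# Toy, hand-made vectors (hypothetical values, not taken from the paper).
# W holds the "input" (word) embeddings, C the "output" (context) embeddings
# that a dual embedding space model such as Word2Vec or GloVe produces.
W = {"king": [0.9, 0.1], "queen": [0.8, 0.3], "apple": [0.1, 0.9]}
C = {"king": [0.7, 0.2], "queen": [0.9, 0.2], "apple": [0.2, 0.8]}


def combine(word, mode):
    """Return a single vector for `word` under the chosen combination mode.

    The modes are assumptions made for illustration only.
    """
    w, c = W[word], C[word]
    if mode == "W":        # input embedding only
        return list(w)
    if mode == "C":        # output embedding only
        return list(c)
    if mode == "W+C":      # element-wise sum of both embeddings
        return [a + b for a, b in zip(w, c)]
    if mode == "concat":   # concatenation, doubling the dimension
        return list(w) + list(c)
    raise ValueError(f"unknown mode: {mode}")


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity(word_a, word_b, mode):
    """Similarity of two words after combining their embeddings."""
    return cosine(combine(word_a, mode), combine(word_b, mode))


if __name__ == "__main__":
    for mode in ("W", "C", "W+C", "concat"):
        related = similarity("king", "queen", mode)
        unrelated = similarity("king", "apple", mode)
        print(f"{mode:>6}: king~queen={related:.3f}  king~apple={unrelated:.3f}")
```

In an actual evaluation, the toy dictionaries would be replaced by the two trained embedding matrices, and the similarity scores would be compared against human judgments from a benchmark dataset for each combination mode and hyper-parameter setting.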

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: GloVe Embeddings