Evaluating Word Embedding Hyper-Parameters for Similarity and Analogy Tasks
Maryam Fanaeepour, Adam Makarucha, Jey Han Lau

TL;DR
This paper empirically investigates how hyper-parameters like vector dimensions and corpus size influence the quality of word embeddings and their effectiveness in similarity and analogy tasks.
Contribution
It provides a systematic analysis of hyper-parameter effects on embedding quality using standard evaluation metrics and datasets.
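This summary does not spell out the evaluation protocol. The sketch below assumes the one commonly used for word similarity benchmarks. Each word pair is scored by the cosine similarity of its two vectors, and quality is reported as Spearman's rank correlation between those scores and human ratings. Pairs with out-of-vocabulary words are skipped, and coverage is reported alongside. The embedding table, word pairs and ratings are hypothetical toy values, not data from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the two rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def evaluate_similarity(emb: Dict[str, List[float]],
                        pairs: List[Tuple[str, str, float]]) -> Tuple[float, float]:
    """Return (Spearman rho, coverage), skipping pairs with out-of-vocabulary words."""
    model_scores, human_scores = [], []
    for w1, w2, gold in pairs:
        if w1 in emb and w2 in emb:
            model_scores.append(cosine(emb[w1], emb[w2]))
            human_scores.append(gold)
    return spearman(model_scores, human_scores), len(model_scores) / len(pairs)

# Hypothetical 3-dimensional embeddings and human ratings on a 0-10 scale.
emb = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
    "truck": [0.2, 0.8, 0.3],
}
pairs = [("cat", "dog", 8.5), ("car", "truck", 8.0), ("cat", "car", 1.5),
         ("dog", "truck", 2.0), ("cat", "banana", 1.0)]  # "banana" is OOV

rho, coverage = evaluate_similarity(emb, pairs)
print(f"Spearman rho={rho:.3f}, coverage={coverage:.2f}")  # Spearman rho=1.000, coverage=0.80
```

Spearman's rho depends only on rank order, so it ignores the scale difference between cosine scores in [-1, 1] and human ratings on a 0-10 scale. Reporting coverage matters here because a smaller training corpus tends to leave more benchmark words out of the vocabulary.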
Findings
Hyper-parameters significantly affect embedding quality.
The optimal number of vector dimensions depends on the task.
Corpus size impacts the performance of embeddings.
Abstract
The versatility of word embeddings across applications is attracting researchers from many fields. However, the impact of hyper-parameters when training an embedding model is often poorly understood. How much do hyper-parameters such as vector dimensions and corpus size affect the quality of embeddings, and how do these results translate to downstream applications? Using standard embedding evaluation metrics and datasets, we conduct a study to empirically measure the impact of these hyper-parameters.
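The analogy task asks questions of the form "a is to b as c is to ?". The summary does not say which solver the paper uses. The sketch below assumes the widely used vector-offset method (3CosAdd). It returns the vocabulary word whose unit vector has the highest cosine similarity to b - a + c, never returning the three query words themselves. Accuracy is computed over questions whose words are all in the vocabulary, and coverage is reported separately. All vectors and questions are hypothetical toy values.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

def unit(v: Sequence[float]) -> List[float]:
    """Scale a non-zero vector to unit length so dot products equal cosines."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def solve_analogy(unit_emb: Dict[str, List[float]], a: str, b: str, c: str) -> Optional[str]:
    """3CosAdd: answer 'a : b :: c : ?' with argmax_d cos(d, b - a + c)."""
    target = [vb - va + vc for va, vb, vc in zip(unit_emb[a], unit_emb[b], unit_emb[c])]
    best, best_score = None, -math.inf
    for word, vec in unit_emb.items():
        if word in (a, b, c):
            continue  # query words are excluded as answers
        # ||target|| is the same for every candidate, so ranking by the dot
        # product with unit vectors gives the same argmax as cosine similarity.
        score = sum(x * y for x, y in zip(vec, target))
        if score > best_score:
            best, best_score = word, score
    return best

def evaluate_analogies(emb: Dict[str, List[float]],
                       questions: List[Tuple[str, str, str, str]]) -> Tuple[float, float]:
    """Return (accuracy on covered questions, coverage); OOV questions are skipped."""
    unit_emb = {w: unit(v) for w, v in emb.items()}  # normalise once, not per question
    covered = correct = 0
    for a, b, c, expected in questions:
        if not all(w in unit_emb for w in (a, b, c, expected)):
            continue
        covered += 1
        if solve_analogy(unit_emb, a, b, c) == expected:
            correct += 1
    accuracy = correct / covered if covered else 0.0
    return accuracy, covered / len(questions)

# Hypothetical 3-d vectors: roughly (royalty, femaleness, person-ness).
emb = {
    "man": [0.1, 0.0, 1.0],
    "woman": [0.1, 1.0, 1.0],
    "king": [1.0, 0.0, 0.2],
    "queen": [1.0, 1.0, 0.2],
    "apple": [0.0, 0.1, 0.0],
}
questions = [
    ("man", "woman", "king", "queen"),
    ("king", "queen", "man", "woman"),
    ("man", "king", "woman", "queen"),
    ("paris", "france", "tokyo", "japan"),  # out of vocabulary, skipped
]

accuracy, coverage = evaluate_analogies(emb, questions)
print(f"accuracy={accuracy:.2f} coverage={coverage:.2f}")  # accuracy=1.00 coverage=0.75
```

Excluding a, b and c matters in practice. Without that rule, the nearest vector to b - a + c is frequently b or c itself, which would deflate accuracy.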
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
