Evaluating Word Embedding Hyper-Parameters for Similarity and Analogy Tasks

Maryam Fanaeepour; Adam Makarucha; Jey Han Lau

arXiv:1804.04211 · cs.CL · April 13, 2018

TL;DR

This paper empirically investigates how hyper-parameters like vector dimensions and corpus size influence the quality of word embeddings and their effectiveness in similarity and analogy tasks.

Contribution

It provides a systematic analysis of hyper-parameter effects on embedding quality using standard evaluation metrics and datasets.
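The standard word-similarity evaluation can be sketched as follows. Each benchmark pair's model score is the cosine similarity of the two word vectors, and quality is Spearman's rank correlation between those scores and the human ratings. This is a minimal, dependency-free sketch assuming embeddings are held in a plain dict. The function names, the toy vectors and the ratings are hypothetical and invented for illustration, not data or code from the paper, and the authors' exact protocol (for example, how out-of-vocabulary pairs are handled) is not stated in this summary.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors (assumed non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def evaluate_similarity(
    embeddings: Dict[str, Vector], pairs: List[Tuple[str, str, float]]
) -> Tuple[float, int]:
    """Correlate model cosine scores with human ratings.

    Pairs containing an out-of-vocabulary word are skipped; the number of
    pairs actually scored is returned alongside rho.
    """
    model_scores, human_scores = [], []
    for w1, w2, rating in pairs:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            human_scores.append(rating)
    return spearman(model_scores, human_scores), len(model_scores)


# Hypothetical 3-dimensional vectors and ratings, invented for illustration.
TOY_EMBEDDINGS = {
    "cat": [1.0, 0.9, 0.0],
    "dog": [0.9, 1.0, 0.1],
    "car": [0.0, 0.1, 1.0],
    "truck": [0.1, 0.0, 0.9],
}
TOY_PAIRS = [
    ("cat", "dog", 9.0),
    ("car", "truck", 8.5),
    ("cat", "car", 1.5),
    ("dog", "unicorn", 2.0),  # "unicorn" is out of vocabulary, so this pair is skipped
]

rho, n_scored = evaluate_similarity(TOY_EMBEDDINGS, TOY_PAIRS)
print(f"Spearman rho = {rho:.3f} over {n_scored} pairs")
```

On this toy data the three in-vocabulary pairs are ranked identically by the model and by the human ratings, so rho is 1 up to floating-point rounding. Reporting the number of scored pairs next to rho is useful when comparing models trained on corpora of different sizes, since their vocabularies cover different subsets of a benchmark.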

Findings

01. Hyper-parameters significantly affect embedding quality.
02. Optimal vector dimensions depend on the task.
03. Corpus size affects the performance of embeddings.

Abstract

The versatility of word embeddings across applications is attracting researchers from many fields. However, the impact of hyper-parameters when training an embedding model is often poorly understood. How much do hyper-parameters such as vector dimensions and corpus size affect the quality of embeddings, and how do these results translate to downstream applications? Using standard embedding evaluation metrics and datasets, we conduct a study to empirically measure the impact of these hyper-parameters.
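For the analogy task, a common scoring rule is the vector-offset method, often called 3CosAdd. For a question "a is to b as c is to ?", it returns the vocabulary word d, other than a, b and c, that maximises cos(d, b - a + c). The abstract does not say which rule or OOV policy the authors used, so 3CosAdd and the skip-OOV choice are assumptions here. The function names and toy vectors below are hypothetical and for illustration only.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a (non-zero) vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def solve_analogy(unit: Dict[str, List[float]], a: str, b: str, c: str) -> Optional[str]:
    """3CosAdd: for 'a is to b as c is to ?', return argmax_d cos(d, b - a + c).

    `unit` must already hold unit-length vectors. Because the target's norm
    is the same for every candidate, ranking by dot product ranks by cosine.
    The question words themselves are excluded from the candidates.
    """
    target = [vb - va + vc for va, vb, vc in zip(unit[a], unit[b], unit[c])]
    best_word, best_score = None, -math.inf
    for word, vec in unit.items():
        if word in (a, b, c):
            continue
        score = sum(x * t for x, t in zip(vec, target))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


def analogy_accuracy(
    embeddings: Dict[str, Sequence[float]],
    questions: List[Tuple[str, str, str, str]],
) -> Tuple[float, int]:
    """Fraction of answered questions solved correctly, plus the answered count.

    Questions containing an out-of-vocabulary word are skipped here (an
    assumption); counting them as errors is the other common convention.
    """
    unit = {w: normalize(v) for w, v in embeddings.items()}
    answered = correct = 0
    for a, b, c, expected in questions:
        if any(w not in unit for w in (a, b, c, expected)):
            continue
        answered += 1
        if solve_analogy(unit, a, b, c) == expected:
            correct += 1
    return (correct / answered if answered else 0.0), answered


# Hypothetical vectors with made-up dimensions (male, female, royal).
TOY_EMBEDDINGS = {
    "man": [1.0, 0.0, 0.0],
    "woman": [0.0, 1.0, 0.0],
    "king": [1.0, 0.0, 1.0],
    "queen": [0.0, 1.0, 1.0],
    "apple": [0.3, 0.3, -1.0],
}
TOY_QUESTIONS = [
    ("man", "king", "woman", "queen"),
    ("woman", "queen", "man", "king"),
    ("man", "king", "girl", "princess"),  # out of vocabulary, so skipped
]

acc, n_answered = analogy_accuracy(TOY_EMBEDDINGS, TOY_QUESTIONS)
print(f"accuracy = {acc:.2f} over {n_answered} questions")
```

Whether OOV questions are skipped or scored as wrong can change the reported accuracy noticeably when models with different vocabulary sizes are compared, so the choice is worth stating alongside any result.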

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques