Evaluation of Greek Word Embeddings
Stamatis Outsios, Christos Karatsalos, Konstantinos Skianis, Michalis Vazirgiannis

TL;DR
This paper develops and evaluates Greek word embeddings, creating new linguistic resources and analyzing how Greek's morphological complexity and polysemy affect embedding quality.
Contribution
It introduces Greek-specific analogy and similarity corpora and assesses multiple models, highlighting linguistic factors impacting embedding performance.
Findings
Greek embeddings are meaningful and comparable to English.
Morphological complexity affects embedding quality.
Polysemy influences the effectiveness of word representations.
Abstract
Since word embeddings have been the most popular input for many NLP tasks, evaluating their quality is of critical importance. Most research efforts focus on English word embeddings. This paper addresses the problem of constructing and evaluating such models for the Greek language. We created a new word analogy corpus based on the original English Word2vec word analogy corpus, extended to cover some linguistic aspects specific to Greek. Moreover, we created a Greek version of the WordSim353 corpus for a basic evaluation of word similarities. We tested seven word vector models, and our evaluation showed that we are able to create meaningful representations. Finally, we discovered that the morphological complexity of the Greek language and polysemy can influence the quality of the resulting word embeddings.
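The two evaluations mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common protocol for these benchmark types: analogy questions answered by vector offset and cosine similarity (often called 3CosAdd), and similarity judgments scored by Spearman rank correlation against human ratings. The abstract does not state the exact procedure the authors used, so this is an assumption. The toy vectors, the function names, and the choice to skip questions containing out-of-vocabulary words are all hypothetical and only for illustration.

```python
import numpy as np


def unit(v):
    """Scale a vector to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(unit(u), unit(v)))


def solve_analogy(emb, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset method.

    The question words themselves are excluded from the candidates.
    """
    target = unit(emb[b]) - unit(emb[a]) + unit(emb[c])
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))


def analogy_accuracy(emb, questions):
    """Fraction of (a, b, c, d) questions answered correctly.

    Assumption: questions with any out-of-vocabulary word are skipped.
    """
    covered = [q for q in questions if all(w in emb for w in q)]
    if not covered:
        return 0.0
    correct = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in covered)
    return correct / len(covered)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        mask = values == v
        ranks[mask] = ranks[mask].mean()
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def similarity_score(emb, rated_pairs):
    """Spearman correlation between human ratings and model cosines.

    rated_pairs holds (word1, word2, human_rating) triples; pairs with
    out-of-vocabulary words are skipped (an assumption of this sketch).
    """
    covered = [(w1, w2, r) for w1, w2, r in rated_pairs
               if w1 in emb and w2 in emb]
    human = [r for _, _, r in covered]
    model = [cosine(emb[w1], emb[w2]) for w1, w2, _ in covered]
    return spearman(human, model)


# Hypothetical 3-dimensional toy vectors (male, female, royal); not real data.
toy = {
    "άντρας": [1.0, 0.0, 0.0],     # man
    "γυναίκα": [0.0, 1.0, 0.0],    # woman
    "βασιλιάς": [1.0, 0.0, 1.0],   # king
    "βασίλισσα": [0.0, 1.0, 1.0],  # queen
    "παιδί": [0.5, 0.5, 0.0],      # child (distractor)
}

questions = [("άντρας", "βασιλιάς", "γυναίκα", "βασίλισσα")]
print(analogy_accuracy(toy, questions))
```

With real models, `toy` would be replaced by a mapping from each vocabulary word to its trained vector, and `questions` and `rated_pairs` would be read from the analogy and similarity corpora. Inflected forms absent from the vocabulary are skipped by this sketch, so a coverage figure reported alongside the accuracy shows how many questions were actually scored.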
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques
