Portuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks
Nathan Hartmann, Erick Fonseca, Christopher Shulby, Marcos Treviso, Jessica Rodrigues, Sandra Aluisio

TL;DR
This paper evaluates various Portuguese word embedding models on intrinsic analogy tasks and extrinsic NLP tasks, finding that analogy tests are less effective than task-specific evaluations for assessing embedding quality.
Contribution
It compares 31 Portuguese word embedding models trained with four algorithms (FastText, GloVe, Wang2Vec and Word2Vec) and highlights the limitations of analogy-based evaluation.
Findings
Word analogies are not a reliable way to evaluate word embeddings.
Task-specific evaluations yield better insights into embedding quality.
The relative performance of the models varies from task to task.
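The analogy test behind these findings asks a model to complete questions of the form "a is to b as c is to ?". Below is a minimal sketch of the usual vector-offset scoring (often called 3CosAdd). It is an illustration, not the paper's evaluation script, and the function names, toy Portuguese vocabulary and 3-dimensional vectors are hypothetical.

```python
import numpy as np


def normalize(matrix):
    # Scale each row to unit length so that dot products equal cosine similarities.
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / np.where(norms == 0, 1.0, norms)


def solve_analogy(a, b, c, vocab, matrix):
    """Return the word d that best completes 'a is to b as c is to d' (3CosAdd)."""
    index = {word: i for i, word in enumerate(vocab)}
    unit = normalize(matrix)
    target = unit[index[b]] - unit[index[a]] + unit[index[c]]
    scores = unit @ target
    # Exclude the three question words, as analogy benchmarks usually do.
    for word in (a, b, c):
        scores[index[word]] = -np.inf
    return vocab[int(np.argmax(scores))]


def analogy_accuracy(questions, vocab, matrix):
    # Fraction of (a, b, c, d) questions whose top-1 prediction is exactly d.
    hits = sum(solve_analogy(a, b, c, vocab, matrix) == d for a, b, c, d in questions)
    return hits / len(questions)


# Hypothetical toy embeddings; the dimensions loosely mean (royalty, male, female).
vocab = ["rei", "rainha", "homem", "mulher", "cachorro"]
matrix = np.array([
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [-1.0, 0.0, 0.0],
])
questions = [
    ("homem", "rei", "mulher", "rainha"),
    ("rei", "rainha", "homem", "mulher"),
]
print(analogy_accuracy(questions, vocab, matrix))  # → 1.0
```

Accuracy here counts only exact top-1 matches, so a near miss scores the same as a completely wrong answer.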
Abstract
Word embeddings have been found to provide meaningful representations for words in an efficient way; therefore, they have become common in Natural Language Processing systems. In this paper, we evaluated different word embedding models trained on a large Portuguese corpus, including both Brazilian and European variants. We trained 31 word embedding models using FastText, GloVe, Wang2Vec and Word2Vec. We evaluated them intrinsically on syntactic and semantic analogies and extrinsically on POS tagging and sentence semantic similarity tasks. The obtained results suggest that word analogies are not appropriate for word embedding evaluation; task-specific evaluations appear to be a better option.
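For an extrinsic task such as sentence semantic similarity, a common baseline represents each sentence by the average of its word vectors, scores a pair by cosine similarity, and reports the Pearson correlation with human judgments. The abstract does not say how the paper turned embeddings into similarity scores, so this sketch simply assumes that baseline; the helper names, toy vectors and gold scores are invented for illustration.

```python
import numpy as np


def sentence_vector(tokens, embeddings, dim):
    # Average the vectors of in-vocabulary tokens; out-of-vocabulary tokens are skipped.
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)


def cosine(u, v):
    # Cosine similarity, defined as 0.0 when either vector is all zeros.
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def pearson(x, y):
    # Pearson correlation between predicted and gold similarity scores.
    return float(np.corrcoef(x, y)[0, 1])


# Hypothetical 3-dimensional embeddings and gold scores, for illustration only.
embeddings = {
    "gato": np.array([1.0, 0.0, 0.0]),
    "felino": np.array([0.9, 0.1, 0.0]),
    "dorme": np.array([0.0, 1.0, 0.0]),
    "descansa": np.array([0.1, 0.9, 0.0]),
    "carro": np.array([0.0, 0.0, 1.0]),
}
pairs = [
    (["gato", "dorme"], ["felino", "descansa"]),
    (["gato", "dorme"], ["carro"]),
    (["o", "gato", "dorme"], ["gato", "dorme"]),  # "o" is out of vocabulary here
]
gold = [4.5, 1.0, 5.0]

predictions = [
    cosine(sentence_vector(s1, embeddings, 3), sentence_vector(s2, embeddings, 3))
    for s1, s2 in pairs
]
print(predictions, pearson(predictions, gold))
```

Because the whole pipeline has no trainable parameters, differences in the final correlation come only from the embeddings themselves, which is what makes this kind of baseline useful for comparing embedding models.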
Code & Models
- 🤗 nilc-nlp/fasttext-skip-gram-600d (model)
- 🤗 nilc-nlp/word2vec-skip-gram-50d (model)
- 🤗 nilc-nlp/wang2vec-cbow-300d (model)
- 🤗 nilc-nlp/word2vec-cbow-100d (model)
- 🤗 nilc-nlp/word2vec-skip-gram-100d (model)
- 🤗 nilc-nlp/wang2vec-skip-gram-50d (model)
- 🤗 nilc-nlp/fasttext-cbow-1000d (model)
- 🤗 nilc-nlp/word2vec-skip-gram-300d (model)
- 🤗 nilc-nlp/wang2vec-skip-gram-100d (model)
- 🤗 nilc-nlp/fasttext-cbow-300d (model)
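The file layout of the repositories listed above is not described on this page, so check each model card before loading. Word embeddings like these are often distributed in the plain-text word2vec format (a "vocab_size dim" header, then one word followed by its vector per line). Assuming that format, a dependency-light loader might look like the sketch below; `load_word2vec_text` is a hypothetical helper, and gensim's `KeyedVectors.load_word2vec_format` reads the same format if you prefer a library.

```python
import io

import numpy as np


def load_word2vec_text(stream):
    """Read word2vec text format: a 'vocab_size dim' header, then 'word v1 ... v_dim' per line."""
    vocab_size, dim = (int(x) for x in stream.readline().split())
    vocab = []
    matrix = np.zeros((vocab_size, dim), dtype=np.float32)
    for i in range(vocab_size):
        # rstrip() drops the newline and any trailing space some exporters write.
        parts = stream.readline().rstrip().split(" ")
        # The last `dim` fields are the vector; everything before them is the word.
        vocab.append(" ".join(parts[:-dim]))
        matrix[i] = [float(x) for x in parts[-dim:]]
    return vocab, matrix


# Tiny in-memory sample standing in for a downloaded embeddings file (hypothetical values).
sample = "2 3\nrei 0.1 0.2 0.3\nrainha 0.4 0.5 0.6 \n"
vocab, matrix = load_word2vec_text(io.StringIO(sample))
print(vocab, matrix.shape)
```

For a real file, pass `open(path, encoding="utf-8")` instead of the `StringIO` sample; the returned `vocab` list and `matrix` array can then be fed to analogy or similarity evaluations.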
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
