A Comparative Analysis of Static Word Embeddings for Hungarian
Máté Gedeon

TL;DR
This study compares various static word embeddings for Hungarian, evaluating traditional models and BERT-based methods on intrinsic and extrinsic tasks, revealing the strengths of FastText and the X2Static extraction approach.
Contribution
It introduces a comprehensive evaluation of static embeddings for Hungarian, including novel extraction methods from BERT-based models, and provides insights into their relative performance.
Findings
FastText excels in intrinsic analogy tasks.
X2Static extraction improves BERT-based static embeddings.
ELMo embeddings perform best in NER and POS tagging.
Abstract
This paper presents a comprehensive analysis of static word embeddings for Hungarian, including traditional models such as Word2Vec and FastText, as well as static embeddings derived from BERT-based models using different extraction methods. We evaluate these embeddings on both intrinsic and extrinsic tasks to provide a holistic view of their performance. For intrinsic evaluation, we employ a word analogy task, which assesses the embeddings' ability to capture semantic and syntactic relationships. Our results indicate that traditional static embeddings, particularly FastText, excel in this task, achieving high accuracy and mean reciprocal rank (MRR) scores. Among the BERT-based models, the X2Static method for extracting static embeddings outperforms the decontextualized and aggregate methods, approaching the effectiveness of traditional static…
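The two intrinsic metrics named in the abstract, accuracy and MRR on a word analogy task, can be illustrated with a minimal sketch. It assumes the common vector-offset scoring (b - a + c, ranked by cosine similarity), which this summary does not confirm as the paper's exact protocol. The function name `evaluate_analogies`, the toy vocabulary, and the hand-made 3-dimensional vectors are all hypothetical and serve only to make the example runnable.

```python
import numpy as np


def normalize(matrix):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / np.clip(norms, 1e-12, None)


def evaluate_analogies(vocab, vectors, questions):
    """Return (accuracy, MRR) over analogy questions of the form a : b :: c : d.

    Assumption: candidates are scored by cosine similarity to b - a + c,
    and the three query words are excluded from the ranking.
    """
    index = {word: i for i, word in enumerate(vocab)}
    unit = normalize(np.asarray(vectors, dtype=float))
    hits, reciprocal_rank_sum, total = 0, 0.0, 0

    for a, b, c, d in questions:
        if any(word not in index for word in (a, b, c, d)):
            continue  # skip questions containing out-of-vocabulary words

        query = unit[index[b]] - unit[index[a]] + unit[index[c]]
        scores = unit @ query
        for word in (a, b, c):
            scores[index[word]] = -np.inf  # query words cannot be answers

        # Rank of the expected answer: 1 means it is the top candidate.
        rank = 1 + int(np.sum(scores > scores[index[d]]))
        hits += int(rank == 1)
        reciprocal_rank_sum += 1.0 / rank
        total += 1

    if total == 0:
        return 0.0, 0.0
    return hits / total, reciprocal_rank_sum / total


# Hypothetical toy data: apa (father), anya (mother),
# nagyapa (grandfather), nagyanya (grandmother), alma (apple).
VOCAB = ["apa", "anya", "nagyapa", "nagyanya", "alma"]
VECTORS = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
    [0.0, 0.0, 1.0],
]
QUESTIONS = [("apa", "anya", "nagyapa", "nagyanya")]

accuracy, mrr = evaluate_analogies(VOCAB, VECTORS, QUESTIONS)
print(f"accuracy={accuracy:.2f} MRR={mrr:.2f}")
```

Accuracy only rewards an expected answer ranked first, while MRR also gives partial credit when the answer appears lower in the ranking, which is why the two are usually reported together.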
Taxonomy
Methods: Softmax · Tanh Activation · Bidirectional LSTM · ELMo · Sigmoid Activation · Long Short-Term Memory
