SAT Based Analogy Evaluation Framework for Persian Word Embeddings
Seyyed Ehsan Mahmoudi, Mehrnoush Shamsfard

TL;DR
This paper introduces a SAT-based analogy evaluation framework tailored for Persian word embeddings, including a new dataset and benchmarks to assess semantic quality in a low-resource language context.
Contribution
It presents the first Persian-specific analogy evaluation framework with a handcrafted dataset and benchmarks, addressing the lack of semantic evaluation tools for Persian embeddings.
Findings
The framework effectively evaluates Persian word embeddings.
The dataset provides a new resource for Persian NLP research.
Benchmark results highlight the impact of parameters on semantic evaluation.
Abstract
In recent years there has been special interest in word embeddings as a new approach to converting words to vectors. A focal question is how much of the semantics of the words is transferred into the embedding vectors. This matters because the embedding serves as the basis for downstream NLP applications, and evaluating an application end-to-end just to judge the quality of its embedding model is costly. Word embeddings are generally evaluated through a number of tests, including the analogy test. In this paper we propose a test framework for Persian embedding models. Persian is a low-resource language, and there is no rich semantic benchmark for evaluating word embedding models in this language. We introduce an evaluation framework including a handcrafted Persian SAT-based analogy dataset, a colloquial test set…
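To make the idea of a SAT-style analogy test concrete, here is a minimal sketch of one common way such a question can be scored with word vectors: given a stem pair and several candidate pairs, pick the candidate whose vector offset is most similar (by cosine) to the stem's offset. This is an illustration under stated assumptions, not the paper's scoring procedure, which the abstract does not specify. The function names, the toy two-dimensional vectors, and the English placeholder words are all hypothetical.

```python
import math


def offset(pair, vectors):
    """Return the difference vector b - a for a word pair (a, b)."""
    a, b = pair
    return [y - x for x, y in zip(vectors[a], vectors[b])]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def answer_sat_question(stem, choices, vectors):
    """Return the index of the choice pair whose offset best matches the stem's."""
    stem_offset = offset(stem, vectors)
    scores = [cosine(stem_offset, offset(choice, vectors)) for choice in choices]
    return max(range(len(choices)), key=scores.__getitem__)


# Hypothetical toy vectors; a real evaluation would load a trained Persian model.
toy_vectors = {
    "man": [1.0, 0.0], "woman": [1.0, 1.0],
    "king": [3.0, 0.0], "queen": [3.0, 1.0],
    "apple": [0.0, 2.0], "car": [2.0, 2.0],
}

stem = ("man", "woman")
choices = [("apple", "car"), ("king", "queen"), ("queen", "king")]
best = answer_sat_question(stem, choices, toy_vectors)
print(choices[best])
```

Accuracy over a dataset would then be the fraction of questions where the selected index matches the annotated answer. Offset cosine is only one possible scoring rule; others could be swapped in without changing the surrounding loop.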
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
