Word representation or word embedding in Persian text
Siamak Sarmady, Erfan Rahmani

TL;DR
This paper explores methods for creating word embeddings in Persian text using adapted GloVe, CBOW, and skip-gram models trained on multiple corpora, providing valuable vectors for Persian NLP tasks.
Contribution
It updates and applies GloVe, CBOW, and skip-gram models specifically for Persian, producing a large set of word vectors for NLP applications.
Findings
Produced vectors for 342,362 Persian words in each of the three models
Demonstrated the applicability of these embeddings for Persian NLP tasks
Enhanced word representation methods for Persian language processing
Abstract
Text processing is one of the sub-branches of natural language processing. Recently, machine learning and neural network methods have received increasing attention. For this reason, the representation of words has become very important. This article is about word representation, that is, converting words into vectors, in Persian text. In this research, the GloVe, CBOW and skip-gram methods are updated to produce embedding vectors for Persian words. To train the neural networks, the Bijankhan, Hamshahri and UPEC corpora were combined and used. The result is a set of 342,362 words, each of which has a vector in all three models. These vectors have many uses in Persian natural language processing.
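As a rough illustration of how the three methods named in the abstract differ in what they consume, the sketch below builds the training inputs each one typically uses from a single tokenized sentence: (center, context) pairs for skip-gram, (context words, center) examples for CBOW, and distance-weighted co-occurrence counts for GloVe. This is not the authors' pipeline. The function names, the window sizes, the 1/distance weighting, and the sample sentence are all assumptions chosen for illustration, and no actual vectors are trained here.

```python
from collections import Counter


def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs; skip-gram predicts each context word from the center."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """Return (context_words, center) examples; CBOW predicts the center from its context."""
    examples = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:
            examples.append((context, center))
    return examples


def cooccurrence_counts(tokens, window=2):
    """Return symmetric co-occurrence counts of the kind GloVe is fit to.

    Assumption: a pair of words at distance d contributes 1/d to the count.
    """
    counts = Counter()
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), i):
            weight = 1.0 / (i - j)
            counts[(word, tokens[j])] += weight
            counts[(tokens[j], word)] += weight
    return counts


# Hypothetical, already tokenized Persian sentence ("I read the book").
tokens = ["من", "کتاب", "را", "خواندم"]

sg_pairs = skipgram_pairs(tokens, window=1)
cbow = cbow_examples(tokens, window=1)
cooc = cooccurrence_counts(tokens, window=2)
```

In practice the combined corpus would first be normalized and tokenized, and these inputs would then feed a trainer that learns one vector per vocabulary word.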
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
Methods: GloVe Embeddings
