Word representation or word embedding in Persian text

Siamak Sarmady; Erfan Rahmani

arXiv:1712.06674 · cs.CL · December 20, 2017 · 1 citation

Open Access

TL;DR

This paper explores methods for creating word embeddings in Persian text using adapted GloVe, CBOW, and skip-gram models trained on multiple corpora, providing valuable vectors for Persian NLP tasks.

Contribution

It updates and applies GloVe, CBOW, and skip-gram models specifically for Persian, producing a large set of word vectors for NLP applications.

Findings

01 Produced vectors for 342,362 Persian words in each of the three models

02 Presents the resulting embeddings as usable across Persian NLP tasks

03 Adapted established word representation methods to Persian language processing

Abstract

Text processing is a sub-branch of natural language processing. Machine learning and neural network methods have recently received growing attention in this area, which makes the representation of words very important. This article is about word representation, that is, converting words into vectors, for Persian text. In this research, the GloVe, CBOW, and skip-gram methods are updated to produce embedding vectors for Persian words. To train the neural networks, the Bijankhan, Hamshahri, and UPEC corpora were combined and used. The result is a set of 342,362 words, each with a vector from all three models. These vectors have many uses in Persian natural language processing.
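
As a rough illustration of how the three methods named in the abstract consume a corpus, the sketch below builds the training inputs each one starts from: (center, context) pairs for skip-gram, (context, center) examples for CBOW, and distance-weighted co-occurrence counts for GloVe. It is not the authors' pipeline. The function names, the window size, the whitespace tokenization, the 1/distance weighting, and the sample sentence are all assumptions made for illustration.

```python
from collections import Counter


def skipgram_pairs(tokens, window=2):
    """Skip-gram input: one (center, context) pair per neighbor in the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """CBOW input: the whole context window is used to predict the center word."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = tuple(tokens[j] for j in range(lo, hi) if j != i)
        if context:
            examples.append((context, center))
    return examples


def cooccurrence_counts(tokens, window=2):
    """GloVe input: symmetric co-occurrence counts.

    Assumption: each pair is weighted by 1/distance, a common GloVe
    convention; the paper does not state which weighting it used.
    """
    counts = Counter()
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(len(tokens), i + window + 1)):
            weight = 1.0 / (j - i)
            counts[(word, tokens[j])] += weight
            counts[(tokens[j], word)] += weight
    return counts


# Hypothetical sample sentence ("I read the book"), split on whitespace.
tokens = "من کتاب را خواندم".split()
print(skipgram_pairs(tokens, window=1))
print(cbow_examples(tokens, window=1))
print(cooccurrence_counts(tokens, window=2))
```

Skip-gram and CBOW see the same windows but swap what is predicted from what, while GloVe first aggregates the whole corpus into counts and then fits vectors to them. Real Persian text would also need normalization and tokenization steps that this sketch leaves out.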

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques

Methods: GloVe Embeddings