Word Embeddings for the Armenian Language: Intrinsic and Extrinsic Evaluation

Karen Avetisyan; Tsolak Ghukasyan

arXiv:1906.03134 · cs.CL · June 10, 2019 · 1 citation

TL;DR

This paper evaluates existing and newly trained Armenian word embeddings with intrinsic and extrinsic methods, and releases new datasets as benchmarks for future models.

Contribution

It introduces new Armenian word embeddings trained with GloVe, fastText, CBOW, and SkipGram, and provides benchmark datasets for future research.

Findings

01. Intrinsic evaluation using word analogy tasks

02. Extrinsic evaluation on morphological tagging and text classification

03. Publicly available datasets for Armenian NLP benchmarking

Abstract

In this work, we intrinsically and extrinsically evaluate and compare existing word embedding models for the Armenian language. Alongside these, we present new embeddings trained using the GloVe, fastText, CBOW, and SkipGram algorithms. We adapt the word analogy task and use it for intrinsic evaluation of the embeddings. For extrinsic evaluation, two tasks are employed: morphological tagging and text classification. Tagging is performed with a deep neural network on the ArmTDP v2.3 dataset. For text classification, we propose a corpus of news articles categorized into 7 classes. The datasets are made public to serve as benchmarks for future models.
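As an illustration of the word analogy task mentioned in the abstract, the sketch below scores "a is to b as c is to ?" questions with the widely used vector-offset method (often called 3CosAdd). This is an assumption for illustration only: the abstract does not say which scoring rule the authors used or how they adapted the task for Armenian. The function names, the toy English vocabulary, and the 3-dimensional vectors are all hypothetical and are not taken from the paper's datasets.

```python
import numpy as np


def normalize(vectors):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / norms


def solve_analogy(vocab, vectors, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset rule.

    Returns the vocabulary word whose vector is most similar to b - a + c,
    excluding the three question words themselves.
    """
    index = {word: i for i, word in enumerate(vocab)}
    unit = normalize(vectors)
    target = unit[index[b]] - unit[index[a]] + unit[index[c]]
    scores = unit @ target
    for word in (a, b, c):
        scores[index[word]] = -np.inf  # never return a question word
    return vocab[int(np.argmax(scores))]


def analogy_accuracy(vocab, vectors, questions):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    correct = sum(
        solve_analogy(vocab, vectors, a, b, c) == expected
        for a, b, c, expected in questions
    )
    return correct / len(questions)


# Hypothetical toy embeddings (not from the paper): axis 1 loosely encodes
# gender and axis 2 loosely encodes royalty.
TOY_VOCAB = ["man", "woman", "king", "queen", "apple"]
TOY_VECTORS = np.array([
    [1.0, 0.0, 0.0],   # man
    [1.0, 1.0, 0.0],   # woman
    [1.0, 0.0, 1.0],   # king
    [1.0, 1.0, 1.0],   # queen
    [0.0, 0.0, -1.0],  # apple
])
TOY_QUESTIONS = [("man", "woman", "king", "queen")]
```

With these toy vectors, `solve_analogy(TOY_VOCAB, TOY_VECTORS, "man", "woman", "king")` returns `"queen"`. For real embeddings, the vocabulary and matrix would be loaded from the trained model files, and questions containing out-of-vocabulary words would need to be skipped or counted as errors, a choice that affects the reported accuracy.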

Code & Models

Repositories

ispras-texterra/word-embeddings-eval-hy · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Sentiment Analysis and Opinion Mining

Methods: fastText · GloVe Embeddings