Neural Named Entity Recognition for Kazakh

Gulmira Tolegen; Alymzhan Toleu; Orken Mamyrbayev; Rustam Mussabayev

arXiv:2007.13626 · cs.IR · October 5, 2021

TL;DR

This paper introduces neural network models with root and entity tag embeddings and tensor layers to improve named entity recognition in morphologically complex languages like Kazakh, addressing data sparsity issues.

Contribution

The paper presents novel neural network architectures that incorporate root and tag embeddings along with tensor layers, outperforming existing methods for named entity recognition (NER) in morphologically complex languages (MCLs).
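To make the architecture description more concrete, the sketch below concatenates a word embedding, a root embedding, and the embedding of the previous entity tag, then passes the result through a bilinear tensor layer. This page does not give the paper's equations, so the layer here is a generic neural tensor layer of the form tanh(xᵀVx + Wx + b) and may differ from the authors' formulation. All sizes, names (`tensor_layer`, `score_token`), and the random parameters are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only to keep the example small.
N_WORDS, N_ROOTS, N_TAGS = 50, 20, 5
D_WORD, D_ROOT, D_TAG = 8, 6, 4
D_IN = D_WORD + D_ROOT + D_TAG
D_HID = 10

# Randomly initialised embedding tables (untrained, for illustration).
word_emb = rng.normal(scale=0.1, size=(N_WORDS, D_WORD))
root_emb = rng.normal(scale=0.1, size=(N_ROOTS, D_ROOT))
tag_emb = rng.normal(scale=0.1, size=(N_TAGS, D_TAG))

# Tensor layer parameters: one D_IN x D_IN slice of V per hidden unit.
V = rng.normal(scale=0.1, size=(D_HID, D_IN, D_IN))
W = rng.normal(scale=0.1, size=(D_HID, D_IN))
b = np.zeros(D_HID)

# Output projection from hidden units to entity tag scores.
U = rng.normal(scale=0.1, size=(N_TAGS, D_HID))


def tensor_layer(x: np.ndarray) -> np.ndarray:
    """Generic bilinear tensor layer: tanh(x^T V[k] x + W x + b) per unit k."""
    bilinear = np.einsum("i,kij,j->k", x, V, x)
    return np.tanh(bilinear + W @ x + b)


def score_token(word_id: int, root_id: int, prev_tag_id: int) -> np.ndarray:
    """Return unnormalised scores over entity tags for one token."""
    x = np.concatenate(
        [word_emb[word_id], root_emb[root_id], tag_emb[prev_tag_id]]
    )
    return U @ tensor_layer(x)


scores = score_token(word_id=3, root_id=1, prev_tag_id=0)
print(scores.shape)
```

The bilinear term lets the hidden units model pairwise interactions between the word, root, and tag features, which a plain linear layer over the concatenated vector cannot express.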

Findings

01 Models outperform state-of-the-art approaches

02 Incorporating root and tag embeddings improves accuracy

03 Tensor layers enhance model performance

Abstract

We present several neural networks for named entity recognition (NER) in morphologically complex languages (MCLs). Kazakh is one such language: each root/stem can produce hundreds or thousands of variant word forms. This can lead to a serious data sparsity problem, which may prevent deep learning models from being trained well for under-resourced MCLs. To model MCL words effectively, we introduce root and entity tag embeddings plus a tensor layer into the neural networks. These additions significantly improve NER performance for MCLs. The proposed models outperform state-of-the-art approaches, including character-based ones, and can potentially be applied to other morphologically complex languages.
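The data sparsity point can be illustrated with a small sketch. It counts distinct surface forms of one place name in a toy token list, then counts again after mapping each form to its root. The token list and the hand-written `ROOT_LOOKUP` table are hypothetical; a real system would obtain roots from a morphological analyzer, and this is not the authors' preprocessing.

```python
from collections import Counter

# Hypothetical toy corpus: case-inflected forms of the place name "Алматы"
# (Almaty), e.g. "Алматыда" (in Almaty), "Алматыдан" (from Almaty).
tokens = ["Алматы", "Алматыда", "Алматыға", "Алматыдан", "Алматының", "Алматыда"]

# Hypothetical hand-written lookup standing in for a morphological analyzer.
ROOT_LOOKUP = {form: "Алматы" for form in tokens}


def to_root(token: str) -> str:
    """Map a surface form to its root; unknown tokens are returned unchanged."""
    return ROOT_LOOKUP.get(token, token)


surface_counts = Counter(tokens)
root_counts = Counter(to_root(t) for t in tokens)

# Each surface form is rare on its own, while the root pools all occurrences.
print(len(surface_counts), "surface types;", len(root_counts), "root type")
print(root_counts["Алматы"], "occurrences share one root embedding")
```

With word-level embeddings alone, each of the five forms would be learned from one or two examples. A root embedding shares evidence across all six occurrences, which is the motivation the abstract gives for modelling roots.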

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification