Neural Named Entity Recognition for Kazakh
Gulmira Tolegen, Alymzhan Toleu, Orken Mamyrbayev, Rustam Mussabayev

TL;DR
This paper introduces neural network models with root and entity tag embeddings and tensor layers to improve named entity recognition (NER) for morphologically complex languages such as Kazakh, where rich word formation causes data sparsity.
Contribution
The paper presents novel neural network architectures that incorporate root and entity tag embeddings along with tensor layers; these architectures outperform existing methods on NER for morphologically complex languages (MCLs).
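A minimal sketch of the root-plus-tag input representation, in Python with NumPy. It assumes each word form is mapped to its root by some morphological analyzer (here a hypothetical hand-written ROOT_OF table) and reads "tag embedding" as an embedding of the entity tag assigned to the preceding word. Both are illustrative assumptions, not the authors' exact design, and all vectors are random stand-ins for trained embeddings.

```python
import numpy as np

# Hypothetical word-form -> root table; in practice this would come from a
# morphological analyzer or stemmer for Kazakh.
ROOT_OF = {
    "кітап": "кітап",
    "кітаптар": "кітап",      # "books"
    "кітаптарымыз": "кітап",  # "our books"
    "Алматы": "Алматы",
    "Алматыда": "Алматы",     # "in Almaty"
}
# Assumed BIO tag set, used only to build toy tag embeddings.
TAGS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG"]

rng = np.random.default_rng(0)
WORD_DIM, ROOT_DIM, TAG_DIM = 8, 6, 4

# Random vectors standing in for trained word, root, and tag embeddings.
word_emb = {w: rng.normal(size=WORD_DIM) for w in sorted(ROOT_OF)}
root_emb = {r: rng.normal(size=ROOT_DIM) for r in sorted(set(ROOT_OF.values()))}
tag_emb = {t: rng.normal(size=TAG_DIM) for t in TAGS}

def token_features(word, prev_tag):
    """Concatenate word, root, and (assumed) previous-tag embeddings."""
    return np.concatenate([
        word_emb[word],
        root_emb[ROOT_OF[word]],
        tag_emb[prev_tag],
    ])

x1 = token_features("кітаптар", "O")
x2 = token_features("кітаптарымыз", "O")
root_slice = slice(WORD_DIM, WORD_DIM + ROOT_DIM)
print(x1.shape)                                       # (18,)
# Different surface forms, identical root part of the feature vector:
print(np.allclose(x1[root_slice], x2[root_slice]))    # True
```

Because every inflected form of кітап shares one root vector, training signal from frequent forms also reaches rare ones, which is how a root embedding can ease data sparsity.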
Findings
The proposed models outperform state-of-the-art approaches, including character-based ones
Incorporating root and tag embeddings improves accuracy
Tensor layers enhance model performance
Abstract
We present several neural networks to address the task of named entity recognition for morphologically complex languages (MCLs). Kazakh is a morphologically complex language in which each root/stem can produce hundreds or thousands of variant word forms. This property of the language can lead to a serious data sparsity problem, which may prevent deep learning models from being well trained for under-resourced MCLs. To model the words of MCLs effectively, we introduce root and entity tag embeddings, plus a tensor layer, into the neural networks. These additions significantly improve NER performance for MCLs. The proposed models outperform state-of-the-art approaches, including character-based ones, and could potentially be applied to other morphologically complex languages.
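The tensor layer can be sketched as a bilinear ("neural tensor") layer over concatenated token features: each output unit adds a second-order interaction term x^T T_k x to the usual affine transform before the nonlinearity. This generic form is assumed here for illustration; the paper's exact parameterization, dimensions, and activation may differ, and the sizes below (d = 18, k = 5) are arbitrary.

```python
import numpy as np

def tensor_layer(x, T, W, b):
    """Bilinear tensor layer: h[k] = tanh(x^T T[k] x + W[k] . x + b[k]).

    x: (d,)      input feature vector (e.g. concatenated embeddings)
    T: (k, d, d) one bilinear slice per output unit
    W: (k, d)    ordinary linear weights
    b: (k,)      bias
    """
    # sum_i sum_j x[i] * T[k, i, j] * x[j] for every output unit k
    bilinear = np.einsum("i,kij,j->k", x, T, x)
    return np.tanh(bilinear + W @ x + b)

rng = np.random.default_rng(1)
d, k = 18, 5
x = rng.normal(size=d)
T = rng.normal(scale=0.1, size=(k, d, d))
W = rng.normal(scale=0.1, size=(k, d))
b = np.zeros(k)

h = tensor_layer(x, T, W, b)
print(h.shape)  # (5,)
```

The bilinear term lets a single layer capture multiplicative interactions between input features, for example between a word's root vector and a neighbouring tag vector, which a single affine layer cannot express directly.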
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
