Next word prediction based on the N-gram model for Kurdish Sorani and Kurmanji

Hozan K. Hamarashid; Soran A. Saeed; Tarik A. Rashid

arXiv:2008.01546 · cs.CL · August 5, 2020

TL;DR

This paper develops a next word prediction system for Kurdish Sorani and Kurmanji based on an N-gram model. It addresses the scarcity of Kurdish language resources and reports 96.3% accuracy with an implementation in the R programming language.

Contribution

It introduces a Kurdish text corpus and implements an N-gram-based next word prediction model for Kurdish Sorani and Kurmanji.
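An N-gram model depends on frequency tables of word sequences counted over a corpus. The Python sketch below shows that counting step for orders one through five. The paper's own implementation is in R. The function name `build_ngram_tables`, the whitespace tokenization, and the toy Kurmanji tokens are illustrative assumptions, not the authors' code or data.

```python
from collections import Counter

def build_ngram_tables(tokens, max_n=5):
    """Hypothetical helper (not from the paper): count every n-gram of
    order 1..max_n in a token list. Returns {n: Counter of n-tuples}."""
    tables = {}
    for n in range(1, max_n + 1):
        # zip over n shifted views of the list yields consecutive n-grams
        grams = zip(*(tokens[i:] for i in range(n)))
        tables[n] = Counter(grams)
    return tables

# Toy Kurmanji (Latin script) tokens, split on whitespace, for illustration only
corpus = "ez diçim malê ez diçim bajêr ez diçim malê".split()
tables = build_ngram_tables(corpus)
print(tables[2][("ez", "diçim")])          # 3
print(tables[3][("ez", "diçim", "malê")])  # 2
print(len(tables[5]))                      # 5 distinct five-grams
```

A real Kurdish corpus would need its own tokenization and normalization rules, especially for the Arabic-based Sorani script. Whitespace splitting is only a placeholder here.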

Findings

01

Achieved 96.3% prediction accuracy

02

Created a Kurdish text corpus for NLP tasks

03

Developed a functional prediction application in R
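The summary reports 96.3% accuracy but does not state how it was measured. For next word prediction, one common definition is the share of held-out positions where the true next word appears among the top-k suggestions. The Python sketch below computes that metric for any predictor. The metric definition, the name `top_k_accuracy`, and the default k = 3 are assumptions, not the paper's evaluation protocol.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

def top_k_accuracy(
    predict: Callable[[Sequence[str]], List[str]],
    examples: Iterable[Tuple[Sequence[str], str]],
    k: int = 3,
) -> float:
    """Hypothetical metric helper: share of (history, true next word) pairs
    whose true word appears among the first k suggestions from `predict`."""
    hits = total = 0
    for history, target in examples:
        total += 1
        if target in predict(history)[:k]:
            hits += 1
    return hits / total if total else 0.0

def fixed(history):
    # Stand-in predictor that ignores its input (illustration only)
    return ["malê", "bajêr"]

held_out = [(["ez", "diçim"], "malê"), (["ez", "diçim"], "bajêr"), (["ez"], "diçim")]
print(top_k_accuracy(fixed, held_out, k=1))  # 0.3333333333333333
print(top_k_accuracy(fixed, held_out, k=2))  # 0.6666666666666666
```

The same figure can differ widely depending on k and on whether test text overlaps the training corpus. An accuracy number is therefore only comparable when those choices are stated.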

Abstract

Next word prediction is an input technology that simplifies typing by suggesting the next word for the user to select, since typing in a conversation consumes time. A few previous studies have focused on the Kurdish language, including on next word prediction. However, the lack of a Kurdish text corpus presents a challenge. Moreover, the lack of a sufficient number of N-grams for Kurdish, for instance five-grams, is the reason next word prediction is rarely used for Kurdish. Furthermore, several Kurdish letters are displayed improperly in the RStudio software. This paper provides a Kurdish corpus, creates five-grams, and presents a unique research work on next word prediction for Kurdish Sorani and Kurmanji. The N-gram model is used for next word prediction to reduce typing time in Kurdish. In addition,…
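Given N-gram counts up to five-grams, a prediction step can look up the last few typed words and rank the words that followed them in the corpus. The Python sketch below uses simple, unsmoothed back-off. It first tries the longest available context (up to four preceding words), then falls back to shorter ones, and ends with plain word frequency. The abstract does not say which scheme the authors used. The back-off strategy, the helper names `build_continuations` and `predict_next`, and `top_k=3` are assumptions.

```python
from collections import Counter, defaultdict

def build_continuations(tokens, max_n=5):
    """Hypothetical helper: map each context tuple (0 to max_n - 1 words)
    to a Counter of the words that followed it in the corpus."""
    cont = defaultdict(Counter)
    for i, word in enumerate(tokens):
        # Record `word` under every context of length 0..max_n-1 ending just before it
        for k in range(min(max_n - 1, i) + 1):
            cont[tuple(tokens[i - k:i])][word] += 1
    return cont

def predict_next(cont, history, max_n=5, top_k=3):
    """Suggest up to top_k next words, backing off from the longest seen context."""
    history = list(history)
    for k in range(min(max_n - 1, len(history)), -1, -1):
        context = tuple(history[len(history) - k:])
        if context in cont:  # membership test does not create a key in the defaultdict
            return [w for w, _ in cont[context].most_common(top_k)]
    return []  # only reached when the corpus was empty

# Toy Kurmanji (Latin script) tokens, for illustration only
corpus = "ez diçim malê ez diçim bajêr ez diçim malê".split()
cont = build_continuations(corpus)
print(predict_next(cont, ["ez", "diçim"]))  # ['malê', 'bajêr']
print(predict_next(cont, ["tu", "diçim"]))  # backs off to ('diçim',): ['malê', 'bajêr']
```

This sketch backs off only when a longer context was never seen. That keeps suggestions specific when the corpus covers the phrase and still returns something for unseen phrases. Smoothed schemes such as Katz back-off or Kneser-Ney also assign probabilities, which this ranking-only sketch does not need.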