Contextual Analysis for Middle Eastern Languages with Hidden Markov Models

Kazem Taghva

arXiv:1505.01757 · cs.CL · September 15, 2015

TL;DR

This paper introduces a machine learning approach, based on Hidden Markov Models, to contextual analysis for Middle Eastern languages. It is demonstrated on Farsi, where the model reaches 94% accuracy, and it can be adapted to other similar languages.

Contribution

The paper presents a novel application of first-order Hidden Markov Models for language-specific contextual analysis, reducing the need for complex rule coding across multiple languages.

Findings

01 The Farsi model achieves 94% accuracy.

02 The approach can be extended to Arabic, Urdu, and Sindhi.

03 Software can perform contextual analysis without complex hand-coded rules.

Abstract

Displaying a document in a Middle Eastern language requires contextual analysis because each character of the alphabet has several presentational forms. The words of the document are formed by joining the correct positional glyphs, which represent the corresponding presentational forms of the characters. A set of rules defines how the glyphs join. These rules vary from language to language and are subject to interpretation by software developers. In this paper, we propose a machine learning approach to contextual analysis based on the first-order Hidden Markov Model. We design and build a model for the Farsi language to demonstrate this technology. The Farsi model achieves 94% accuracy with training based on a short list of 89 Farsi words comprising 2780 Farsi characters. The experiment can be easily extended to many languages including…
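To make the idea in the abstract concrete, the following is a minimal sketch of a first-order Hidden Markov Model that tags each character of a word with a positional form. It is not the paper's implementation. It assumes that the hidden states are the four forms (isolated, initial, medial, final), that the observations are the characters themselves, and that probabilities are estimated by counting with add-one smoothing. The names `label_forms`, `train`, `viterbi`, and `RIGHT_ONLY` are hypothetical, and the toy rule-based labeler exists only to produce training labels for the example.

```python
import math
from collections import Counter

STATES = ["isolated", "initial", "medial", "final"]
# Assumption: letters that never join to the following letter (toy subset).
RIGHT_ONLY = set("اآدذرزژو")

def label_forms(word):
    """Toy rule-based labeler, used only to create training labels."""
    forms = []
    for i, ch in enumerate(word):
        joins_prev = i > 0 and word[i - 1] not in RIGHT_ONLY
        joins_next = ch not in RIGHT_ONLY and i < len(word) - 1
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms

def train(words):
    """Count start, transition, and emission events over labeled words."""
    start = Counter()
    trans = {s: Counter() for s in STATES}
    emit = {s: Counter() for s in STATES}
    vocab = set()
    for word in words:
        forms = label_forms(word)
        start[forms[0]] += 1
        for ch, form in zip(word, forms):
            emit[form][ch] += 1
            vocab.add(ch)
        for prev_form, next_form in zip(forms, forms[1:]):
            trans[prev_form][next_form] += 1
    return start, trans, emit, vocab

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (smoothing is an assumption)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(word, model):
    """Return the most likely sequence of positional forms for a word."""
    if not word:
        return []
    start, trans, emit, vocab = model
    n_obs = len(vocab) + 1  # one extra slot for unseen characters
    n_st = len(STATES)
    best = {s: log_prob(start, s, n_st) + log_prob(emit[s], word[0], n_obs)
            for s in STATES}
    back = []
    for ch in word[1:]:
        prev, best, pointers = best, {}, {}
        for s in STATES:
            p = max(STATES, key=lambda q: prev[q] + log_prob(trans[q], s, n_st))
            best[s] = (prev[p] + log_prob(trans[p], s, n_st)
                       + log_prob(emit[s], ch, n_obs))
            pointers[s] = p
        back.append(pointers)
    state = max(STATES, key=best.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Tiny illustrative training list; the paper's 89-word list is not reproduced.
model = train(["کتاب", "سلام", "مدرسه", "ایران", "دانش"])
print(viterbi("کتاب", model))
```

With so little training data the decoded forms will often be wrong, and this sketch has no explicit end-of-word state, so it should not be expected to approach the accuracy reported in the abstract. It only shows how counting and Viterbi decoding can stand in for explicit joining rules.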

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies