Identification of arabic word from bilingual text using character features

Sofiene Haboubi; Samia Maddouri; Hamid Amiri

arXiv:1103.3430 · cs.AI · April 5, 2012 · 1 cites

Open Access

TL;DR

This paper explores using script features to identify Arabic language within bilingual Arabic/Latin texts, aiming to improve language recognition in multilingual documents.

Contribution

It introduces a method that uses structural script features to identify Arabic in bilingual texts, rather than relying on the global or statistical approaches used in most prior work.

Findings

01

Structural features can effectively distinguish Arabic from Latin scripts.

02

The method simplifies language identification in multilingual documents.

03

Results show promising accuracy in script-based language detection.

Abstract

Identifying the language of a script is an important stage in the recognition of writing. Several works in this research area treat various languages, and most of the methods used are global or statistical. In this paper, we study the possibility of using the features of scripts to identify the language. Identifying the language of a script from its characteristics makes identification less difficult in the case of multilingual documents. We present a study on the possibility of using structural features to identify the Arabic language in an Arabic/Latin text.

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Authorship Attribution and Profiling