Identification of Arabic word from bilingual text using character features
Sofiene Haboubi, Samia Maddouri, Hamid Amiri

TL;DR
This paper explores using script features to identify the Arabic language within bilingual Arabic/Latin texts, aiming to improve language recognition in multilingual documents.
Contribution
It introduces a method leveraging structural script features for Arabic language identification in bilingual texts, which is less reliant on global or statistical approaches.
Findings
Structural features can effectively distinguish Arabic from Latin scripts.
The method simplifies language identification in multilingual documents.
Results show promising accuracy in script-based language detection.
Abstract
Identifying the language of a script is an important stage in the process of writing recognition. Several works in this research area treat various languages, and most of the methods used are global or statistical. In this paper, we study the possibility of using the features of the script to identify the language. Identifying the language of the script from its characteristics makes identification less difficult in the case of multilingual documents. We present a study of the possibility of using structural features to identify the Arabic language in an Arabic/Latin text.
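As a rough illustration of what a "structural feature" of a word image can look like, the sketch below counts connected components in a tiny binary image and flags the very small ones as dot-like marks. The abstract does not list the features the authors use, so everything here is an assumption made for illustration: the function names `connected_components` and `structural_features`, the `dot_max_pixels` threshold, and the toy image are all hypothetical. Dotted letters and connected strokes are well-known visual traits of Arabic script, but this is not presented as the paper's method.

```python
from collections import deque


def connected_components(grid):
    """Return the pixel count of each 4-connected component of 1-pixels."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and grid[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes


def structural_features(grid, dot_max_pixels=2):
    """Hypothetical feature set: total components and dot-like components.

    dot_max_pixels is an assumed threshold, not a value from the paper.
    """
    sizes = connected_components(grid)
    return {
        "components": len(sizes),
        "dot_like": sum(1 for s in sizes if s <= dot_max_pixels),
    }


# Toy binary word image: one long stroke with two isolated dots above it.
toy = [
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
features = structural_features(toy)
print(features)
```

A real system would compute such features on segmented word images and feed them to a classifier. The sketch stops at feature extraction because the summary does not say how the features are combined into a decision.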
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Authorship Attribution and Profiling
