A Machine Learning Approach to Persian Text Readability Assessment Using a Crowdsourced Dataset
Hamid Mohammadi, Seyed Hossein Khasteh

TL;DR
This paper introduces the first Persian text readability assessment dataset and a machine learning model, demonstrating high accuracy and potential applications in education and healthcare.
Contribution
It presents the first Persian dataset and machine learning model for text readability assessment, filling a significant research gap.
Findings
Model achieved high accuracy in Persian text readability assessment
First dataset for Persian text readability created
Potential applications in medical and educational texts
Abstract
An automated approach to text readability assessment is essential to a language and can be a powerful tool for improving the understandability of texts written and published in that language. However, Persian, which is spoken by over 110 million people, lacks such a system. In contrast to languages such as English, French, and Chinese, very little research has been carried out to build an accurate and reliable text readability assessment system for Persian. In the present research, the first Persian dataset for text readability assessment was gathered, and the first machine learning model for Persian text readability assessment was introduced. The experiments showed that this model was accurate and could assess the readability of Persian texts with a high degree of confidence. The results of this study can be used in a number of applications such…
