Diverse Linguistic Features for Assessing Reading Difficulty of Educational Filipino Texts

Joseph Marvin Imperial; Ethel Ong

arXiv:2108.00241 · cs.CL · August 3, 2021 · 5 cites

Open Access

TL;DR

This paper develops machine learning models using diverse linguistic features to automatically assess the reading difficulty of Filipino educational texts, aiming to improve learning quality and material selection.

Contribution

It introduces a novel set of linguistic features for Filipino readability assessment and demonstrates the effectiveness of Random Forest models with these features.

Findings

01 Random Forest achieved 62.7% accuracy

02 Optimal feature combination improved accuracy to 66.1%

03 Diverse linguistic features enhance Filipino text difficulty prediction

Abstract

To ensure quality and effective learning, fluency, and comprehension, the difficulty levels of reading materials must be properly identified. In this paper, we describe the development of automatic machine learning-based readability assessment models for educational Filipino texts using the most diverse set of linguistic features for the language. Results show that a Random Forest model achieved an accuracy of 62.7%, which rose to 66.1% with the optimal combination of feature sets, consisting of traditional and syllable pattern-based predictors.
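
To make the two feature families named in the abstract more concrete, here is a minimal Python sketch of how traditional and syllable pattern-based predictors might be computed for a Filipino text. It is not the authors' implementation: the function names (`syllable_patterns`, `extract_features`), the pattern inventory in `PATTERNS`, the treatment of every "ng" as a single consonant, and the syllabification heuristic are all assumptions made for illustration.

```python
import re
from collections import Counter

VOWELS = set("aeiou")
# Hypothetical pattern inventory; the paper's exact list may differ.
PATTERNS = ("v", "cv", "vc", "cvc", "ccv", "cvcc", "ccvc")


def cv_skeleton(word):
    """Map a word to a list of 'c'/'v' labels, treating 'ng' as one consonant."""
    segments = re.findall(r"ng|[a-z]", word.lower())
    return ["v" if s in VOWELS else "c" for s in segments]


def syllable_patterns(word):
    """Split a word into syllable patterns with a simplified heuristic.

    Assumption: every vowel is a syllable nucleus; one consonant between
    vowels starts the next syllable; with two or more, the first closes
    the current syllable and the rest start the next one.
    """
    skel = cv_skeleton(word)
    nuclei = [i for i, label in enumerate(skel) if label == "v"]
    patterns, start = [], 0
    for k, n in enumerate(nuclei):
        if k + 1 < len(nuclei):
            gap = nuclei[k + 1] - n - 1  # consonants before the next nucleus
            end = n + 1 + (1 if gap >= 2 else 0)
        else:
            end = len(skel)  # last syllable takes all trailing consonants
        patterns.append("".join(skel[start:end]))
        start = end
    return patterns


def extract_features(text):
    """Return traditional counts plus per-word syllable pattern densities."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)  # hyphenated words are split
    counts = Counter(p for w in words for p in syllable_patterns(w))
    n_words = max(len(words), 1)
    features = {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "syllable_count": sum(counts.values()),
    }
    for p in PATTERNS:
        features[f"density_{p}"] = counts[p] / n_words
    return features


if __name__ == "__main__":
    sample = "Ang bata ay nagbasa ng aklat. Masaya siya."
    for name, value in extract_features(sample).items():
        print(f"{name}: {value}")
```

The resulting dictionary gives one fixed-length feature vector per text, which could then be passed to a classifier such as scikit-learn's `RandomForestClassifier`. The heuristic ignores cases where "n" and "g" belong to different syllables and drops non-ASCII letters, so it should be read as a rough approximation only.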

Taxonomy

Topics: Text Readability and Simplification · Reading and Literacy Development · Second Language Acquisition and Learning