A Baseline Readability Model for Cebuano
Lloyd Lois Antonie Reyes, Michael Antonio Ibañez, Ranz Sapinit, Mohammed Hussien, Joseph Marvin Imperial

TL;DR
This paper introduces the first baseline readability model for Cebuano, built from surface features, syllable patterns, and neural embeddings. The two handcrafted feature sets, used with a Random Forest classifier, perform best at approximately 87% across all metrics. The work is intended to encourage further readability research in Philippine languages.
Contribution
It presents the first readability assessment model for Cebuano, combining traditional features and neural embeddings, and points to potential crosslingual applicability, with open-sourced code and data.
Findings
Achieved approximately 87% across all metrics with handcrafted features and an optimized Random Forest.
Surface features and syllable patterns are effective for Cebuano readability.
Open-sourced code and data to foster further research.
Abstract
In this study, we developed the first baseline readability model for the Cebuano language. Cebuano is the second most-used native language in the Philippines, with about 27.5 million speakers. As the baseline, we extracted traditional or surface-based features, syllable patterns based on Cebuano's documented orthography, and neural embeddings from the multilingual BERT model. Results show that the first two handcrafted linguistic feature sets obtained the best performance when used to train an optimized Random Forest model, with approximately 87% across all metrics. The feature sets and algorithm used are also similar to previous results in readability assessment for the Filipino language, showing the potential of crosslingual application. To encourage more work on readability assessment in Philippine languages such as Cebuano, we open-sourced both code and data.
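The two handcrafted feature families named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' released code: the function names, the specific surface features, the simplified consonant-vowel syllabifier (which treats "ng" as one consonant and ignores consonant clusters), and the sample sentence.

```python
import re
from collections import Counter

VOWELS = set("aeiou")
WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters only


def cv_skeleton(word):
    """Map a word to a consonant/vowel skeleton, e.g. 'balay' -> 'cvcvc'."""
    # Assumption: every 'ng' is the single-consonant digraph.
    word = word.lower().replace("ng", "N")
    return "".join("v" if ch in VOWELS else "c" for ch in word if ch.isalpha())


def syllable_patterns(word):
    """Naively split the skeleton into syllable patterns (v, cv, vc, cvc).

    Each vowel is a nucleus; a consonant directly before a vowel is an
    onset, otherwise it closes the previous syllable. Extra consonants in
    clusters are skipped, so this is only a rough proxy.
    """
    return re.findall(r"c?v(?:c(?!v))?", cv_skeleton(word))


def extract_features(text):
    """Return surface features plus per-word syllable pattern densities."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = WORD_RE.findall(text)
    n_words = max(len(words), 1)  # avoid division by zero on empty text
    feats = {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "avg_sentence_length": len(words) / max(len(sentences), 1),
    }
    counts = Counter(p for w in words for p in syllable_patterns(w))
    for pattern in ("v", "cv", "vc", "cvc"):
        feats[pattern + "_density"] = counts[pattern] / n_words
    return feats


# Illustrative sample text, not taken from the paper's dataset.
features = extract_features("Ang bata nagdula. Ang iro midagan.")
```

Feature dictionaries like this one, computed per document, could then be turned into vectors and passed to a classifier such as scikit-learn's `RandomForestClassifier`; the hyperparameter optimization mentioned in the abstract is not reproduced here.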
Taxonomy
Topics: Text Readability and Simplification
Methods: Attention Is All You Need · Linear Layer · WordPiece · Weight Decay · Dense Connections · Attention Dropout · Multi-Head Attention · Linear Warmup With Linear Decay · Adam
