BERT Embeddings for Automatic Readability Assessment
Joseph Marvin Imperial

TL;DR
This paper proposes a combined BERT embedding and handcrafted feature approach for automatic readability assessment, demonstrating improved performance in English and Filipino, and showing potential for low-resource languages.
Contribution
It introduces a novel method combining BERT embeddings with linguistic features, enhancing readability assessment especially for low-resource languages.
Findings
The combined method outperforms classical approaches on English and Filipino datasets, with up to a 12.4% increase in F1.
BERT embeddings can substitute explicit feature extraction in low-resource languages.
The method remains effective for Filipino, a language with limited semantic and syntactic NLP tools.
Abstract
Automatic readability assessment (ARA) is the task of evaluating the level of ease or difficulty of text documents for a target audience. For researchers, one of the many open problems in the field is to make such models trained for the task show efficacy even for low-resource languages. In this study, we propose an alternative way of utilizing the information-rich embeddings of BERT models with handcrafted linguistic features through a combined method for readability assessment. Results show that the proposed method outperforms classical approaches in readability assessment using English and Filipino datasets, obtaining as high as 12.4% increase in F1 performance. We also show that the general information encoded in BERT embeddings can be used as a substitute feature set for low-resource languages like Filipino with limited semantic and syntactic NLP tools to explicitly extract feature…
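As a rough illustration of what combining embeddings with handcrafted features could look like, the sketch below concatenates a document-level embedding with a few surface features into one feature vector. The abstract does not specify how the two feature sets are combined, so plain concatenation, mean pooling of token vectors, and the three features chosen here are all assumptions. The names `handcrafted_features`, `mean_pool`, and `combined_vector` are hypothetical, and the toy token vectors stand in for real BERT output.

```python
import re
from typing import List, Sequence


def handcrafted_features(text: str) -> List[float]:
    """Three simple surface features (illustrative; not the paper's feature set)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Match runs of Unicode letters so non-English text is handled as well.
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words or not sentences:
        return [0.0, 0.0, 0.0]
    avg_sentence_len = len(words) / len(sentences)          # words per sentence
    avg_word_len = sum(len(w) for w in words) / len(words)  # letters per word
    type_token_ratio = len(set(words)) / len(words)         # lexical diversity
    return [avg_sentence_len, avg_word_len, type_token_ratio]


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token vectors into one document vector (an assumed pooling choice)."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]


def combined_vector(text: str, token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the pooled embedding with the handcrafted features."""
    return mean_pool(token_vectors) + handcrafted_features(text)


# Toy 2-dimensional token vectors standing in for real BERT embeddings.
text = "The cat sat. The cat ran away."
toy_token_vectors = [[1.0, 2.0], [3.0, 4.0]]
features = combined_vector(text, toy_token_vectors)
print(len(features))  # 2 embedding dimensions + 3 handcrafted features
```

In practice the combined vector would be passed to a classifier trained on documents labeled with readability levels. For a language without reliable parsers or taggers, the handcrafted part could be dropped and the pooled embedding used alone, in line with the substitution idea described in the abstract.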
Taxonomy
Topics: Text Readability and Simplification · Natural Language Processing Techniques · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Linear Warmup With Linear Decay · Residual Connection · WordPiece · Attention Dropout · Dense Connections
