Automated essay scoring in Arabic: a dataset and analysis of a BERT-based system
Rayed Ghazawi, Edwin Simpson

TL;DR
This paper introduces AR-AES, a new Arabic essay dataset, and evaluates a BERT-based system for automated scoring. The system shows promising accuracy and could assist human graders, despite the inherent subjectivity of essay scoring.
Contribution
The study introduces AR-AES, a publicly available Arabic AES benchmark dataset with detailed rubric-based annotations, and evaluates AraBERT across diverse question types, with encouraging results.
Findings
96.15% of errors are within one point of the first human marker's score
79.49% of predictions exactly match the first human marker
BERT-based AES can assist in consistent grading across large classes
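The two agreement figures above can be illustrated with a small Python sketch. It assumes integer scores and treats "within one point" as an absolute difference of at most one from the first human marker. The summary does not say whether "errors" refers to all predictions or only mismatched ones, so the sketch reports both readings. The function name `agreement_rates` and the toy scores are hypothetical and are not taken from the paper or from AR-AES.

```python
def agreement_rates(predicted, human):
    """Return (exact, within_one, mismatch_within_one) for paired integer scores.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    if len(predicted) != len(human) or not predicted:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [abs(p - h) for p, h in zip(predicted, human)]
    # Share of predictions that match the human score exactly.
    exact = sum(d == 0 for d in diffs) / len(diffs)
    # Share of all predictions at most one point away from the human score.
    within_one = sum(d <= 1 for d in diffs) / len(diffs)
    # Share of mismatched predictions (d > 0) that are off by exactly one point.
    mismatches = [d for d in diffs if d > 0]
    mismatch_within_one = (
        sum(d == 1 for d in mismatches) / len(mismatches) if mismatches else None
    )
    return exact, within_one, mismatch_within_one


# Toy scores for illustration only; these are not data from AR-AES.
model_scores = [3, 4, 2, 5, 1, 3, 4, 2]
first_marker = [3, 4, 3, 5, 1, 2, 4, 4]
exact, within_one, mismatch_within_one = agreement_rates(model_scores, first_marker)
print(f"exact: {exact:.2%}, within one point: {within_one:.2%}")
# prints "exact: 62.50%, within one point: 87.50%"
```

Reporting the within-one rate alongside exact agreement separates near-misses from large disagreements. That distinction matters when the goal is to assist human graders rather than replace them.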
Abstract
Automated Essay Scoring (AES) holds significant promise in the field of education, helping educators to mark larger volumes of essays and provide timely feedback. However, Arabic AES research has been limited by the lack of publicly available essay data. This study introduces AR-AES, an Arabic AES benchmark dataset comprising 2046 undergraduate essays, including gender information, scores, and transparent rubric-based evaluation guidelines, providing comprehensive insights into the scoring process. These essays come from four diverse courses, covering both traditional and online exams. Additionally, we pioneer the use of AraBERT for AES, exploring its performance on different question types. We find encouraging results, particularly for Environmental Chemistry and source-dependent essay questions. For the first time, we examine the scale of errors made by a BERT-based AES system,…
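The abstract reports results broken down by course and question type. The Python sketch below shows one way to compute such a per-group breakdown of exact agreement. The `ScoredEssay` record, its field names, the "Course B" and "other" labels, and the toy rows are all hypothetical. The real AR-AES columns may differ, and these rows are not results from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ScoredEssay:
    # Hypothetical fields for illustration; the real AR-AES schema may differ.
    course: str
    question_type: str
    human_score: int
    predicted_score: int


def exact_agreement_by(essays, key):
    """Exact-match rate between predicted and human scores, grouped by key(essay)."""
    totals = defaultdict(int)
    matches = defaultdict(int)
    for essay in essays:
        group = key(essay)
        totals[group] += 1
        matches[group] += int(essay.predicted_score == essay.human_score)
    return {group: matches[group] / totals[group] for group in totals}


# Toy records for illustration only; not results from the paper.
essays = [
    ScoredEssay("Environmental Chemistry", "source-dependent", 4, 4),
    ScoredEssay("Environmental Chemistry", "source-dependent", 3, 3),
    ScoredEssay("Course B", "other", 2, 3),
    ScoredEssay("Course B", "other", 5, 5),
]
print(exact_agreement_by(essays, key=lambda e: e.question_type))
print(exact_agreement_by(essays, key=lambda e: e.course))
```

Passing the grouping function as `key` lets the same helper slice results by course, question type, or any other recorded attribute, such as exam mode.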
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
