Essay-BR: a Brazilian Corpus of Essays
Jeziel C. Marinho, Rafael T. Anchieta, and Raimundo S. Moura

TL;DR
This paper introduces a large, publicly available Brazilian Portuguese essay corpus with expert scores, aiming to support automatic essay scoring research and address the language gap in AES studies.
Contribution
It presents the first extensive Brazilian Portuguese essay corpus with expert annotations, facilitating future AES research in this language.
Findings
Identified challenges in applying AES to Portuguese essays
Provided a benchmark dataset for future AES models in Portuguese
Demonstrated the corpus's potential for improving AES systems
Abstract
Automatic Essay Scoring (AES) is the computer technology that evaluates and scores written essays, aiming to provide computational models that grade essays either automatically or with minimal human involvement. While there are several AES studies in a variety of languages, few of them focus on Portuguese. The main reason is the lack of a corpus of manually graded essays. To bridge this gap, we create a large corpus of essays written by Brazilian high school students on an online platform. All of the essays are argumentative and were scored by experts across five competencies. Moreover, we conducted an experiment on the corpus and showed the challenges posed by the Portuguese language. Our corpus is publicly available at https://github.com/rafaelanchieta/essay.
Taxonomy
Topics: Topic Modeling · Software Engineering Research · Natural Language Processing Techniques
