EMBRACE: Evaluation and Modifications for Boosting RACE
Mariia Zyrianova, Dmytro Kalpakchi, Johan Boye

TL;DR
This paper critically evaluates the RACE dataset for machine reading comprehension, analyzing question difficulty, justification bases, and biases, and identifies a high-quality subset to improve evaluation standards.
Contribution
It provides a detailed analysis of RACE's quality, identifies issues with question validity and bias, and proposes a high-quality subset for more reliable model evaluation.
Findings
Many MCQs do not meet basic comprehension requirements
Bases for answer justification are biased towards specific text parts
A high-quality subset of RACE is identified for better evaluation
Abstract
When training and evaluating machine reading comprehension models, it is very important to work with high-quality datasets that are also representative of real-world reading comprehension tasks. This requirement includes, for instance, having questions that are based on texts of different genres and require generating inferences or reflecting on the reading material. In this article, we turn our attention to RACE, a dataset of English texts and corresponding multiple-choice questions (MCQs). Each MCQ consists of a question and four alternatives (of which one is the correct answer). RACE was constructed by Chinese teachers of English for human reading comprehension and is widely used as training material for machine reading comprehension models. By construction, RACE should satisfy the aforementioned quality requirements, and the purpose of this article is to check whether they are…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Test
