Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems

Anubha Kabra; Mehar Bhatia; Yaman Kumar; Junyi Jessy Li; Rajiv Ratn Shah

arXiv:2007.06796 · cs.CL · November 16, 2021

TL;DR

This paper introduces a comprehensive adversarial evaluation toolkit for automatic essay scoring systems, revealing their over-stability and highlighting the need for more holistic assessment methods.

Contribution

It proposes a model-agnostic adversarial testing scheme and metrics for automatic essay scoring (AES) systems, addressing the lack of holistic evaluation across multiple essay features.

Findings

1. AES models are highly overstable: their scores change little even under content modifications that should lower them.
2. Irrelevant content can increase automated scores.
3. Human raters struggle to detect adversarial content.
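To make "overstable" concrete: a model-agnostic probe needs only black-box access to a scorer. It applies a perturbation that should lower the score (here, appending an off-topic sentence) and reports how scores move. The sketch below assumes a scorer is any function from essay text to a number. The perturbation, the length-based toy scorer, and the report fields are hypothetical illustrations, not the paper's toolkit or the API of midas-research/calling-out-bluff.

```python
from statistics import mean
from typing import Callable, Dict, List

Scorer = Callable[[str], float]      # any black-box AES model: essay -> score
Perturbation = Callable[[str], str]  # essay -> modified essay


def append_irrelevant(essay: str) -> str:
    """Hypothetical perturbation: append an off-topic sentence."""
    return essay + " The stock market closed higher on Tuesday."


def overstability_report(scorer: Scorer, essays: List[str],
                         perturb: Perturbation) -> Dict[str, float]:
    """Compare scores before and after a perturbation that should lower them."""
    if not essays:
        raise ValueError("need at least one essay")
    # Score change per essay: perturbed score minus original score.
    deltas = [scorer(perturb(e)) - scorer(e) for e in essays]
    return {
        "mean_delta": mean(deltas),
        # Fraction of essays whose score stayed the same or went up even
        # though irrelevant content was added; high values suggest overstability.
        "not_penalized": sum(d >= 0 for d in deltas) / len(deltas),
    }


def toy_length_scorer(essay: str) -> float:
    """Hypothetical scorer that rewards length: 1 point per 10 words, capped at 6."""
    return min(6.0, len(essay.split()) / 10)
```

With the toy length-based scorer, appending the seven-word off-topic sentence raises a 10-word essay's score from 1.0 to 1.7, so `not_penalized` is 1.0. That mirrors, in miniature, the finding that irrelevant content can increase automated scores.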

Abstract

Automatic scoring engines have been used to score approximately fifteen million test-takers in just the last three years. This number is increasing further due to COVID-19 and the associated automation of education and testing. Despite such wide usage, the literature on AI-based testing of these "intelligent" models remains sparse. Most papers proposing new models rely only on agreement with human raters, measured by quadratic weighted kappa (QWK), to show model efficacy. However, this effectively ignores the highly multi-feature nature of essay scoring, which depends on features such as coherence, grammar, relevance, sufficiency, and vocabulary. To date, no study has tested Automated Essay Scoring (AES) systems holistically on all these features. With this motivation, we propose a model-agnostic adversarial evaluation scheme and associated metrics for AES systems…
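For reference, the abstract's baseline metric can be sketched with the standard QWK definition: kappa = 1 - sum(w_ij * O_ij) / sum(w_ij * E_ij), where O is the observed matrix of (rater A score, rater B score) counts, E is the count matrix expected by chance from each rater's score histogram, and w_ij = (i - j)^2 / (N - 1)^2 for N score levels. This is a minimal illustration, not code from the paper or its repository; the function name and the assumption of integer scores in a known range are illustrative choices.

```python
from typing import Sequence


def quadratic_weighted_kappa(rater_a: Sequence[int], rater_b: Sequence[int],
                             min_rating: int, max_rating: int) -> float:
    """Cohen's kappa with quadratic disagreement weights (QWK).

    Assumes integer scores in [min_rating, max_rating] for both raters.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    if not all(min_rating <= r <= max_rating for r in (*rater_a, *rater_b)):
        raise ValueError("rating outside [min_rating, max_rating]")
    n = max_rating - min_rating + 1  # number of score levels
    if n < 2:
        raise ValueError("need at least two score levels")
    total = len(rater_a)

    # Observed matrix: observed[i][j] counts essays scored i by A and j by B.
    observed = [[0.0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1

    # Score histograms (marginals) for each rater.
    hist_a = [sum(observed[i][j] for j in range(n)) for i in range(n)]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2  # quadratic disagreement weight
            expected = hist_a[i] * hist_b[j] / total  # chance-level count
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected
    if weighted_expected == 0.0:
        raise ValueError("kappa is undefined when both raters give every essay the same score")
    return 1.0 - weighted_observed / weighted_expected
```

For example, identical score lists give 1.0, and fully reversed scores on a 1 to 3 scale give -1.0 (up to floating-point rounding). Because QWK compares only final scores, it says nothing about whether a model attends to coherence, relevance, or the other features the abstract lists.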

Code & Models

Repositories

midas-research/calling-out-bluff (Official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning