Using Sampling to Estimate and Improve Performance of Automated Scoring Systems with Guarantees

Yaman Kumar Singla; Sriram Krishna; Rajiv Ratn Shah; Changyou Chen

arXiv:2111.08906·cs.CL·November 18, 2021

TL;DR

This paper introduces a sampling-based method to enhance automated scoring systems by intelligently combining human and machine scoring, significantly improving accuracy and reliability while maintaining cost-effectiveness.

Contribution

It proposes a reward sampling approach for selecting responses for human scoring, achieving substantial accuracy gains and providing statistical guarantees for performance estimation.
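One generic way to attach a statistical guarantee to a performance estimate is a Hoeffding bound on accuracy measured over a uniformly random, human-scored sample. The sketch below illustrates that general idea only; it is not taken from the paper, whose guarantee may be derived differently. The function names, the exact-match definition of accuracy, and the assumption that human scores are available as a list are all hypothetical.

```python
import math
import random


def hoeffding_margin(n, delta=0.05):
    """Half-width eps such that |estimate - true accuracy| <= eps
    holds with probability at least 1 - delta (Hoeffding's inequality)."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def estimate_accuracy(machine_scores, human_scores, sample_size, delta=0.05, seed=0):
    """Estimate machine accuracy from a uniform random sample of responses.

    Assumption: accuracy means exact agreement with the human score.
    In a real deployment only the sampled responses would be sent to
    human raters; the full human_scores list is used here for simulation.
    """
    rng = random.Random(seed)
    sampled = rng.sample(range(len(machine_scores)), sample_size)
    agreements = sum(machine_scores[i] == human_scores[i] for i in sampled)
    accuracy = agreements / sample_size
    eps = hoeffding_margin(sample_size, delta)
    # Clip the interval to the valid range of an accuracy value.
    return accuracy, max(0.0, accuracy - eps), min(1.0, accuracy + eps)
```

With 200 sampled responses and delta = 0.05 the margin is roughly 0.096, so tightening the interval requires a larger human-scored sample. That is the cost trade-off the paper is concerned with.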

Findings

01  19.80% average accuracy improvement with sampling
02  25.60% average increase in quadratic weighted kappa
03  Effective across various models and pseudo models
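For readers unfamiliar with the second metric, quadratic weighted kappa measures agreement between two raters on an ordinal scale and penalizes disagreements by the squared distance between the two scores. Below is a minimal pure-Python sketch of the standard definition. The function name and argument layout are illustrative and are not taken from the paper's repository.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Standard QWK between two lists of integer scores.

    Assumes at least two score levels and that the expected
    disagreement is non-zero (otherwise the ratio is undefined).
    """
    k = max_score - min_score + 1
    n = len(rater_a)

    # Observed confusion matrix between the two raters.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1

    # Marginal histograms of each rater's scores.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n  # count expected by chance
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    return 1.0 - weighted_observed / weighted_expected
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.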

Abstract

Automated Scoring (AS), the natural language processing task of scoring essays and speeches in an educational testing setting, is growing in popularity and is being deployed across contexts ranging from government examinations to companies providing language proficiency services. However, existing systems either forgo human raters entirely, which harms the reliability of the test, or have every response scored by both human and machine, which increases costs. We target the spectrum of possible solutions in between, making use of both humans and machines to provide a higher-quality test while keeping costs reasonable, to democratize access to AS. In this work, we propose a combination of the existing paradigms, intelligently sampling responses to be scored by humans. We propose reward sampling and observe significant gains in accuracy (19.80% increase on average) and quadratic weighted kappa (QWK)…
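The middle ground described above, where only part of the responses go to human raters, can be sketched as a budgeted routing step. The abstract excerpt does not specify how reward sampling selects responses, so the sketch below substitutes a simple hypothetical rule: send the responses with the lowest machine confidence to humans. The names hybrid_scores, confidences, human_score_fn, and budget are assumptions made for illustration.

```python
def hybrid_scores(machine_scores, confidences, human_score_fn, budget):
    """Combine machine and human scores under a human-scoring budget.

    Stand-in selection rule (NOT the paper's reward sampling): the
    `budget` responses with the lowest machine confidence are routed
    to a human rater; all others keep their machine score.
    """
    by_confidence = sorted(range(len(machine_scores)), key=lambda i: confidences[i])
    routed = set(by_confidence[:budget])
    final = [
        human_score_fn(i) if i in routed else machine_scores[i]
        for i in range(len(machine_scores))
    ]
    return final, sorted(routed)


def accuracy(predicted, reference):
    """Fraction of responses whose score exactly matches the reference."""
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


if __name__ == "__main__":
    # Toy data: human scores are treated as the reference.
    reference = [3, 3, 4, 2]
    machine = [3, 2, 4, 1]
    confidence = [0.9, 0.4, 0.8, 0.3]
    final, routed = hybrid_scores(machine, confidence, lambda i: reference[i], budget=2)
    print(routed, accuracy(machine, reference), accuracy(final, reference))
```

In this toy setup the two least confident machine scores happen to be the wrong ones, so routing them to a human lifts accuracy from 0.5 to 1.0. How well any selection rule works in practice depends on how closely its ranking tracks actual machine errors.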

Code & Models

Repositories

midas-research/Improvement-and-Estimation-of-Automated-Scoring-Systems-Performance-with-Guarantees
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Software Testing and Debugging Techniques