Using Sampling to Estimate and Improve Performance of Automated Scoring Systems with Guarantees
Yaman Kumar Singla, Sriram Krishna, Rajiv Ratn Shah, Changyou Chen

TL;DR
This paper introduces a sampling-based method to enhance automated scoring systems by intelligently combining human and machine scoring, significantly improving accuracy and reliability while maintaining cost-effectiveness.
Contribution
It proposes a reward sampling approach for selecting responses for human scoring, achieving substantial accuracy gains and providing statistical guarantees for performance estimation.
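The excerpt does not say which statistical guarantee the paper uses. As a minimal sketch, assume the human-scored responses are drawn uniformly at random from the pool. Under that assumption, the machine's exact-agreement rate can be estimated with a two-sided Hoeffding confidence interval. The function name and the `delta` parameter are illustrative and are not taken from the paper.

```python
import math


def estimate_accuracy_with_bound(machine_scores, human_scores, delta=0.05):
    """Estimate machine-human exact agreement with a Hoeffding interval.

    Assumes (hypothetically) that the paired scores come from responses
    sampled uniformly at random for human scoring. The returned interval
    contains the true agreement rate with probability >= 1 - delta.
    """
    n = len(machine_scores)
    if n == 0 or n != len(human_scores):
        raise ValueError("need equal-length, non-empty score lists")
    agree = sum(m == h for m, h in zip(machine_scores, human_scores))
    p_hat = agree / n
    # Hoeffding: P(|p_hat - p| >= eps) <= 2 * exp(-2 * n * eps**2)
    eps = math.sqrt(math.log(2 / delta) / (2 * n))
    return p_hat, max(0.0, p_hat - eps), min(1.0, p_hat + eps)
```

The interval's half-width shrinks as 1/sqrt(n), so the number of human-scored responses directly controls how tightly machine performance can be estimated.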
Findings
19.80% average accuracy improvement with sampling
25.60% average increase in quadratic weighted kappa
Effective across various models and pseudo models
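Quadratic weighted kappa (QWK), the second metric above, measures agreement between two raters on an ordinal scale. It penalizes each disagreement by the squared distance between the two scores. The sketch below follows the standard definition, kappa = 1 - sum(w * O) / sum(w * E), where O is the observed rating matrix, E is the outer product of the two raters' score histograms divided by n, and w_ij = (i - j)^2 / (k - 1)^2. The function name and the integer score range are assumptions for illustration.

```python
def quadratic_weighted_kappa(a, b, min_score, max_score):
    """Standard QWK between two lists of integer scores in [min_score, max_score]."""
    k = max_score - min_score + 1
    n = len(a)
    if k < 2 or n == 0 or n != len(b):
        raise ValueError("need a score range of >= 2 and equal-length, non-empty lists")
    # Observed confusion matrix O[i][j]: rater a gave i, rater b gave j.
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        if not (min_score <= x <= max_score and min_score <= y <= max_score):
            raise ValueError("score outside declared range")
        observed[x - min_score][y - min_score] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2           # quadratic disagreement weight
            expected = hist_a[i] * hist_b[j] / n      # chance agreement E[i][j]
            num += w * observed[i][j]
            den += w * expected
    if den == 0.0:
        return float("nan")  # undefined: both raters used one identical score
    return 1.0 - num / den
```

A QWK of 1 indicates perfect agreement, 0 indicates chance-level agreement, and negative values indicate systematic disagreement.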
Abstract
Automated Scoring (AS), the natural language processing task of scoring essays and speeches in an educational testing setting, is growing in popularity and being deployed across contexts from government examinations to companies providing language proficiency services. However, existing systems either forgo human raters entirely, thus harming the reliability of the test, or score every response by both human and machine, thereby increasing costs. We target the spectrum of possible solutions in between, making use of both humans and machines to provide a higher-quality test while keeping costs reasonable to democratize access to AS. In this work, we propose a combination of the existing paradigms, sampling responses to be scored by humans intelligently. We propose reward sampling and observe significant gains in accuracy (19.80% increase on average) and quadratic weighted kappa (QWK)…
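The abstract describes a middle ground in which a limited budget of responses goes to human raters and the rest keep their machine scores. The excerpt does not give the selection rule used by reward sampling. The sketch below therefore takes an arbitrary, hypothetical `priority` value per response, where higher means "send to a human first". If no priority is supplied, it falls back to uniform random sampling. `human_score_fn` stands in for a human rater and is also hypothetical. This illustrates the routing idea only and is not the paper's reward sampling.

```python
import random
from typing import Callable, List, Optional, Sequence


def hybrid_scores(
    machine_scores: Sequence[int],
    human_score_fn: Callable[[int], int],   # hypothetical: returns a human score for response i
    budget: int,
    priority: Optional[Sequence[float]] = None,  # hypothetical per-response priority
    seed: int = 0,
) -> List[int]:
    """Route up to `budget` responses to humans; keep machine scores elsewhere."""
    n = len(machine_scores)
    budget = max(0, min(budget, n))
    if priority is None:
        # Baseline: uniform random sample of responses for human scoring.
        chosen = random.Random(seed).sample(range(n), budget)
    else:
        # Highest-priority responses are sent to humans first.
        chosen = sorted(range(n), key=lambda i: priority[i], reverse=True)[:budget]
    final = list(machine_scores)
    for i in chosen:
        final[i] = human_score_fn(i)  # human score overrides the machine score
    return final
```

For example, with machine scores `[1, 1, 1, 1]`, a budget of 2, and priorities `[0.1, 0.9, 0.5, 0.2]`, responses 1 and 2 are rescored by humans. The budget sets the cost, and the priority rule determines how much accuracy each human score buys.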
