Artificial Intelligence Bias on English Language Learners in Automatic Scoring

Shuchen Guo; Yun Wang; Jichao Yu; Xuansheng Wu; Bilgehan Ayik; Field M. Watts; Ehsan Latif; Ninghao Liu; Lei Liu; Xiaoming Zhai

arXiv:2505.10643 · cs.CL · May 21, 2025

Open Access

TL;DR

This study examines how unbalanced training data affects bias in automatic scoring systems for English Language Learner (ELL) students. It finds no bias when training samples are large, but small samples may lead to bias and scoring disparities.

Contribution

It demonstrates the impact of dataset balance and size on bias in AI scoring of ELL responses, highlighting the importance of sufficient data.

Findings

01 No bias was found with large training samples (about 30,000 and about 1,000 ELL responses)

02 Bias may occur with small training samples (about 200 ELL responses)

03 Balanced datasets reduce scoring disparities

Abstract

This study investigated potential scoring biases and disparities toward English Language Learners (ELLs) when using automatic scoring systems for middle school students' written responses to science assessments. We specifically focus on examining how unbalanced training data with ELLs contributes to scoring bias and disparities. We fine-tuned BERT with four datasets: responses from (1) ELLs, (2) non-ELLs, (3) a mixed dataset reflecting the real-world proportion of ELLs and non-ELLs (unbalanced), and (4) a balanced mixed dataset with equal representation of both groups. The study analyzed 21 assessment items: 10 items with about 30,000 ELL responses, five items with about 1,000 ELL responses, and six items with about 200 ELL responses. Scoring accuracy (Acc) was calculated and compared to identify bias using Friedman tests. We measured the Mean Score Gaps (MSGs) between ELLs and non-ELLs…
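The two quantities named in the abstract can be illustrated with a minimal sketch. The abstract is cut off before it defines the Mean Score Gap, so the definition used here (mean non-ELL score minus mean ELL score) is an assumption, as are the function names and the toy scores; none of this is taken from the authors' code.

```python
from statistics import mean


def accuracy(human_scores, machine_scores):
    """Fraction of responses where the machine score equals the human score."""
    if len(human_scores) != len(machine_scores):
        raise ValueError("score lists must have the same length")
    matches = sum(h == m for h, m in zip(human_scores, machine_scores))
    return matches / len(human_scores)


def mean_score_gap(non_ell_scores, ell_scores):
    """Assumed definition: mean non-ELL score minus mean ELL score."""
    return mean(non_ell_scores) - mean(ell_scores)


# Toy scores (hypothetical, for illustration only).
ell_human = [1, 0, 2, 1]
ell_machine = [1, 0, 1, 1]
non_ell_human = [2, 1, 2, 0]
non_ell_machine = [2, 1, 2, 1]

# Scoring accuracy per group.
ell_acc = accuracy(ell_human, ell_machine)
non_ell_acc = accuracy(non_ell_human, non_ell_machine)

# Score gap under human scoring versus machine scoring.
human_gap = mean_score_gap(non_ell_human, ell_human)
machine_gap = mean_score_gap(non_ell_machine, ell_machine)

print(f"ELL accuracy: {ell_acc:.2f}, non-ELL accuracy: {non_ell_acc:.2f}")
print(f"Human gap: {human_gap:.2f}, machine gap: {machine_gap:.2f}")
```

In this toy example the machine-scored gap is larger than the human-scored gap, which is the kind of disparity such a comparison is meant to surface. The Friedman tests mentioned in the abstract are not reproduced here.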

Taxonomy

Topics: Psychometric Methodologies and Testing · Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling

Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Layer Normalization · Softmax · Attention Dropout · WordPiece · Residual Connection · Linear Layer · Weight Decay