TL;DR
This study evaluates AI, specifically GPT models, as a tool to assist in grading essay questions in higher education, focusing on accuracy, consistency, and bias reduction, and suggests AI can support human graders rather than replace them.
Contribution
It demonstrates GPT models' effectiveness in transcribing and scoring essays, highlighting their potential to improve grading fairness and efficiency in educational settings.
Findings
GPT transcriptions closely match human transcriptions, with GPT-4o-mini outperforming GPT-4o in accuracy
GPT scores correlate strongly with the human grader's scores, especially when template answers are provided
AI can serve as a second grader to flag inconsistencies for human review
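The "second grader" idea in the findings above can be illustrated with a minimal sketch. It is not the study's code: the scores, the function names, and the 1-point disagreement threshold are all hypothetical. It computes a Pearson correlation between two sets of scores and lists the essays where the two graders disagree enough to warrant human review.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


def flag_for_review(human_scores, gpt_scores, threshold=1.0):
    """Indices of essays whose two scores differ by more than `threshold`.

    The threshold is an assumed value chosen for illustration only.
    """
    if len(human_scores) != len(gpt_scores):
        raise ValueError("score lists must have the same length")
    return [
        i
        for i, (h, g) in enumerate(zip(human_scores, gpt_scores))
        if abs(h - g) > threshold
    ]


# Hypothetical scores for five essays (not data from the study).
human = [8.0, 6.5, 9.0, 4.0, 7.0]
gpt = [7.5, 6.0, 9.0, 6.5, 7.0]

r = pearson(human, gpt)
flagged = flag_for_review(human, gpt, threshold=1.0)
print(f"Pearson r = {r:.2f}; essays flagged for human review: {flagged}")
```

In this setup the model never overrides a grade. It only surfaces essays where the two scores diverge, and the human grader decides what to do with them.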
Abstract
This study explores the use of artificial intelligence (AI) as a complementary tool for grading essay-type questions in higher education, focusing on its consistency with human grading and its potential to reduce biases. Using 70 handwritten exams from an introductory sociology course, we evaluated the performance of generative pre-trained transformer (GPT) models in transcribing and scoring students' responses. GPT models were tested under various settings for both transcription and grading tasks. Results show high similarity between human and GPT transcriptions, with GPT-4o-mini outperforming GPT-4o in accuracy. For grading, GPT demonstrated strong correlations with the human grader's scores, especially when template answers were provided. However, discrepancies remained, highlighting GPT's role as a "second grader" to flag inconsistencies for assessment review rather than fully replace human…
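The abstract reports high similarity between human and GPT transcriptions but, as excerpted here, does not say which similarity metric was used. As a hedged illustration only, the sketch below compares two transcriptions with the character-level ratio from Python's standard `difflib` module. The normalization step and the sample strings are assumptions made for this example, not details from the study.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so layout differences are ignored."""
    return " ".join(text.lower().split())


def transcription_similarity(human_text, gpt_text):
    """Similarity in [0, 1] between two transcriptions of the same answer.

    Uses difflib's ratio as a stand-in metric; the study's own metric
    may differ.
    """
    a = normalize(human_text)
    b = normalize(gpt_text)
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical transcriptions of one handwritten answer.
human_version = "Socialization is the process of learning social norms."
gpt_version = "Socialization is the  process of learning social  forms."

score = transcription_similarity(human_version, gpt_version)
print(f"similarity = {score:.3f}")
```

A score near 1 means the two transcriptions are almost identical. Low-scoring answers are the ones worth checking by hand before any grading step.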
Taxonomy
Methods: Linear Layer · Multi-Head Attention · Discriminative Fine-Tuning · Layer Normalization · Dense Connections · Cosine Annealing · Attention Dropout · Adam · Residual Connection
