The Impact of Example Selection in Few-Shot Prompting on Automated Essay Scoring Using GPT Models

Lui Yoshida

arXiv:2411.18924 · cs.CL · December 2, 2024

TL;DR

This paper examines how the choice and order of examples in few-shot prompting affect GPT-based automated essay scoring. It finds evidence of majority label bias and recency bias, and shows that GPT-3.5 is more sensitive to example selection than GPT-4.

Contribution

It provides a detailed analysis of example selection effects on GPT models' essay scoring accuracy and bias, highlighting differences between GPT-3.5 and GPT-4.

Findings

1. Example selection significantly impacts GPT-3.5 scores.
2. GPT-3.5 shows stronger biases than GPT-4.
3. Careful example selection can improve GPT-3.5 performance.

Abstract

This study investigates the impact of example selection on the performance of automated essay scoring (AES) using few-shot prompting with GPT models. We evaluate the effects of the choice and order of examples in few-shot prompting on several versions of GPT-3.5 and GPT-4 models. Our experiments involve 119 prompts with different examples, and we calculate the quadratic weighted kappa (QWK) to measure the agreement between GPT and human rater scores. Regression analysis is used to quantitatively assess biases introduced by example selection. The results show that the impact of example selection on QWK varies across models, with GPT-3.5 being more influenced by examples than GPT-4. We also find evidence of majority label bias, which is a tendency to favor the majority label among the examples, and recency bias, which is a tendency to favor the label of the most recent example, in…
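
The agreement metric named in the abstract, quadratic weighted kappa, can be illustrated with a minimal sketch. It assumes integer scores on a known inclusive range; the function name `quadratic_weighted_kappa`, its parameters, and the sample scores are hypothetical and are not taken from the paper's code.

```python
from collections import Counter


def quadratic_weighted_kappa(human, model, min_score, max_score):
    """Sketch of QWK between two lists of integer scores.

    Assumption: both lists use the same inclusive range
    [min_score, max_score]; this range is illustrative only.
    """
    if len(human) != len(model) or not human:
        raise ValueError("score lists must be non-empty and equal in length")
    k = max_score - min_score + 1  # number of score categories
    if k < 2:
        raise ValueError("need at least two score categories")

    n = len(human)
    observed = Counter(zip(human, model))  # joint counts O[i, j]
    hist_h = Counter(human)                # marginal counts, human rater
    hist_m = Counter(model)                # marginal counts, model

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            # Quadratic penalty: larger disagreements cost more.
            w = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += w * observed[(i, j)]
            # Expected count if the two raters were independent.
            weighted_expected += w * hist_h[i] * hist_m[j] / n

    if weighted_expected == 0:
        # Both raters gave one identical constant score; kappa is
        # formally undefined, and this sketch treats it as agreement.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores on a 1-4 scale, for illustration only.
human_scores = [1, 2, 3, 4, 3, 2]
model_scores = [1, 2, 3, 3, 3, 1]
print(quadratic_weighted_kappa(human_scores, model_scores, 1, 4))
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement. Comparing this value across prompts that differ only in their few-shot examples is one way to see how much example selection moves the metric.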

Taxonomy

Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing · Linear Layer · Discriminative Fine-Tuning · Cosine Annealing · Multi-Head Attention · Transformer